aws-neuron / aws-neuron-driver

Linux kernel device driver supporting AWS Neuron SDK
GNU General Public License v2.0

/dev/neuron0 is missing on AWS ECS Optimised AMI for Inferentia in Tokyo #3

Closed neilferreira closed 3 years ago

neilferreira commented 3 years ago

Fingers crossed that this is the right place for this information.

As per https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html I am using the AWS ECS optimised AMI for Inferentia (https://ap-northeast-1.console.aws.amazon.com/systems-manager/parameters/aws/service/ecs/optimized-ami/amazon-linux-2/inf/recommended/image_id/description?region=ap-northeast-1)

At the time of writing, this resolves to ami-08e781fa005b6a4cf.

When the server starts, the /dev/neuron0 device is missing.

ll /dev/neuron*
ls: cannot access /dev/neuron*: No such file or directory
yum info aws-neuron-dkms
Loaded plugins: dkms-build-requires, priorities, update-motd, upgrade-helper
Installed Packages
Name        : aws-neuron-dkms
Arch        : noarch
Version     : 2.2.6.0
Release     : dkms
Size        : 393 k
Repo        : installed
Summary     : aws-neuron 2.2.6.0 dkms package
License     : Unknown
Description : Kernel modules for aws-neuron 2.2.6.0 in a DKMS wrapper.

This is fixed by reinstalling aws-neuron-dkms at the same version:

[root@ip-10-0-2-113 dev]# yum install aws-neuron-dkms
Loaded plugins: dkms-build-requires, priorities, update-motd, upgrade-helper
amzn2-core                                                                                                                                                       | 3.7 kB  00:00:00
Package aws-neuron-dkms-2.2.6.0-dkms.noarch already installed and latest version
Nothing to do
[root@ip-10-0-2-113 dev]# ls^C
[root@ip-10-0-2-113 dev]# yum reinstall aws-neuron-dkms
Loaded plugins: dkms-build-requires, priorities, update-motd, upgrade-helper
Resolving Dependencies
--> Running transaction check
---> Package aws-neuron-dkms.noarch 0:2.2.6.0-dkms will be reinstalled
--> Finished Dependency Resolution

Dependencies Resolved

========================================================================================================================================================================================
 Package                                          Arch                                    Version                                         Repository                               Size
========================================================================================================================================================================================
Reinstalling:
 aws-neuron-dkms                                  noarch                                  2.2.6.0-dkms                                    neuron                                   96 k

Transaction Summary
========================================================================================================================================================================================
Reinstall  1 Package

Total download size: 96 k
Installed size: 393 k
Is this ok [y/d/N]: y
Downloading packages:
aws-neuron-dkms-2.2.6.0.noarch.rpm                                                                                                                               |  96 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : aws-neuron-dkms-2.2.6.0-dkms.noarch                                                                                                                                  1/1
Removing old aws-neuron-2.2.6.0 DKMS files...

-------- Uninstall Beginning --------
Module:  aws-neuron
Version: 2.2.6.0
Kernel:  4.14.248-189.473.amzn2.x86_64 (x86_64)
-------------------------------------

Status: This module version was INACTIVE for this kernel.

Running the post_remove script:
rmmod: ERROR: Module neuron is not currently loaded
depmod...

DKMS: uninstall completed.

------------------------------
Deleting module version: 2.2.6.0
completely from the DKMS tree.
------------------------------
Done.
Loading new aws-neuron-2.2.6.0 DKMS files...
Building for 4.14.248-189.473.amzn2.x86_64
Building initial module for 4.14.248-189.473.amzn2.x86_64
Done.

neuron.ko:
Running module version sanity check.

Running the pre_install script:
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.14.248-189.473.amzn2.x86_64/kernel/drivers/neuron//

Running the post_install script:
neuron

depmod...

DKMS: install completed.
  Verifying  : aws-neuron-dkms-2.2.6.0-dkms.noarch                                                                                                                                  1/1

Installed:
  aws-neuron-dkms.noarch 0:2.2.6.0-dkms

Complete!
[root@ip-10-0-2-113 dev]# ll /dev/neuron*
crw-rw-rw- 1 root root 248, 0 Nov 12 06:04 /dev/neuron0
[root@ip-10-0-2-113 dev]#
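
For anyone hitting the same thing on an affected AMI, the manual steps in the log above could be scripted roughly as follows. This is only a sketch of the workaround, not an official fix: the function names are made up for illustration, and it assumes it runs as root on an Amazon Linux 2 host with the neuron yum repo already configured (as it is on this AMI).

```shell
#!/usr/bin/env bash
# Sketch of the workaround from this issue: reinstall aws-neuron-dkms when
# no /dev/neuron* device node exists. All function names are hypothetical.

# Return 0 if at least one neuron device node exists under the given
# directory (defaults to /dev), 1 otherwise.
neuron_device_present() {
    local dev_dir="${1:-/dev}"
    local node
    for node in "$dev_dir"/neuron*; do
        # An unmatched glob stays literal, so confirm the path really exists.
        [ -e "$node" ] && return 0
    done
    return 1
}

# Reinstall the DKMS package at the same version so the module is rebuilt
# and installed for the running kernel. Must run as root.
repair_neuron_driver() {
    yum reinstall -y aws-neuron-dkms
}

# Check for the device, attempt the repair if it is missing, then re-check.
ensure_neuron_device() {
    local dev_dir="${1:-/dev}"
    if neuron_device_present "$dev_dir"; then
        echo "neuron device present"
        return 0
    fi
    echo "no neuron device under $dev_dir; reinstalling aws-neuron-dkms" >&2
    repair_neuron_driver && neuron_device_present "$dev_dir"
}
```

Calling `ensure_neuron_device` from EC2 user data or a boot-time unit would apply the same reinstall shown above only when the device is actually missing. The final re-check assumes the device node appears as soon as the reinstall completes, which is what happened in the log.
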
micwade-aws commented 3 years ago

Sincere apologies @neilferreira. One of the recent releases of the ECS AMI shipped with a broken config due to a test defect. The latest AMI (released today) does not have this defect and will load the driver on boot.

Newest ECS AMI ID for ap-northeast-1: ami-07937c3a993978715
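
Since the recommended AMI ID changes with each release, it may be easier to look it up than to hard-code it. A minimal sketch, assuming the AWS CLI is installed and the caller has permission to read SSM public parameters; the parameter name is the one from the console URL in the original report, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: look up the currently recommended ECS-optimized Inferentia AMI ID.
# Function names are hypothetical; the parameter path comes from the console
# URL quoted in the original report.

# Print the SSM public parameter name for the recommended inf AMI ID.
inf_ami_parameter_name() {
    echo "/aws/service/ecs/optimized-ami/amazon-linux-2/inf/recommended/image_id"
}

# Print the recommended AMI ID for a region (default: ap-northeast-1).
# Requires the AWS CLI and credentials that allow ssm:GetParameters.
recommended_inf_ami_id() {
    local region="${1:-ap-northeast-1}"
    aws ssm get-parameters \
        --names "$(inf_ami_parameter_name)" \
        --region "$region" \
        --query "Parameters[0].Value" \
        --output text
}
```

Running `recommended_inf_ami_id ap-northeast-1` should print whichever AMI ID is recommended at that moment, which will not necessarily match the IDs quoted in this thread.
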