Closed diazGT94 closed 2 years ago
let us look.
thanks
I was able to run my Python code by replacing CMD ["neuron-top"] with CMD ["serve"] in the Dockerfile. However, I get the following error when the model is loaded onto the chip.
I checked whether aws-neuron-dkms was installed in the image using dpkg -l | grep neuron and, as can be seen from the attached image, it is not installed.
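The same check can be scripted rather than read off a screenshot. The following is a minimal sketch, not part of the original report: `pkg_installed` is a hypothetical helper that reads `dpkg -l` style output on standard input and succeeds only if the named package is listed in the installed (`ii`) state. It assumes the package name appears without an architecture suffix.

```shell
#!/bin/sh
# Hypothetical helper: succeed if package "$1" appears in the installed ("ii")
# state in `dpkg -l` style output read from standard input.
pkg_installed() {
    awk -v pkg="$1" '
        # Rows look like: "ii  name  version  arch  description"
        $1 == "ii" && $2 == pkg { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage inside the image (uncomment to run):
# dpkg -l | pkg_installed aws-neuron-dkms && echo "driver package present"
```

Matching on the `ii` state avoids counting packages that were removed but still have leftover configuration (`rc`), which a plain `grep neuron` would also show.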
I followed the steps indicated here and tried to install aws-neuron-dkms in my image, but when the command RUN apt-get install aws-neuron-dkms -y is executed, it returns this error:
Building for 5.4.0-1058-aws
Building for architecture x86_64
Building initial module for 5.4.0-1058-aws
Done.
neuron:
Running module version sanity check.
Running the pre_install script:
/var/lib/dkms/aws-neuron/2.2.6.0/source/./preinstall: line 2: udevadm: command not found
Error! pre_install failed, aborting install.
You may override by specifying --force.
dpkg: error processing package aws-neuron-dkms (--configure):
installed aws-neuron-dkms package post-installation script subprocess returned error exit status 101
Setting up build-essential (12.4ubuntu1) ...
Processing triggers for libc-bin (2.27-3ubuntu1.4) ...
Errors were encountered while processing:
aws-neuron-dkms
E: Sub-process /usr/bin/dpkg returned an error code (1)
The command '/bin/sh -c apt-get install aws-neuron-dkms -y' returned a non-zero code: 100
and the image is never built. I don't know whether the failure to load my model in the previous Docker image is related to the version of Neuron I used to convert it from PyTorch to PyTorch Neuron.
@diazGT94 - our latest release eliminated the need for neuron-rtd and simplified the container deployment experience.
In your specific case, there are two issues that stand out:
Please check out this document for more details on getting a working container with Neuron: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-deploy/tutorials/neuron-container.html
If you have any further problems with the container setup, let us know here.
@awsrjh Thanks for your help.
I used the Dockerfile provided here and modified it into the following Dockerfile:
FROM ubuntu:18.04
LABEL maintainer=" "
RUN apt-get update -y \
    && apt-get install -y --no-install-recommends \
        ffmpeg \
        libsm6 \
        libxext6 \
        gnupg2 \
        wget \
        python3-pip \
        python3-setuptools \
    && cd /usr/local/bin \
    && pip3 --no-cache-dir install --upgrade pip \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean
RUN echo "deb https://apt.repos.neuron.amazonaws.com bionic main" > /etc/apt/sources.list.d/neuron.list
RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -
# Installing Neuron Tools
RUN apt-get update -y && apt-get install -y \
    aws-neuron-tools
# Sets up Path for Neuron tools
ENV PATH="/opt/bin/:/opt/aws/neuron/bin:${PATH}"
# Include framework tensorflow-neuron or torch-neuron and compiler (compiler not needed for inference)
RUN pip3 install \
    torch-neuron \
    --extra-index-url=https://pip.repos.neuron.amazonaws.com
COPY ./package /package/
RUN pip install -r /package/requirements.txt
WORKDIR "/package"
ENTRYPOINT ["python3", "main.py"]
With this I succeeded in building my image. However, when I tried to run it using the commands specified in the tutorial, I still got an error when my script tried to load the model. The error indicates that the PyTorch Neuron runtime could not be initialized, as you can see from the image below.
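For context, a container can only reach the accelerator through the Neuron device nodes that the host driver creates, and these are handed to `docker run` with `--device`. A minimal sketch, assuming the devices appear on the host as `/dev/neuron0`, `/dev/neuron1`, and so on; `neuron_device_flags` and the image name `my-neuron-app` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print one --device flag per Neuron device node found
# in the given directory (defaults to /dev). Prints nothing if none exist.
neuron_device_flags() {
    dev_dir="${1:-/dev}"
    for node in "$dev_dir"/neuron[0-9]*; do
        # An unmatched glob stays literal, so skip entries that do not exist.
        [ -e "$node" ] || continue
        printf '%s\n' "--device=$node"
    done
}

# Usage on the host (image name is a placeholder):
# docker run $(neuron_device_flags) my-neuron-app
```

If the helper prints nothing, no device nodes exist on the host, which usually points at the driver on the host rather than at anything inside the image.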
reopening
Hi
One possibility: when you removed aws-neuron-dkms from the container config, did you move that step to the base OS? The driver needs to be installed on the base operating system.
First you need to remove the old runtime daemon, neuron-rtd (the steps are found here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/pytorch-setup/pytorch-install.html#develop-on-aws-ml-accelerator-instance):
Do these steps on the base operating system - not in a container config:
Stop Neuron Runtime 1.x daemon (neuron-rtd) by running: sudo systemctl stop neuron-rtd
Uninstall neuron-rtd by running: sudo apt remove aws-neuron-runtime
Install or upgrade to the latest Neuron driver (aws-neuron-dkms) by following the “Setup Guide” instructions:
sudo apt-get update -y
sudo apt-get install linux-headers-$(uname -r) -y
sudo apt-get install aws-neuron-dkms -y
All of this is found in this guide: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/pytorch-setup/pytorch-install.htm
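After these host-side steps it can help to confirm that the driver is actually loaded before starting a container. A minimal sketch, assuming the kernel module is named `neuron` (as the DKMS log earlier in this thread suggests); `module_loaded` is a hypothetical helper that reads `/proc/modules` style text on standard input:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel module named "$1" is listed in
# /proc/modules style input (module name is the first field of each row).
module_loaded() {
    awk -v mod="$1" '
        $1 == mod { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage on the host, not in the container (uncomment to run):
# module_loaded neuron < /proc/modules && echo "Neuron driver loaded"
```

Comparing the first field exactly avoids false positives from other modules whose names merely contain the string.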
Closing it again; doing this on the base OS solved the issue.
I've used the template presented here to create my Docker image and added the dependencies for my app as specified:
# Include your APP dependencies here.
COPY ./package /package/
RUN pip install -r /package/requirements.txt
Then I used the entry point template from here and modified line 41 to start my application:
python main.py --key_dev True
With this I was able to run the Docker image. I stopped the neuron-rtd service before running it.
The image starts to run, and first it displays information from neuron-top. As can be seen, there are no models loaded onto the cores. If I exit neuron-top, I see that my image is stuck at
nrtd[7]: [NRTD:RunServer] Server listening on unix:/run/neuron.sock
and my application is never executed. To debug, I printed the value of
"$1"
which the bash script uses to decide whether to run the application. As can be seen, the condition on this value is never true, so my Python script is never executed. I would like to know why the value is never set to serve and what I should do to successfully run my application in the container.
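For reference, `"$1"` in an entrypoint script is the first argument Docker passes to it: with an exec-form ENTRYPOINT, the CMD entries (or any arguments given after the image name in `docker run`) are appended as arguments. A minimal sketch of that dispatch, not the actual template script; `select_action` and the labels it prints are hypothetical:

```shell
#!/bin/sh
# Hypothetical sketch of the branch an entrypoint script takes on "$1".
select_action() {
    case "${1:-}" in
        serve) echo "run-app" ;;      # e.g. CMD ["serve"]
        "")    echo "no-argument" ;;  # no CMD and no docker run arguments
        *)     echo "exec:$1" ;;      # e.g. CMD ["neuron-top"] runs that tool
    esac
}

# A real entrypoint would call it with the container arguments:
# select_action "$@"
```

Under this reading, an image whose Dockerfile ends in CMD ["neuron-top"] passes neuron-top as `"$1"`, so a branch that tests for serve is never taken unless the CMD is changed or serve is given on the `docker run` command line.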