Closed cm2435 closed 2 years ago
@cm2435 Sorry for necroing the issue, but did you ever find out what the problem was? I've got the exact same problem: a torcheia import failure with the same undefined symbol, though on Python 3.7.
Edit: trace
Traceback (most recent call last):
File "stub.py", line 13, in <module>
import global_vars
File "/home/ec2-user/global_vars.py", line 3, in <module>
from detectors.yoloxtorch import YoloxTorchDetector
File "/home/ec2-user/detectors/yoloxtorch.py", line 25, in <module>
import torcheia
File "/home/ec2-user/.local/lib/python3.7/site-packages/torcheia/__init__.py", line 1, in <module>
from _torch_eia import *
ImportError: /home/ec2-user/.local/lib/python3.7/site-packages/_torch_eia.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZTVN5torch8autograd8profiler14RecordFunctionE
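The undefined symbol in the traceback is a mangled C++ name, and decoding it shows exactly what the extension expected from libtorch. For this particular shape (a vtable for a nested name) a few lines of Python suffice; this is a sketch that handles only the `_ZTVN…E` pattern, not a general demangler:

```python
def demangle_vtable(symbol):
    """Decode the Itanium-ABI pattern _ZTVN<parts>E: 'vtable for' a
    nested C++ name made of length-prefixed components. Handles only
    this one simple shape, not general mangled names."""
    if not (symbol.startswith("_ZTVN") and symbol.endswith("E")):
        raise ValueError("not a simple nested vtable symbol")
    body = symbol[5:-1]
    parts, i = [], 0
    while i < len(body):
        j = i
        while body[j].isdigit():      # read the length prefix...
            j += 1
        n = int(body[i:j])
        parts.append(body[j:j + n])   # ...then that many name characters
        i = j + n
    return "vtable for " + "::".join(parts)

print(demangle_vtable("_ZTVN5torch8autograd8profiler14RecordFunctionE"))
# vtable for torch::autograd::profiler::RecordFunction
```

So the missing symbol is the vtable for torch::autograd::profiler::RecordFunction: the _torch_eia extension was compiled against a libtorch whose profiler ABI differs from the installed torch build, i.e. a version mismatch rather than a broken install.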
@solidmetanoia No worries :)
Honestly, I ended up not being able to fix this. My guess is that it's a package mismatch: the official PyTorch Deep Learning Container from the SageMaker team ships a different version of cuDNN or CUDA than the EIA package requires.
In the end I just rolled my own serving/training container using a FastAPI/nginx stack with a Gunicorn web server for concurrency. Let me know if you want the boilerplate for it and I will share.
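The version-mismatch guess above can be sanity-checked before deploying. Since compiled extension wheels like torcheia link against one exact torch release, anything other than an exact version match is a red flag; this is an illustrative helper (not part of any AWS package), with versions taken from this thread:

```python
def abi_compatible(installed, built_against):
    """Treat anything other than an exact torch version match as
    incompatible with a binary extension wheel built against it.
    Illustrative helper only; strips local suffixes like '+cpu'."""
    normalize = lambda v: tuple(int(p) for p in v.split("+")[0].split("."))
    return normalize(installed) == normalize(built_against)

# e.g. torch 1.5.0 installed, but the EIA wheel was built for 1.5.1:
print(abi_compatible("1.5.0", "1.5.1"))      # False
print(abi_compatible("1.5.1+cpu", "1.5.1"))  # True
```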
Yeah, nah, thank you. I'll keep trying to somehow connect things. :+1:
Describe the bug Hello all! First time raising an issue with SageMaker, so forgive me if this is the wrong place or format. I'm trying to deploy a BERT model that returns sentence embeddings in a PyTorch container, with the model served as a TorchScript trace (.pt file). Deployment on framework version 1.5 works as intended until an Elastic Inference accelerator is attached to the deployed estimator, at which point the TorchServe logs show that the model cannot be loaded. Using framework version 1.5.1 further yields an import error from torch EIA.
To reproduce I am following the documentation on how to load a TorchScript model in model_fn, at the link below.
https://docs.aws.amazon.com/elastic-inference/latest/developerguide/ei-pytorch-using.html
I load my model using the following:
Inside an inference.py file in a code subdirectory, as specified by the documentation. The PyTorch model class is initialized as follows:
Expected behavior This should, without Elastic Inference, load a BERT model trace from the traced_bert.pt file and use it to return embeddings of an input string or list of strings. However, it instead produces the CloudWatch logs below:
Screenshots or logs
System information A description of your system. Please provide:
Additional context Add any other context about the problem here.
Deployment Specs