Open rchurch4 opened 1 year ago
Hi @rchurch4, we will take a look. Thanks for opening the issue.
Hi @rchurch4, did you use a requirements.txt in source_dir to install third-party packages? If yes, can you share that file as well? We tried executing import sagemaker_tensorflow in the container and it didn't throw an error. Some third-party PyPI package installations could disrupt the shared library, though.
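For anyone repeating the import check described above, here is a minimal sketch of a probe that reports the failure instead of crashing. The helper name `probe_import` is hypothetical, and the broad `except` is an assumption made because native library load failures do not always surface as `ImportError`.

```python
import importlib


def probe_import(module_name):
    """Try to import a module and report the outcome without raising."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # broad on purpose: native load failures vary in type
        return {"ok": False, "error_type": type(exc).__name__, "message": str(exc)}
    return {"ok": True, "location": getattr(module, "__file__", None)}


if __name__ == "__main__":
    # Run inside the training container, before and after installing
    # the packages from requirements.txt, and compare the two results.
    print(probe_import("sagemaker_tensorflow"))
```

Running it once in the untouched image and once after installing the third-party packages should show whether the installation is what breaks the import.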
We are still trying to replicate this issue. If you wish to debug locally, you can docker pull the image and use local_gpu as the instance type and LocalSession for your SageMaker job.
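A rough sketch of the local-mode setup suggested above, using the SageMaker Python SDK. The helper names, the `role` value, and the entry point are placeholders, not taken from this thread. The sketch also assumes that Docker is running, that the image can be pulled on this machine, and that AWS credentials and a region are configured. The images listed in this issue are CPU images, so the plain "local" instance type may be the better fit for them.

```python
def local_mode_settings(image_uri, use_gpu=True):
    """Collect the estimator arguments that select SageMaker local mode."""
    return {
        "image_uri": image_uri,
        "instance_count": 1,
        # "local_gpu" needs a GPU on the host; "local" runs on CPU.
        "instance_type": "local_gpu" if use_gpu else "local",
    }


def build_local_estimator(image_uri, role, entry_point, source_dir=None, use_gpu=True):
    """Create a TensorFlow estimator that runs the container on this machine."""
    # Imported lazily so the helper above works without the SDK installed.
    from sagemaker.local import LocalSession
    from sagemaker.tensorflow import TensorFlow

    return TensorFlow(
        entry_point=entry_point,  # placeholder: your training script
        source_dir=source_dir,    # placeholder: folder holding requirements.txt
        role=role,                # placeholder: an IAM role ARN
        sagemaker_session=LocalSession(),
        **local_mode_settings(image_uri, use_gpu),
    )
```

Calling `fit()` on the returned estimator would then start the training container locally, where the failing import can be inspected directly.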
@rchurch4 Can you provide the above information if the issue still persists?
Checklist
Concise Description: Importing sagemaker_tensorflow results in an undefined symbol error.
We have tried sagemaker_tensorflow versions 2.10.0.1.16.0 and 2.11.0.1.17.0 for their respective images (below), and both result in the same error. Our guess, from looking at how the image is built, is that the problem occurs because the sagemaker_tensorflow_extensions repository is cloned and then installed, rather than installed with pip or similar. The libPipeModeOp.so file does not seem to exist in the git repository, so it's possible that this file should be generated on install and that this doesn't happen when the package is installed this way. This is further supported by the CMake file, which references the libPipeMode file. In the setup.py file, the CMake extension is called to build the C++ files, but it seems to reference pipemode_op, whereas in the CMake list the name is pipemodeop. This may be the root of the problem.
This results in us not being able to use the PipeModeDataset class.
DLC image/dockerfile:
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.11.0-cpu-py39-ubuntu20.04-sagemaker
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.10.0-cpu-py39-ubuntu20.04-sagemaker
Current behavior: Importing sagemaker_tensorflow fails for these images.
Expected behavior: Importing sagemaker_tensorflow should not fail for these images.
Additional context: It would be incredibly useful if I could pull the base image to run locally to test and debug this myself.
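One way to test the guess that libPipeModeOp.so is never generated is to list the shared libraries that actually ship with the installed package. Below is a minimal sketch using only the standard library. The helper name `find_shared_libraries` is hypothetical. It locates the package through its import spec instead of importing it, so it still works when the import itself fails.

```python
import importlib.util
from pathlib import Path


def find_shared_libraries(package_name):
    """List .so files under an installed package's directory without importing it."""
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        return []
    # Single-file modules and missing packages have no directory to search.
    if spec is None or not spec.submodule_search_locations:
        return []
    found = []
    for location in spec.submodule_search_locations:
        found.extend(sorted(Path(location).rglob("*.so")))
    return found


if __name__ == "__main__":
    for path in find_shared_libraries("sagemaker_tensorflow"):
        print(path)
```

If the listing inside the container is empty or lacks libPipeModeOp.so, the build step likely did not produce the library. If the file is present, the undefined symbol more likely points to a mismatch between the library and the installed TensorFlow build.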