aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

Unable to import external libraries in Mxnet script #127

Closed algoscale1 closed 5 years ago

algoscale1 commented 6 years ago

Hi,

I am using MXNet to deploy an XGBoost model on SageMaker. I have created a script that contains all the required training and inference functions, such as train() and input_fn(). The estimator is created like this:

mnist_estimator = MXNet(entry_point='mnist.py',
                        role=role,
                        output_path=model_artifacts_location,
                        code_location=custom_code_upload_location,
                        train_instance_count=1,
                        train_instance_type='ml.m4.xlarge',
                        py_version='py3')

I am trying to use the pandas library in the train function of the MXNet script, but I am getting this error:

executing startup script (first run)
2018-04-05 06:59:26,096 INFO - root - running container entrypoint
2018-04-05 06:59:26,096 INFO - root - starting train task
2018-04-05 06:59:27,490 INFO - mxnet_container.train - MXNetTrainingEnvironment: {'hosts': ['algo-1'], 'user_script_name': 'mnist.py', 'enable_cloudwatch_metrics': False, 'channels': {'data': {'S3DistributionType': 'FullyReplicated', 'TrainingInputMode': 'File', 'RecordWrapperType': 'None'}}, 'available_cpus': 4, 'user_script_archive': 's3://mxnettrain/customcode/mxnet/sagemaker-mxnet-py3-cpu-2018-04-05-06-54-37-497/source/sourcedir.tar.gz', 'user_requirements_file': None, 'available_gpus': 0, '_scheduler_ip': '10.32.0.4', 'output_data_dir': '/opt/ml/output/data/', 'input_config_dir': '/opt/ml/input/config', '_scheduler_host': 'algo-1', 'resource_config': {'current_host': 'algo-1', 'hosts': ['algo-1']}, 'model_dir': '/opt/ml/model', 'input_dir': '/opt/ml/input', 'current_host': 'algo-1', 'container_log_level': 20, 'code_dir': '/opt/ml/code', 'hyperparameters': {'sagemaker_container_log_level': 20, 'sagemaker_program': 'mnist.py', 'sagemaker_region': 'us-east-1', 'sagemaker_job_name': 'sagemaker-mxnet-py3-cpu-2018-04-05-06-54-37-497', 'sagemaker_enable_cloudwatch_metrics': False, 'sagemaker_submit_directory': 's3://mxnettrain/customcode/mxnet/sagemaker-mxnet-py3-cpu-2018-04-05-06-54-37-497/source/sourcedir.tar.gz'}, 'base_dir': '/opt/ml', '_ps_port': 8000, '_ps_verbose': 0, 'channel_dirs': {'data': '/opt/ml/input/data/data'}, 'sagemaker_region': 'us-east-1', 'output_dir': '/opt/ml/output'}
Downloading s3://mxnettrain/customcode/mxnet/sagemaker-mxnet-py3-cpu-2018-04-05-06-54-37-497/source/sourcedir.tar.gz to /tmp/script.tar.gz
2018-04-05 06:59:27,589 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTP connection (1): 169.254.170.2
2018-04-05 06:59:27,677 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): s3.amazonaws.com
2018-04-05 06:59:27,748 INFO - mxnet_container.train - Starting distributed training task
2018-04-05 06:59:27,758 ERROR - root - uncaught exception: No module named 'pandas'
Traceback (most recent call last):
  File "/opt/amazon/bin/entry.py", line 32, in <module>
    modes[mode]()
  File "/opt/amazon/lib/python3.4/site-packages/container_support/training.py", line 21, in start
    raise e
  File "/opt/amazon/lib/python3.4/site-packages/container_support/training.py", line 15, in start
    fw.train()
  File "/opt/amazon/lib/python3.4/site-packages/mxnet_container/train.py", line 162, in train
    user_module = mxnet_env.import_user_module()
  File "/opt/amazon/lib/python3.4/site-packages/container_support/environment.py", line 87, in import_user_module
    user_module = importlib.import_module(script)
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/opt/ml/code/mnist.py", line 16, in <module>
    import pandas as pd
ImportError: No module named 'pandas'

If I am not wrong, MXNet creates its own environment in which these external libraries are not present. Is there any way I can use them?

yangaws commented 6 years ago

Hi @algoscale1 ,

Thanks for using sagemaker!

For MXNet training, the MXNet estimator has a source_dir argument that can be set for additional dependencies.

README doc:

source_dir: Path (absolute or relative) to a directory with any other training source code dependencies aside from the entry point file. Structure within this directory will be preserved when training on SageMaker.

Source code: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/mxnet/estimator.py#L24

Thanks!

algoscale1 commented 6 years ago

Hi @yangaws

I tried setting source_dir to the site-packages directory, but it didn't work. Is there an example that installs additional dependencies with MXNet?

yangaws commented 6 years ago

Hi @algoscale1 ,

I am really sorry. I just checked the code and found that this external library import feature is only available in some of our frameworks. Unfortunately, MXNet is not one of them.

So what we recommend is installing the dependencies from within your code:

https://stackoverflow.com/questions/12332975/installing-python-module-within-code

Sorry again for giving an inaccurate answer at the beginning.

yangaws commented 6 years ago

BTW, we have a task in our backlog to enable this feature for MXNet. In the future, there will be a better way than this pip-in-code approach to import additional dependencies.
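
The pip-in-code workaround recommended above can be sketched roughly as follows. This is a minimal illustration, not an official SageMaker mechanism: the helper names pip_install_command and ensure_installed are hypothetical, and the sketch assumes the training container has pip available and outbound network access to a package index.

```python
import importlib
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command that targets the interpreter running this script."""
    # Using sys.executable avoids installing into a different Python
    # than the one that will later import the package.
    return [sys.executable, "-m", "pip", "install", package]


def ensure_installed(module_name, package=None):
    """Import module_name, installing it with pip first if it is missing.

    package is the name to pass to pip when it differs from the module name.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        subprocess.check_call(pip_install_command(package or module_name))
        # Let the import system notice the newly installed files.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)


# Hypothetical usage at the top of an entry point such as mnist.py:
# pd = ensure_installed("pandas")
```

The call has to run before the first import of the package, so in a script like mnist.py it would replace the plain import statement at the top of the file.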

yifeim commented 6 years ago

I think my question is related:

I think pip-in-code cannot upgrade existing modules? In particular, I want to upgrade mxnet to the latest (pre-release) version. While I can install newer versions, the import always defaults to 1.1.0.

Any workarounds? Does it make sense to somehow include mxnet in source_dir?
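
One way to investigate a symptom like this is to check which copy of a module the interpreter actually resolves. The sketch below uses a hypothetical helper, describe_module. It rests on an assumption that the thread does not confirm: that the container has already imported the module before the user script runs, in which case Python keeps serving the cached copy from sys.modules even after a newer version is installed.

```python
import importlib
import sys


def describe_module(name):
    """Report which copy of a module the current interpreter resolves."""
    # A module already present in sys.modules is reused by later imports,
    # so a pip upgrade inside the same process does not replace it.
    already_loaded = name in sys.modules
    module = importlib.import_module(name)
    return {
        "name": module.__name__,
        "version": getattr(module, "__version__", None),
        "file": getattr(module, "__file__", None),
        "already_loaded": already_loaded,
    }


# Hypothetical usage inside the training script:
# print(describe_module("mxnet"))
```

If already_loaded is true at the top of the user script, an in-process upgrade would not be expected to change the imported version.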

yifeim commented 6 years ago

Solved my own problem by reading through this line:

# For building images of MXNet versions 1.1 and above
docker build -t preprod-mxnet:1.1.0-cpu-py2 --build-arg py_version=2 \
    --build-arg framework_installable=mxnet-1.1.0-py2.py3-none-manylinux1_x86_64.whl -f Dockerfile.cpu .

laurenyu commented 5 years ago

Closing due to inactivity. Feel free to reopen if necessary.

samlovestech commented 5 years ago

@yangaws Hi Yang, can you be more specific? If I use the SKLearn estimator, there are no requirements or env parameters for specifying external package names. Detailed problem: sklearn_estimator = SKLearn(entry_point='text.py', *args) ... In text.py, I need to import an external package that hasn't been installed, for example NLTK?

yangaws commented 5 years ago

Hi @Seninus, if you want to use NLTK in your text.py script in SageMaker, you can install NLTK yourself in the script.

For how to do that you can refer to this: https://stackoverflow.com/questions/12332975/installing-python-module-within-code

But I have not followed updates in this repo for some time. Hence I am not sure if what I said is still recommended by SageMaker. I suggest you reopen this issue to confirm.

laurenyu commented 5 years ago

@Seninus You can include a requirements.txt file in your source directory. For more, see https://sagemaker.readthedocs.io/en/stable/using_sklearn.html#using-third-party-libraries
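
The requirements.txt approach suggested above implies a source directory that holds both the entry point and the requirements file. The sketch below only builds that layout on disk. The helper make_source_dir is hypothetical, and the estimator call in the final comment is an assumption about how the directory would be passed; check it against the linked documentation.

```python
import os
import tempfile


def make_source_dir(base_dir, entry_point_name, requirements):
    """Create a source directory with an entry point and a requirements.txt."""
    source_dir = os.path.join(base_dir, "source")
    os.makedirs(source_dir, exist_ok=True)

    # Placeholder entry point; a real script would define the training logic.
    with open(os.path.join(source_dir, entry_point_name), "w") as f:
        f.write("import nltk  # third-party import listed in requirements.txt\n")

    # One requirement per line, as pip expects.
    with open(os.path.join(source_dir, "requirements.txt"), "w") as f:
        f.write("\n".join(requirements) + "\n")

    return source_dir


source_dir = make_source_dir(tempfile.mkdtemp(), "text.py", ["nltk"])
print(sorted(os.listdir(source_dir)))

# Assumed usage (other arguments omitted), not verified here:
# SKLearn(entry_point="text.py", source_dir=source_dir, ...)
```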

samlovestech commented 5 years ago

Thanks @laurenyu, I did figure out including the requirements.txt in source_dir, but pip install failed with an SSL certificate error. The training instance is in my VPC, and my company network might block this external pip install. I am still trying to figure it out. Let me know if my direction is wrong.