NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT
Apache License 2.0
5.66k stars, 877 forks

libth_transformer.so: cannot open shared object file: No such file or directory #638

Open ma-siddiqui opened 1 year ago

ma-siddiqui commented 1 year ago

When I run the command below, I get the following error. Please advise.

python3 ./FasterTransformer/examples/pytorch/t5/summarization.py --ft_model_location t5-v1_1-base/c-models/ --hf_model_location t5-v1_1-base/ --test_ft --test_hf

[INFO] load HF model spend 4.947393 sec
[INFO] MPI is not available in this PyTorch build.
[INFO] MPI is not available in this PyTorch build.
[INFO] load FT encoder model spend 0.317683 sec
[INFO] load FT decoding model spend 0.43306 sec
[INFO] MPI is not available in this PyTorch build.
Traceback (most recent call last):
  File "./FasterTransformer/examples/pytorch/t5/summarization.py", line 404, in <module>
    main()
  File "./FasterTransformer/examples/pytorch/t5/summarization.py", line 200, in main
    ft_encoder = FTT5Encoder(ft_encoder_weight.w, args.lib_path, encoder_config.num_heads,
  File "~FasterTransformer/examples/pytorch/t5/../../../examples/pytorch/t5/utils/ft_encoder.py", line 380, in __init__
    torch.classes.load_library(lib_path)
  File "~anaconda3/envs/nlp_dev/lib/python3.8/site-packages/torch/_classes.py", line 51, in load_library
    torch.ops.load_library(path)
  File "~anaconda3/envs/nlp_dev/lib/python3.8/site-packages/torch/_ops.py", line 573, in load_library
    ctypes.CDLL(path)
  File "~anaconda3/envs/nlp_dev/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: ~smart_nation/nlp/lib/libth_transformer.so: cannot open shared object file: No such file or directory

Wanan-ni commented 1 year ago

Hi, I solved this problem. First, check whether libth_transformer.so actually exists at ~smart_nation/nlp/lib/libth_transformer.so.

If it is not there, locate it first. You can run "find . -name libth_transformer.so" from the root directory of FasterTransformer.

If you can't find the file at all, delete the build directory and re-run the cmake and make commands. (In my case I could not find it, so I re-ran the build commands.)

Then check the build log to confirm that libth_transformer.so was built.

In the end, I found the file at FasterTransformer/build/lib/libth_transformer.so. You can then change lib_path in your script to point to it and re-run your Python command.
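The search step above can also be done from Python. This is only a minimal sketch: the helper name `find_shared_library` is made up for illustration, and the `FasterTransformer` root directory is an assumption taken from the paths mentioned in this thread.

```python
from pathlib import Path
from typing import List


def find_shared_library(root: str, name: str = "libth_transformer.so") -> List[Path]:
    """Return every file called `name` below `root`, as absolute paths."""
    root_path = Path(root).expanduser().resolve()
    if not root_path.is_dir():
        # Nothing to search: the root directory itself is missing.
        return []
    return sorted(p for p in root_path.rglob(name) if p.is_file())


if __name__ == "__main__":
    # "FasterTransformer" is the checkout directory assumed here; adjust as needed.
    matches = find_shared_library("FasterTransformer")
    if matches:
        for match in matches:
            print(match)
    else:
        print("libth_transformer.so not found; the build may not have produced it")
```

Any path it prints can be used as the lib_path value.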

Hope this can be helpful.

ma-siddiqui commented 1 year ago

Thank you for the reply. I was able to rebuild it, but unfortunately there is still no libth_transformer.so. Is it created under another name?

ma-siddiqui commented 1 year ago

Is it possible to share your libth_transformer.so?

Wanan-ni commented 1 year ago

> Thank you for the reply. I was able to rebuild it, but unfortunately there is still no libth_transformer.so. Is it created under another name?

After you execute "make ...", check whether the output indicates that the compilation succeeded. Is there a message like the one circled in red in my screenshot?

Wanan-ni commented 1 year ago

> Is it possible to share your libth_transformer.so?

I am willing to share it, but I am afraid it won't work: the compiled library depends on the build environment, and ours are probably different. My environment is PyTorch 1.10 with CUDA 11.1.
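To compare environments before swapping binaries, the relevant versions can be printed from Python. This is a rough sketch rather than an official check: the helper name `describe_build_environment` is hypothetical, and it only reports what the installed PyTorch says about itself, not the CUDA toolkit that was used to compile FasterTransformer.

```python
import platform
from typing import Dict, Optional


def describe_build_environment() -> Dict[str, Optional[str]]:
    """Collect the Python, PyTorch, and PyTorch-CUDA versions, if available."""
    info: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter.
        return info
    info["torch"] = str(torch.__version__)
    # torch.version.cuda is None for CPU-only PyTorch builds.
    info["cuda"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for key, value in describe_build_environment().items():
        print(f"{key}: {value}")
```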

ma-siddiqui commented 1 year ago

Thank you for your kind reply. No issues. I will use it with pytorch 1.10 and cuda 11.1

Additionally, if possible, please share the source code of the release you built from. I will try to build it on my own. Your help and support are much appreciated.

taehyunzzz commented 1 year ago

Should lib_path be set as a relative path or as an absolute path? I've tried both, but couldn't get either to work. The file libth_transformer.so is definitely there, but the machine doesn't seem to find it...
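For context, the traceback earlier in this thread shows the path going through `ctypes.CDLL`. A relative path containing a slash is looked up relative to the current working directory, and a leading `~` is not expanded, because that is a shell feature. The sketch below normalizes the path and fails early with a more descriptive message. The helper name `resolve_lib_path` is hypothetical and not part of FasterTransformer.

```python
import os
from pathlib import Path


def resolve_lib_path(lib_path: str) -> Path:
    """Expand `~` and make the path absolute, raising early if no file is there."""
    resolved = Path(os.path.expanduser(lib_path)).resolve()
    if not resolved.is_file():
        raise FileNotFoundError(
            f"{lib_path!r} resolves to {resolved}, which does not exist "
            f"(current working directory: {Path.cwd()})"
        )
    return resolved
```

A possible use, assuming the same loading call that appears in the traceback, would be `torch.classes.load_library(str(resolve_lib_path(args.lib_path)))`.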

sfyumi commented 1 year ago

Maybe your build command is wrong. Can you share it?

taehyunzzz commented 1 year ago

There was an issue while rebuilding the library... I've seen a post about CUDA compatibility issues with FT, but I can't find it again. I was using CUDA 11.8, and I think that was the cause. Environment setups are so frustrating :(

shannonphu commented 1 year ago

I'm running into this issue too. What are the build and make commands you are running?

arnab-photon commented 11 months ago

I'm running into this issue too. The build is successful and I can see the libth_transformer.so file in the build/lib folder, but I'm not sure why it is not being detected. I am running the BART translate_example.py. I have tried multiple times and have provided both relative and absolute paths, still nothing.

Env: PyTorch 1.13.0, CUDA 11.8, RTX 3080 Ti

Same issue on the g5 instances of AWS too.

OSError: /workspace/FasterTransformer/build/lib/libth_transfomer.so: cannot open shared object file: No such file or directory

Can anyone please help? I've been stuck on this for a long time.
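One thing worth ruling out when the file is visibly present is a mismatch between the requested file name and the name on disk. The error above reads `libth_transfomer.so`, while the rest of the thread spells it `libth_transformer.so`. It is not clear from the thread whether that is a transcription slip or the value actually passed. The sketch below lists similarly named shared objects next to the requested path. The helper name `suggest_library_names` is hypothetical.

```python
import difflib
from pathlib import Path
from typing import List


def suggest_library_names(requested: str) -> List[str]:
    """List shared objects next to `requested` whose file names resemble it."""
    requested_path = Path(requested)
    directory = requested_path.parent
    if not directory.is_dir():
        # The directory part of the path is already wrong.
        return []
    candidates = [p.name for p in directory.glob("*.so*")]
    # Fuzzy string matching from the standard library; best match comes first.
    return difflib.get_close_matches(requested_path.name, candidates, n=3, cutoff=0.6)


if __name__ == "__main__":
    # Path copied from the error message above.
    print(suggest_library_names("/workspace/FasterTransformer/build/lib/libth_transfomer.so"))
```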

vuuihc commented 11 months ago

Maybe you need to build FasterTransformer with -DBUILD_PYT=ON, e.g.: cmake -DSM=80 -DCMAKE_BUILD_TYPE=Release -DBUILD_MULTI_GPU=ON -DBUILD_PYT=ON ..
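Whether an existing build directory was configured with that option can be checked in its CMakeCache.txt, which CMake writes as `NAME:TYPE=VALUE` lines. A minimal sketch follows. The helper name `read_cmake_cache_value` is hypothetical, and the `FasterTransformer/build` location is an assumption based on the paths mentioned earlier in the thread.

```python
from pathlib import Path
from typing import Optional


def read_cmake_cache_value(cache_file: str, name: str) -> Optional[str]:
    """Return the cached value of `name` from a CMakeCache.txt, or None if absent."""
    for line in Path(cache_file).read_text().splitlines():
        line = line.strip()
        # Skip blank lines and comments; entries look like NAME:TYPE=VALUE.
        if not line or line.startswith(("#", "//")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.split(":", 1)[0] == name:
            return value
    return None


if __name__ == "__main__":
    cache = Path("FasterTransformer/build/CMakeCache.txt")  # assumed location
    if cache.is_file():
        print("BUILD_PYT =", read_cmake_cache_value(str(cache), "BUILD_PYT"))
    else:
        print(f"{cache} not found; has cmake been run in this build directory?")
```

If the value is not ON, re-running cmake with the option set and then rebuilding would be the next thing to try.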

HeyDavid633 commented 6 months ago

@arnab-photon

Hope this helps! :)