aqlaboratory / rgn2


Local Setup Struggles #6

Closed menhart2 closed 1 year ago

menhart2 commented 1 year ago

Hello, I am trying to set up RGN2 locally to run hundreds of sequences. After setting up the conda environments as illustrated in the Colab notebook, I continually get an error when running the aminobert_predict_sequence() function. The same error occurs even when running the aminobert Docker image. The error is as follows:

ERROR:tensorflow:Error recorded from prediction_loop: 2 root error(s) found.
(0) Internal: Blas xGEMMBatched launch failed : a.shape=[12,1024,64], b.shape=[12,1024,64], m=1024, n=1024, k=64, batch_size=12
    [[node bert/encoder/layer_0/attention/self/MatMul (defined at home/menhart/programs.installed/miniconda3/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
    [[Sum/_1265]]
(1) Internal: Blas xGEMMBatched launch failed : a.shape=[12,1024,64], b.shape=[12,1024,64], m=1024, n=1024, k=64, batch_size=12
    [[node bert/encoder/layer_0/attention/self/MatMul (defined at home/menhart/programs.installed/miniconda3/envs/rgn2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations. 0 derived errors ignored.

I have seen suggestions that this is a memory issue, but after limiting memory usage appropriately I still receive the same error. Can you point me toward the real issue, or perhaps suggest another approach to running predictions for many sequences?
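(For reference, "limiting memory usage" in TF 1.x is usually done through the session config; below is a minimal sketch of that kind of setting, using the standard TF 1.15 API rather than anything RGN2-specific. Whether the estimator used by AminoBERT picks up such a config depends on how its RunConfig is constructed.)

```python
# Minimal sketch of limiting TF 1.x GPU memory via the session config.
# Standard TF 1.15 API, not RGN2-specific code.
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                    # allocate GPU memory on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.8  # or cap usage at ~80% of the card

sess = tf.Session(config=config)
```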

Thank you

christinaflo commented 1 year ago

Hi, can you check if it works with one sequence?

To run it for multiple sequences, you need to make a few changes: create a directory with your input FASTA files and use the parse_fastas method followed by the aminobert_predict method in rgn2/aminobert/prediction.py, instead of aminobert_predict_sequence, which is used in the notebook. The remainder of the workflow is the same.
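(A hedged sketch of that multi-sequence workflow is below. parse_fastas and aminobert_predict do live in rgn2/aminobert/prediction.py, but the argument names, return values, and paths shown here are assumptions for illustration; consult the actual function definitions in the repo for the exact signatures.)

```python
# Hedged sketch of the multi-sequence AminoBERT workflow described above.
# parse_fastas and aminobert_predict are in rgn2/aminobert/prediction.py;
# the argument names, return values, and paths below are assumptions.
from aminobert.prediction import parse_fastas, aminobert_predict

FASTA_DIR = '/path/to/fasta_dir'            # directory of input FASTA files (assumed)
CKPT_DIR = '/path/to/aminobert_checkpoint'  # AminoBERT weights (assumed)

seqs, headers = parse_fastas(FASTA_DIR)     # assumed to return sequences and headers
aminobert_predict(seqs, headers, CKPT_DIR)  # assumed to write per-sequence outputs
```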

menhart2 commented 1 year ago

Hi, thank you for your reply. No, it does not work for a single sequence. I have been attempting to set it up locally and run it just like the example, with the same amino-acid sequence string, but I cannot get it to run at all. It always errors out at the aminobert_predict_sequence function with the error I reported above. I am beginning to suspect a compatibility problem between TensorFlow and the RTX 3080 Ti GPU we use. I am new to working with GPUs; does this sound plausible to you? Any advice is appreciated.

Thank you, Mary
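(One quick way to separate a driver/CUDA mismatch from a model-level error is to check whether TensorFlow can see the GPU at all; a minimal sketch using standard TF 1.x test utilities, not RGN2-specific code:)

```python
# Quick TF 1.x sanity check: can TensorFlow see the GPU at all?
import tensorflow as tf

print(tf.__version__)                 # expect 1.15.x for RGN2
print(tf.test.is_built_with_cuda())   # False => a CPU-only TF wheel is installed
print(tf.test.is_gpu_available())     # False => driver/CUDA/TF build mismatch
```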



christinaflo commented 1 year ago

Yeah, it is a GPU problem: the RTX 3080 Ti is not compatible with CUDA 10 / TF 1.15. I found this source which shows how to get TF 1.15 working with an RTX 3080 Ti GPU.
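(For context: the RTX 3080 Ti is an Ampere card with compute capability 8.6, while the official TF 1.15 wheels are built against CUDA 10.0, which does not support Ampere. NVIDIA's maintained TF1 build, the nvidia-tensorflow package with bundled CUDA 11 support, is one commonly used workaround, though it is not necessarily the source referenced above. The sketch below just prints the compute capability TensorFlow reports for the local GPU, using a TF 1.x utility that is not RGN2-specific:)

```python
# Diagnostic sketch: print the GPU name and compute capability TensorFlow sees.
# On an RTX 3080 Ti this reports compute capability 8.6, which the stock
# CUDA 10.0 build of TF 1.15 cannot target.
from tensorflow.python.client import device_lib

for dev in device_lib.list_local_devices():
    if dev.device_type == 'GPU':
        # e.g. "... name: NVIDIA GeForce RTX 3080 Ti ... compute capability: 8.6"
        print(dev.physical_device_desc)
```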