ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

✍️ Contribution period: <Zakia Yahya> #616

Closed by ZakiaYahya 1 year ago

ZakiaYahya commented 1 year ago

Week 1 - Get to know the community

Week 2 - Install and run an ML model

Week 3 - Propose new models

Week 4 - Prepare your final application

masroor07 commented 1 year ago

Hi @ZakiaYahya The model is not able to load. Did you download the right model manually and add it to the folder as per the instructions? Pretrained models need to be manually downloaded.

Alright @GemmaTuron, no, I didn't download it manually; I used "git clone --recursive https://github.com/ncats/ncats-adme.git" to download the repo. I have now downloaded the ncats-adme model from the repo manually and am creating the environment, which takes up to 5-6 hours. Once it is done I will let you know. Thanks.

NCATS-ADME includes various models; in your case, the solubility model. You need to download the model using NCATS-ADME itself. Try going through app.py to figure out which model you are working with and the code related to it. But first, try to figure out how to solve the environment problem.

masroor07 commented 1 year ago

@pauline-banye can you check the issue with .csv files mentioned by @ZakiaYahya above?

Yes, please check it @pauline; it's not working on the CLI, so I'm trying to run it on Colab.

Hello @GemmaTuron and @pauline-banye, the NCATS solubility-based Ersilia model eos74bo works perfectly fine on Colab and gave me output; I'm attaching the output file here: eos74bo_ersilia_colab.csv. However, it keeps giving me an error when I pass the whole CSV file to the "run" API, although it works fine when passing a single SMILES string on the CLI.

Oh, I'm sorry for the late response, I just saw your message. Could you share the command you're running? Did it fetch successfully? I am currently testing the model as well. Could you also share the log file of the error you received?

Yes, sure @pauline, it fetched and served successfully after granting user privileges. I'm running this command on the CLI: ersilia -v api run -i "eml_canonical.csv" -o "eos74bo_output.csv" > eos74bo_log.log 2>&1, and it's giving me the error "TypeError: object of type 'float' has no len()". I'm attaching the error log here too: eos74bo_log.log. Kindly have a look, thanks.

Could you try removing the quotations and running: ersilia -v api run -i eml_canonical.csv -o eos74bo_output.csv

It probably interprets the quoted argument as a SMILES string.

pauline-banye commented 1 year ago

Hi @ZakiaYahya Try

ersilia -v api run -i eml_canonical.csv -o eos74bo_output.csv

You don't need the quotes, your code is interpreting eml_canonical as a string.

ZakiaYahya commented 1 year ago

ersilia -v api run -i eml_canonical.csv -o eos74bo_output.csv

@pauline-banye, it didn't make any difference; I ran it without quotes as well, but the same error occurred.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron and @pauline-banye. Strangely, it works and gives output perfectly when I omit -o (the output flag) from the command, like this: ersilia -v api run -i eml_canonical.csv. It prints the output for each SMILES string from the EML file to the console, so I think there is some kind of problem when it tries to save to the output CSV file. Still trying. If you have any idea what causes this problem, kindly let me know. Thanks.
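For context, a common way to hit TypeError: object of type 'float' has no len() while writing CSV output is that a blank cell is parsed as a float NaN and something later calls len() on it. This is an illustrative guess, not a confirmed diagnosis of the Ersilia code path; a minimal sketch of the failure and a guard:

```python
import io
import math

import pandas as pd

# Hypothetical CSV with one blank SMILES cell (not the real eml_canonical.csv).
csv_text = "name,smiles\nethanol,CCO\nblank_row,\nbenzene,c1ccccc1\n"
df = pd.read_csv(io.StringIO(csv_text))

for value in df["smiles"]:
    # pandas parses the blank cell as float('nan'), so calling len() on it
    # raises: TypeError: object of type 'float' has no len()
    if isinstance(value, float) and math.isnan(value):
        print("skipping blank SMILES cell")
        continue
    print(value, "length:", len(value))
```

Checking the input file for blank or non-string entries is a cheap first step before digging into the CLI itself.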

GemmaTuron commented 1 year ago

Hi @Zainab-ik

Could it be you are not in the right folder? When you pass a .csv file, you need to either run the command from the folder where the file is or specify the full path, for example: ersilia -v api run -i ../data/eml_canonical.csv -o ../data/eos74bo_output.csv

ZakiaYahya commented 1 year ago

Hi @GemmaTuron, it takes the EML file successfully. Please see my earlier comment: it works fine with the EML file when I run it without specifying an output file, and it printed the output for all SMILES strings to the console, which means it is reading the file and picking up the SMILES strings correctly. The issue occurs somewhere in storing to the output CSV file. Any idea? Thanks.

ZakiaYahya commented 1 year ago

@GemmaTuron and @pauline-banye, I have tried storing the output in a JSON file, like this: ersilia -v api run -i eml_canonical.csv -o out.json, and it stored successfully. But when I tried to save the output in a CSV file, like this: ersilia -v api run -i eml_canonical.csv -o out.csv, it gave me the error "TypeError: object of type 'float' has no len()".

ZakiaYahya commented 1 year ago

Hi @ZakiaYahya

The model is not able to load. Did you download the right model manually and add it to the folder as per the instructions? Pretrained models need to be manually downloaded.

Hello @GemmaTuron, I have now manually downloaded the NCATS repo and it is still giving me that error. I also uninstalled and re-installed TensorFlow, but to no avail:

```
2023-03-17 18:32:44.393808: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-17 18:32:44.620866: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-03-17 18:32:44.620968: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-03-17 18:32:45.828112: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-03-17 18:32:45.828329: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-03-17 18:32:45.828405: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Loading Solubility graph convolutional neural network model
Traceback (most recent call last):
  File "app.py", line 23, in <module>
    from predictors.solubility.solubility_predictor import SolubilityPredictior
  File "/home/zakia/ncats-adme-new/server/predictors/solubility/__init__.py", line 21, in <module>
    solubility_gcnn_scaler, solubility_gcnn_model = load_gcnn_model(solubility_model_file_path, solubility_model_file_url)
  File "/home/zakia/ncats-adme-new/server/predictors/utilities/utilities.py", line 63, in load_gcnn_model
    gcnn_scaler, _ = load_scalers(model_file_path)
  File "/home/zakia/ncats-adme-new/server/./predictors/chemprop/chemprop/utils.py", line 132, in load_scalers
    state = torch.load(path, map_location=lambda storage, loc: storage)
  File "/home/zakia/ncats-adme-new/server/env/lib/python3.8/site-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/zakia/ncats-adme-new/server/env/lib/python3.8/site-packages/torch/serialization.py", line 755, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
```

Everyone on the various forums suggested downloading manually, which I did, but I'm still getting that error.

masroor07 commented 1 year ago

Try installing a specific version of TensorFlow.

ZakiaYahya commented 1 year ago

Hello @masroor07, this is not a problem with any package or dependency. I have gone through a lot of forum discussions and this error specifically occurs when the model file is not downloaded properly, is corrupted, or there is some problem with the model weight file. So I have now downloaded the solubility model file from https://opendata.ncats.nih.gov/public/adme/models/archived/solubility/ and the code is running; it is not giving me _pickle.UnpicklingError: invalid load key, '<'. anymore. Let's see if this model file works or not.
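As a side note, the invalid load key, '<' message usually means the first byte of the "checkpoint" is the < of an HTML page (for example a redirect or error page saved in place of the real weights). A quick sanity check on any downloaded .pt file; the path below is illustrative, not the exact location in the repo:

```python
from pathlib import Path

# Illustrative path; point this at the checkpoint you actually downloaded.
checkpoint = Path("models/solubility/gcnn_model.pt")

head = checkpoint.read_bytes()[:64]
if head.lstrip().startswith(b"<"):
    print("This looks like an HTML page, not a PyTorch checkpoint - re-download it.")
else:
    print("First bytes:", head[:16])
```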

masroor07 commented 1 year ago

Alright! Is it working fine now?

Pradnya2203 commented 1 year ago

Hey @ZakiaYahya, I was facing a similar UnpicklingError and I solved it by removing corrupt files from the models directory. You can try that as well.

ZakiaYahya commented 1 year ago

Hello @Pradnya2203, thanks for letting me know. Yes, that error was caused by corrupted files. I manually downloaded the model file from the https://opendata.ncats.nih.gov/adme/ website, and it is no longer giving me the UnpicklingError, but it now seems to hang while serving files from http://127.0.0.1:5000/. It gets stuck after GET /client/assets/images/banner.png returns status code 304 and doesn't proceed after that. Can you point out which files you deleted? Thanks.

Pradnya2203 commented 1 year ago

I deleted gcnn_model.pt, which was auto-downloaded and corrupted and was present in the models/rlm directory. I'm not sure why it hangs there; could you provide more details?

ZakiaYahya commented 1 year ago

Oh right @Pradnya2203, yes, that file was corrupted. I downloaded the solubility model file from the OpenData website and it's not giving that error any more.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron and @pauline-banye, I have managed to run python app.py on the CLI by downloading the correct version of the solubility model, but for the last 3 days, whenever I run app.py it seems to hang. I have restarted and run it again and again but experience the same thing. It doesn't proceed after this:

```
127.0.0.1 - - [20/Mar/2023 08:58:23] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [20/Mar/2023 08:58:23] "GET /client/runtime-es2015.4ecd957143422332d4bf.js HTTP/1.1" 304 -
127.0.0.1 - - [20/Mar/2023 08:58:23] "GET /client/polyfills-es2015.1ea406a531d69c32d225.js HTTP/1.1" 304 -
127.0.0.1 - - [20/Mar/2023 08:58:23] "GET /client/main-es2015.0202f1e592c4d08d6334.js HTTP/1.1" 304 -
127.0.0.1 - - [20/Mar/2023 08:58:23] "GET /client/styles.2f98c1b7357ecd32411f.css HTTP/1.1" 304 -
127.0.0.1 - - [20/Mar/2023 08:58:23] "GET /client/assets/data/config.json HTTP/1.1" 304 -
127.0.0.1 - - [20/Mar/2023 08:58:23] "GET /client/assets/images/banner.png HTTP/1.1" 304 -
assets/icons/favicon.ico
127.0.0.1 - - [20/Mar/2023 08:58:23] "GET /assets/icons/favicon.ico HTTP/1.1" 200 -
127.0.0.1 - - [20/Mar/2023 08:58:23] "GET /client/assets/icons/favicon.ico HTTP/1.1" 304 -
```

Here I'm attaching the whole log: app_log.txt

Any idea why it hangs there? I dug into the code as well but didn't find any problem. Thanks.
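One hedged observation: the 200/304 lines above show the server is already answering requests for the web client, and the Flask development server normally keeps the terminal attached while it runs, so it may not actually be hung. A quick check from a second terminal (the address is taken from the log above):

```python
import requests

# Run this while `python app.py` is still running in the other terminal.
resp = requests.get("http://127.0.0.1:5000/", timeout=10)
print(resp.status_code)            # 200 means the server is up and responding
print(len(resp.text), "bytes of client HTML served by the app")
```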

pauline-banye commented 1 year ago

Hi @ZakiaYahya, I have not come across this error before, but clear your terminal cache and then try again.

GemmaTuron commented 1 year ago

Hi @ZakiaYahya

Thanks for the hard work, good job on testing. As @Pradnya2203 points out, the model files seem to be corrupted, which means we need to download them manually and add them to the folder - I think you got that right with the solubility model.

For the error you are reporting - I see that TensorFlow is trying to use NVIDIA GPUs, which you won't have on your system; you probably need to install a CPU version of TensorFlow: TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. -- this causes the model loading errors you find.

At this moment, though, I am more concerned about the implementation in the Ersilia Model Hub, which you reported gave some errors; can we go back to this? I am tagging @pauline-banye to keep her in the loop.

As next steps for you @ZakiaYahya :

  1. Run eos74bo from the Ersilia Model Hub and let us know what error you are getting (sorry, I know you did; can you confirm once again?)
  2. Focus on Week 3 tasks next
ZakiaYahya commented 1 year ago

Hello @GemmaTuron,

(1) NCATS-Solubility model: The model is working. I successfully ran the NCATS-ADME solubility model 3 days ago. I thought the prediction file was created on my local system in "test_data_dir" because that directory already contains prediction files; that's why I kept checking it. I had already run predictions on the EML at https://opendata.ncats.nih.gov/adme/predictions 3 days ago and just misunderstood that the prediction file is created on the server instead of on my local system (sorry, I know this is funny). Big relief. I'm attaching the prediction file for the solubility model: ADME_Predictions_2023-03-19-194125.csv

(2) Ersilia eos74bo model, CLI: The model is fetched and served without any problem and it also gives predictions, but when I try to store the predictions in an output CSV file it throws an error. I tried storing the output in a JSON file, like this: ersilia -v api run -i eml_canonical.csv -o out.json, and it stored successfully, but when I tried to save the output in a CSV file, like this: ersilia -v api run -i eml_canonical.csv -o out.csv, it gave me the error TypeError: object of type 'float' has no len()

I'm not able to attach the JSON file here as GitHub doesn't support this file extension.

(3) Ersilia eos74bo model, Colab: I have successfully run the Ersilia model eos74bo on Colab. Here's the output file: eos74bo_ersilia_colab.csv

I'll start working on the Week 3 tasks, and in parallel I'll keep checking why the eos74bo model on the CLI is not storing predictions in CSV format. For now, I'll compare the eos74bo results I got from Colab with the NCATS solubility results from https://opendata.ncats.nih.gov/adme/predictions, so that I can mark my Week 2 task "Compare results with the Ersilia Model Hub implementation!" as complete.

Thanks.

GemmaTuron commented 1 year ago

Hi @ZakiaYahya !

Thanks for this! @pauline-banye can you have a look at the issue with .csv files?

ZakiaYahya commented 1 year ago

Hello @GemmaTuron, Week 2: Compare results with the Ersilia Model Hub implementation!

The major challenge in designing oral dosage forms lies in their poor bioavailability. The most frequent causes of low oral bioavailability are poor solubility and low permeability. Solubility is one of the important parameters for achieving the desired concentration of a drug in systemic circulation and the required pharmacological response. Drugs with poor aqueous solubility have a slower absorption rate, which can lead to inadequate and variable bioavailability and render the drug ineffective. If the solubility is known, solutions can easily be made at the correct concentrations. We are checking the solubility of chemical compounds using the NCATS-ADME solubility model and the Ersilia aqueous kinetic solubility model with slug eos74bo.

Let's first understand the output predictions returned by the two models, i.e. the NCATS-ADME solubility model and Ersilia eos74bo. Both are classification models that label a compound as either highly soluble or of low solubility. In the NCATS-ADME solubility model, active compounds are those with a Moderate or High phenotype, belonging to class 0, while inactive compounds are those with a Low phenotype, belonging to class 1. The Ersilia model eos74bo returns the probability of a compound having poor solubility: >= 0.5 means low solubility, < 0.5 means high solubility. I'm comparing the two models on a few SMILES strings here to check whether they classify each string into the same category.

Let's first take the SMILES string Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1: NCATS-ADME classifies this compound as highly soluble, while the probability returned by Ersilia eos74bo is 0.0007697788, which is less than 0.5 and therefore also indicates a highly soluble compound.

Let's take another SMILES string to check the agreement between the two models. For CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1, NCATS-ADME classifies this compound as having low solubility, while the probability returned by the Ersilia model eos74bo is 0.99345654, which is greater than 0.5 and therefore also indicates a low-solubility compound.

A few more comparisons:

| SMILES | NCATS-ADME solubility model | Ersilia eos74bo |
| --- | --- | --- |
| O=c1ncnc2[nH][nH]cc1-2 | low solubility | 0.9541449 (low solubility) |
| C[C@@](Cc1ccc(O)c(O)c1)(NN)C(=O)O | high solubility | 0.0004001 (high solubility) |
| Cc1ccc(Cl)c(O)c1C | low solubility | 0.9934958 (low solubility) |
| NCCc1ccc(O)c(O)c1 | high solubility | 0.0018976 (high solubility) |

Thanks.
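To make the comparison reproducible, here is a small sketch that applies the 0.5 cutoff described above to the eos74bo probabilities and checks agreement with the NCATS-ADME calls; the values are copied from the comparisons in this comment:

```python
# (SMILES, NCATS-ADME label, eos74bo probability of poor solubility)
comparisons = [
    ("Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1", "high", 0.0007697788),
    ("CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1", "low", 0.99345654),
    ("O=c1ncnc2[nH][nH]cc1-2", "low", 0.9541449),
    ("C[C@@](Cc1ccc(O)c(O)c1)(NN)C(=O)O", "high", 0.0004001),
    ("Cc1ccc(Cl)c(O)c1C", "low", 0.9934958),
    ("NCCc1ccc(O)c(O)c1", "high", 0.0018976),
]

for smiles, ncats_label, prob_low_solubility in comparisons:
    eos74bo_label = "low" if prob_low_solubility >= 0.5 else "high"
    agreement = "agree" if eos74bo_label == ncats_label else "disagree"
    print(f"{smiles}: NCATS-ADME={ncats_label}, eos74bo={eos74bo_label} -> {agreement}")
```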

ZakiaYahya commented 1 year ago

Week 3 - Proposed Model:1

Model Name DEEPScreen: Virtual Screening with Deep Convolutional Neural Networks Using Compound Images

Model Description DEEPScreen is a large-scale DTI prediction system for early-stage drug discovery, using deep convolutional neural networks. One of its main advantages is that it employs readily available 2-D structural representations of compounds at the input level, instead of conventional descriptors with limited performance. DEEPScreen learns complex features inherently from the 2-D representations, thus producing highly accurate predictions.

Model Summary DEEPScreen is a collection of DCNNs, each of which is an individual predictor for a target protein. The system takes drugs or drug-candidate compounds in the form of SMILES representations as queries, generates 200-by-200 pixel 2-D structural/molecular images from the SMILES, runs the predictive DCNN models on the input 2-D images, and generates binary predictions: active (i.e., interacting) or inactive (i.e., non-interacting) for the corresponding target protein. (An illustrative sketch of this image input format is given after this proposal.)

Slug Drug-target-interaction-prediction

Tag Drug discovery, convolutional neural networks

Task Classification

Package Dependencies: pandas==1.0.4 opencv-python==4.2.0.34 torch===1.5.0 scipy==1.5.0rc2 scikit-learn==0.23.1 cairosvg==2.4.2

Publication https://pubs.rsc.org/en/content/articlelanding/2020/SC/C9SC03414E

Supplementary Information PDF is attached below for more information about DEEPScreen DEEPScreen PDF.pdf

GitHub Repository (Source Code) https://github.com/cansyl/DEEPScreen

License GNU General Public License https://www.gnu.org/licenses/
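To illustrate the kind of input DEEPScreen consumes, here is a minimal sketch that renders a SMILES string into a 200-by-200 pixel 2-D depiction. It uses RDKit, which is an assumption here (it is not in the dependency list above), and it is not DEEPScreen's own image-generation code:

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Example compound; DEEPScreen feeds images like this to a per-target DCNN.
smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
mol = Chem.MolFromSmiles(smiles)
Draw.MolToFile(mol, "compound_200x200.png", size=(200, 200))
```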

ZakiaYahya commented 1 year ago

Week 3 - Proposed Model: 2

Model Name MolTrans: Molecular Interaction Transformer for Drug Target Interaction Prediction

Model Description In the drug discovery domain, drug-target interaction (DTI) prediction is a foundational task that is costly and time-consuming due to the extensive experimental search over large drug datasets. Previously proposed methods give results that are less accurate and harder to explain for two main reasons: (1) they ignore the sub-structural nature of DTI, and (2) they focus only on labelled data and ignore massive amounts of unlabelled molecular data.

Model Summary Molecular Interaction Transformer (MolTrans) gives more accurate results than other baselines by incorporating a pattern-mining algorithm and an interaction modelling module that uses sub-structural patterns for more accurate and interpretable DTI prediction. Secondly, it incorporates an augmented transformer encoder to better extract and capture the semantic relations among sub-structures extracted from massive unlabelled biomedical data, whereas existing methods focus on limited labelled data. MolTrans was evaluated on real-world data and showed improved DTI prediction performance compared to state-of-the-art baselines. MolTrans is essentially a classification model that determines whether a drug and a target protein will interact. (A simplified sketch of the sub-structure interaction idea follows this proposal.)

Datasets BindingDB (https://www.bindingdb.org/bind/index.jsp), DAVIS (http://staff.cs.utu.fi/~aatapa/data/DrugTarget/), BIOSNAP (http://snap.stanford.edu/biodata/datasets/10002/10002ChG-Miner.html)

Slug Drug–target protein interaction

Tag DTI prediction, data mining, labelled and unlabelled data

Task Classification

Package Dependencies: numpy pandas tqdm scikit-learn torch subword-nmt

Publication https://academic.oup.com/bioinformatics/article/37/6/830/5929692?login=false

Supplementary Information PDF is attached below for more information about MolTrans MolTrans.pdf

GitHub Repository (Source Code) https://github.com/kexinhuang12345/moltrans

License BSD 3-Clause
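To make the sub-structure interaction idea concrete, here is a minimal, heavily simplified PyTorch sketch. It is not the MolTrans architecture and has no trained weights; it only illustrates the pattern of embedding drug and protein sub-structure tokens, forming a pairwise interaction map, and scoring the pair:

```python
import torch
import torch.nn as nn

class TinyInteractionModel(nn.Module):
    """Toy stand-in for a sub-structure interaction module (illustrative only)."""

    def __init__(self, drug_vocab=1000, prot_vocab=5000, dim=64):
        super().__init__()
        self.drug_emb = nn.Embedding(drug_vocab, dim)
        self.prot_emb = nn.Embedding(prot_vocab, dim)
        self.classifier = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, drug_tokens, prot_tokens):
        d = self.drug_emb(drug_tokens)                 # (batch, n_drug, dim)
        p = self.prot_emb(prot_tokens)                 # (batch, n_prot, dim)
        # Pairwise interaction map between every drug/protein sub-structure pair.
        interaction = torch.einsum("bid,bjd->bijd", d, p)
        pooled = interaction.mean(dim=(1, 2))          # (batch, dim)
        return torch.sigmoid(self.classifier(pooled)).squeeze(-1)

model = TinyInteractionModel()
drug_tokens = torch.randint(0, 1000, (2, 50))    # 2 drugs, 50 sub-structure tokens each
prot_tokens = torch.randint(0, 5000, (2, 200))   # 2 proteins, 200 sub-structure tokens each
print(model(drug_tokens, prot_tokens))           # interaction probabilities in [0, 1]
```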

GemmaTuron commented 1 year ago

Hi @ZakiaYahya

Very detailed explanation, good job many thanks!

GemmaTuron commented 1 year ago

For the models:

DeepScreen Good suggestion, very relevant to our work; it is actually already on our list of suggestions! We have not yet implemented it because they do not provide the model checkpoints, so we would have to train them ourselves, which can be time-consuming - that's why it is not prioritised in our pipeline.

MolTrans The main concern again is that the checkpoints are not provided, and typically these screenings of drug-target interactions are very large - but let's add it to the list of model suggestions!

For the last model, could you focus on trying to find a model that predicts bioactivity of small molecules against a pathogen for example?

ZakiaYahya commented 1 year ago

Thanks @GemmaTuron .

ZakiaYahya commented 1 year ago

Alright @GemmaTuron, I'll find a third model on the suggested topic. Also, I have added MolTrans to the Ersilia model suggestion list. Thanks.

pauline-banye commented 1 year ago

Hi @ZakiaYahya, so sorry I haven't been following up as much; I have been a bit indisposed. Could you update me on the issues you were having with the CSV files?

ZakiaYahya commented 1 year ago

Hi @pauline-banye, the eos74bo model fetched and served without any problem, but when I tried to save the output in a CSV file, like this: ersilia -v api run -i eml_canonical.csv -o out.csv, it gave me the error TypeError: object of type 'float' has no len(). Thanks.

ZakiaYahya commented 1 year ago

Week 3 - Proposed Model: 3

Model Name REDIAL-2020: A machine learning platform to estimate anti-SARS-CoV-2 activities

Model Description At present, there is a pressing need to discover potent medications that can effectively treat coronavirus disease 2019 (COVID-19). To address this, REDIAL-2020 is introduced: a collection of machine learning models that predict activities for live viral infectivity, viral entry, and viral replication for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), as well as in vitro infectivity and human cell toxicity. This tool could prove invaluable to the scientific community for identifying compounds for in vitro screening and could expedite the identification of potential new drug candidates for treating COVID-19.

Model Summary REDIAL-2020 consists of eleven independently trained machine learning models and includes a similarity search module that queries the underlying experimental dataset for similar compounds. For this purpose, NCATS data associated with the aforementioned assays were downloaded from the COVID-19 portal. After mining, the compounds were labelled as positive or negative for each assay: compounds in the low-activity class were treated as negative, whereas compounds in the high- and moderate-activity classes were treated as positive. (A small sketch of this labelling convention follows this proposal.)

Datasets NCATS COVID-19 (https://opendata.ncats.nih.gov/covid19/assays)

Slug Redial-2020

Tag Estimating small molecule activities, COVID-19, Live Virus Infectivity, Viral Entry, Viral Replication

Task Classification

Package Dependencies: numpy==1.19.2 hypopt==1.0.9 argparse==1.4.0 tqdm==4.49.0 flask==1.1.2 cairosvg==2.4.2 requests==2.24.0 pubchempy==1.0.4 func_timeout==4.3.5 xgboost==1.0.2 scikit-learn==0.22.1 pandas==1.1.2

Publication https://www.nature.com/articles/s42256-021-00335-w#Sec9

Supplementary Information PDF is attached below for more information about Redial-2020 REDIAL-2020.pdf Website: http://drugcentral.org/Redial

GitHub Repository (Source Code) https://github.com/sirimullalab/redial-2020/tree/v1.0

License MIT License (https://github.com/sirimullalab/redial-2020/blob/master/LICENSE)
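As a small illustration of the labelling convention described in the model summary (low-activity compounds treated as negative, moderate- and high-activity compounds as positive); this is a sketch of the scheme, not REDIAL-2020's code:

```python
def activity_to_label(activity_class: str) -> int:
    """Map an NCATS activity class to the binary label used for training."""
    # Low-activity compounds are treated as negative (0);
    # moderate- and high-activity compounds as positive (1).
    return 0 if activity_class.strip().lower() == "low" else 1

print([activity_to_label(c) for c in ("Low", "Moderate", "High")])  # [0, 1, 1]
```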

ZakiaYahya commented 1 year ago

Hello @GemmaTuron, I just saw that someone else has opened a model request for REDIAL-2020, which I proposed as my model suggestion 3. Do I have to propose another model, or does it count as my suggestion too? Kindly let me know. Thanks.

ZakiaYahya commented 1 year ago

Week 3 - Proposed Model: 4 (I'm suggesting this model because I think Ersilia is interested in natural products)

Model Name NPBERT-Antimalaria: Predicting Antimalarial Activity in Natural Products Using Pretrained Bidirectional Encoder Representations from Transformers

Model Description Malaria is a perilous illness that causes significant fatalities and has a high occurrence rate every year. Over the past decade, numerous research studies have been conducted to discover effective antimalarial compounds to combat this disease. In addition to chemically synthesized compounds, several natural compounds have been demonstrated to be equally effective antimalarials. Experimental approaches to exploring antimalarial activity in natural products have been complemented by computational methods, which have yielded satisfactory outcomes.

Model Summary This research paper introduces a new molecular encoding scheme based on Bidirectional Encoder Representations from Transformers and employs a pretrained encoding model named NPBERT in conjunction with four machine learning algorithms, namely k-Nearest Neighbors (k-NN), Support Vector Machines (SVM), eXtreme Gradient Boosting (XGB), and Random Forest (RF), to develop prediction models that identify antimalarial natural products. The results show that the SVM models are the best-performing classifiers, followed by the XGB, k-NN, and RF models. Additionally, a comparative analysis between the proposed molecular encoding scheme and existing state-of-the-art methods indicates that NPBERT is more effective than the others. Moreover, the use of transformers for constructing molecular encoders is not limited to this study and can be applied to other biomedical applications. To create the NPBERT encoder, about 2 million compounds, collected from the ChEMBL and ZINC databases, were used as training data. For a fair assessment, classifiers trained with the NPBERT encoding scheme were compared with those using state-of-the-art methods, including 196-dimensional RDKit molecular descriptors, extended-connectivity fingerprints, and the Mol2Vec encoding scheme. (A sketch of this encoder-plus-classifier recipe follows this proposal.)

Datasets ChEMBL database, ZINC database

Slug NPBERT-Antimalaria

Tag Antimicrobial agents, antimalarial natural products, Molecular modeling

Task Classification

Publication https://pubs.acs.org/doi/full/10.1021/acs.jcim.1c00584

Supplementary Information PDF is attached below for more information about NPBERT-Antimalaria NPBERT-Antimalaria.pdf

GitHub Repository (Source Code) https://github.com/mldlproject/2021-NPBERT-Antimalaria

License None
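The overall recipe described in the summary (a pretrained transformer encoder producing SMILES features, followed by a classical classifier such as an SVM) can be sketched as below. The checkpoint directory, SMILES, and labels are placeholders, not the released NPBERT files, and the sketch assumes the checkpoint is a standard Hugging Face model directory:

```python
import numpy as np
from sklearn.svm import SVC
from transformers import pipeline

NPBERT_DIR = "./save_model"  # placeholder; requires the actual pretrained NPBERT weights

# Feature extractor: returns per-token embeddings for each SMILES string.
encoder = pipeline("feature-extraction", model=NPBERT_DIR, tokenizer=NPBERT_DIR)

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC"]
labels = [0, 1, 0, 1]  # toy labels: 1 = antimalarial natural product

# Mean-pool the token embeddings into one fixed-length vector per SMILES.
features = np.array([np.mean(encoder(s)[0], axis=0) for s in smiles])

clf = SVC(kernel="rbf", probability=True).fit(features, labels)
print(clf.predict_proba(features)[:, 1])
```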

GemmaTuron commented 1 year ago

Hi @ZakiaYahya !

Good job on those models! REDIAL has already been suggested and incorporated in the list, and the antimalarial one is also very interesting to us! Would you:

  • Incorporate the antimalarial model in our model suggestion list?
  • Try it out to see if it is easy to run and list the steps here

ZakiaYahya commented 1 year ago

Hello @GemmaTuron, yeah, sure, I'll incorporate the antimalarial model into the suggestion list, and I'll try to run it and let you know. Thanks.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron, I have incorporated the antimalarial model into the Ersilia suggestion list. However, the antimalarial repo doesn't list any steps on GitHub to reproduce the results, and the web servers they mention are not accessible either. I'll go through their publication to see if it describes how to set up the NPBERT-Antimalaria model and will let you know. Thanks.

GemmaTuron commented 1 year ago

Hi @ZakiaYahya

There are not many instructions indeed but it seems the model is available here: https://github.com/mldlproject/2021-NPBERT-Antimalaria/tree/main/training/NPBERT_pretrained_model/save_model

ZakiaYahya commented 1 year ago

Hello @GemmaTuron, yes, I have already done the necessary installations, but the actual model file is not downloaded with the repo. They give a Google Drive link for the pretrained model file, but that URL doesn't exist anymore. I'm searching their paper; maybe it has some information on where to get the model file. Right now I'm stuck due to the unavailability of the model file. Here's the error log:

```
python3 extract_feature.py --input_smile="C1CCCCC1C2CCCCC2"
2023-03-28 13:57:15.707048: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "extract_feature.py", line 40, in <module>
    tokenizer=btokenizer
  File "/home/zakia/miniconda3/envs/NPBERT/lib/python3.7/site-packages/transformers/pipelines/__init__.py", line 783, in pipeline
    **model_kwargs,
  File "/home/zakia/miniconda3/envs/NPBERT/lib/python3.7/site-packages/transformers/pipelines/base.py", line 271, in infer_framework_load_model
    raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model ./save_model with any of the following classes: (<class 'transformers.models.auto.modeling_tf_auto.TFAutoModel'>, <class 'transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM'>).
```

Thanks.
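For reference, that ValueError from transformers typically means the save_model directory is missing a usable weights file (for example config.json together with tf_model.h5 or pytorch_model.bin), which is consistent with a checkpoint that was never actually downloaded. A quick check, with the directory name taken from the error above:

```python
import os

model_dir = "./save_model"  # the directory passed to the transformers pipeline
candidates = ["config.json", "tf_model.h5", "pytorch_model.bin"]

for name in candidates:
    path = os.path.join(model_dir, name)
    size = os.path.getsize(path) if os.path.exists(path) else 0
    print(f"{name}: exists={os.path.exists(path)}, size={size} bytes")
```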

GemmaTuron commented 1 year ago

Hmm, that's a bummer if the model is no longer available (which is possible). Try to see if there is more information in the publication; otherwise let's park this and focus on writing the final application!

ZakiaYahya commented 1 year ago

Hello @GemmaTuron, I have gone through every possible source of information, i.e. the publication, the supplementary material, and the links, but didn't find the model file anywhere. Maybe we can contact the authors and ask them personally for the model file, or we can try to train the model from scratch, but that will take time. Once we have the model file, we can get the predictions easily and incorporate it into the hub. Meanwhile, I'll start writing the final application today. Thanks.

GemmaTuron commented 1 year ago

Hi @ZakiaYahya

Thanks, at this moment let's not worry about the model further and just focus on finishing your final application. We'll tackle it in the future, thanks for testing it!

ZakiaYahya commented 1 year ago

Okay, sure @GemmaTuron, I'll start working on my final application, and if I get some time after completing it, I'll do some more work on the NPBERT model. Thanks.

ZakiaYahya commented 1 year ago

Hello @GemmaTuron, I have submitted my final application on the Outreachy website. I've also sent you the link to my final application for review; if you suggest any changes, I'll edit my application accordingly. Thank you for being so helpful. I'm now closing this issue. Thanks.