Closed Pradnya2203 closed 1 year ago
Motivation Statement:
I first heard about Outreachy from a friend and was truly pleased by this idea of supporting diversity and encouraging the under-represented groups from all around the world. I am a sophomore at IIT Roorkee and am also a part of various student technical clubs related to software development and data science.
I was quite excited to learn that my application was approved, and while going through the projects I came across Ersilia, which seemed appealing for multiple reasons. Firstly, the cause: providing medical resources to under-developed countries. I have always wanted to help people using my skills and would be delighted to contribute to such a cause. Secondly, the tech stack suits me and would further my goal of pursuing a career in data science.
I have worked with various languages such as Python, JavaScript, C++, PHP, and MATLAB, and would like to get a strong hold on Python during this internship period.
Ersilia will be a great opportunity to improve my skills while working for the betterment of society. I am really looking forward to contributing to this project and learning a lot in the process.
Hi @Pradnya2203
Thanks for your interest and welcome to Ersilia! If you have successfully installed Ersilia and run a test model, please report it here and also let us know which system you are using. Thanks!
Hey, I am using Ubuntu 22.04 and ran the sample model. We are supposed to fork the repository and then start contributing, right?
Hi @Pradnya2203 !
Please read the guidelines for the contribution period. This time around, in order to better support all applicants, we have set up a set of defined tasks to be completed each week: https://ersilia.gitbook.io/ersilia-book/contributors/internships/outreachy-summer-2023
In addition, we will hand out specific tasks to interns as soon as we know everyone is set up.
Hi @Pradnya2203
As you will see in issue #343, this model seems to present some issues at fetch time. Please test it using both the CLI and the Google Colab template (use the template provided in /notebooks), and report whether it works in either system, along with the log files. When fetching the model, collect the log files and try to identify the source of any errors.
Thanks!
I don't know exactly why I am getting the error "ModuleNotFoundError: No module named 'yaml'". I tried installing pyyaml, but it didn't change anything; I'll keep trying to solve it. I also tested using the Google Colab template, but the model still doesn't work.
Hi @Pradnya2203. From your error log: ('Connection aborted.', OSError(0, 'Error')).
This looks like the connection was aborted by the host, probably due to a system error on your end.
I also tried fetching the model on Ubuntu 22.04, but I had to terminate the process because it was taking too long.
Hey @AhmedYusuff, I was not actually facing that error; I was able to get around that one but uploaded the old log file by mistake. I have now uploaded the new log file. Thanks a lot :)
You are welcome @Pradnya2203.
In your log file I can see that the model failed when it tried to import yaml: ModuleNotFoundError: No module named 'yaml'
You can use pip show pyyaml
to check whether yaml is installed on your system.
Yes I have tried that as well @AhmedYusuff
Hi @Pradnya2203
Important: did you activate the model's conda environment before installing yaml? You should first run:
conda activate eos3ae
and then
pip show pyyaml
Hey @GemmaTuron I installed the module after activating the conda environment of the model, and checked it using pip show pyyaml, but I'm still getting the same error when I run the model and when I check again I see no pyyaml in the conda environment of the model. I'll try to fix it.
Hi @Pradnya2203 ! thanks, I'd suggest first focusing on week 2 tasks and if those are completed on time, then we'll tackle the extra tasks assigned to you :)
The model I chose for week 2 was Smiles To IUPAC Translator. This model was particularly interesting to me as it converts a simplified representation of a molecule (SMILES) into a standardized format for naming chemical compounds (IUPAC). This type of translator would be extremely useful in the field of drug discovery, where understanding the chemical structure of molecules is crucial for developing new drugs. By being able to accurately translate SMILES into IUPAC, researchers can obtain important information about a molecule's properties. This information is essential for identifying potential drug targets, predicting how a molecule will interact with other compounds in the body, and designing new drug molecules that can better target specific diseases.
I was able to fetch and serve it from the Ersilia Model Hub and get the following output
"input": {
"key": "POLCUAVZOMRGSN-UHFFFAOYSA-N",
"input": "CCCOCCC",
"text": "CCCOCCC"
},
"output": {
"outcome": [
"1-propoxypropane"
]
}
}
I then tried to install and run the original open-source model, https://github.com/Kohulan/Smiles-TO-iUpac-Translator#simple-usage. To run the model I created a new file, app.py, with the following code:
from STOUT import translate_forward, translate_reverse
# SMILES to IUPAC name translation
SMILES = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
IUPAC_name = translate_forward(SMILES)
print("IUPAC name of "+SMILES+" is: "+IUPAC_name)
# IUPAC name to SMILES translation
IUPAC_name = "1,3,7-trimethylpurine-2,6-dione"
SMILES = translate_reverse(IUPAC_name)
print("SMILES of "+IUPAC_name+" is: "+SMILES)
I edited this file to take input as "1-propoxypropane" and got the following result
SMILES of 1-propoxypropane is: CCCOCCC.CCCOCCC
I ran into a few issues: initially I couldn't figure out how to run it, and when I did, I got the error "[Errno 0] JVM DLL not found".
I solved this with sudo apt install default-jre
After running the model I used the given dataset to get the output. To use the dataset, I first filtered out the IUPAC names of the molecules into an array of strings, then used a for loop to run the model on all of them. I got the following output: translate_reverse.txt
The STOUT model has two functions: translate_forward and translate_reverse. translate_forward converts SMILES to IUPAC names, and conversely translate_reverse converts IUPAC names to SMILES. The comment above uses translate_reverse. Next we use translate_forward with can_smiles from the given dataset, producing the following output: translate_forward.txt
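The column-filtering-plus-loop workflow described above can be sketched in plain Python. The `map_column` helper and the two-row CSV are invented for illustration; in the real run, the function passed in would be STOUT's translate_forward or translate_reverse.

```python
import csv
import io

def map_column(csv_text, column, fn):
    """Extract one column from CSV text and apply fn to each value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [fn(row[column]) for row in reader]

# Tiny stand-in for eml_canonical.csv (made-up rows).
data = "drug,can_smiles\nether,CCCOCCC\nexample,CCO\n"

# Placeholder translator so the sketch runs without STOUT installed;
# replace the lambda with STOUT's translate_forward in a real run.
results = map_column(data, "can_smiles", lambda smiles: smiles.lower())
print(results)  # ['cccoccc', 'cco']
```

The same helper works for the translate_reverse direction by selecting an IUPAC-name column instead.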
Hi @Pradnya2203
Great, thanks for this work! Can I ask you as extra task to install the NCATS models (use the development branch of the repo) and test out the Human Cytosolic Stability model? @pauline-banye did a lot of work in the previous internship to implement the different NCATS models and I want to make sure those are all working :)
Many thanks!
The last step was to run the model from the Ersilia Model Hub on the dataset. For that I fetched and served the model "STOUT: SMILES to IUPAC name translator". To make the model iterate over the entire dataset (https://raw.githubusercontent.com/ersilia-os/ersilia/master/notebooks/eml_canonical.csv), I first processed the data and chose the can_smiles column as my input. I then wrote a bash script, which ran on my CLI and gave the following output (output file: ersilia_output.txt). The bash script was:
#!/bin/bash
# s is populated with every can_smiles string from the dataset
s=()
ersilia serve smiles2iupac
for n in "${s[@]}"; do
    ersilia api -i "$n"
done
Here s contains the full array of can_smiles strings.
The two outputs of the Smiles To IUPAC Translator, obtained from the original source code and from the Ersilia Model Hub, are posted above. Comparing the two, we see the following:
For example for the input Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1
we get the output as IUPAC name of Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 is: [(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]cyclopent-2-en-1-yl]methanol
for original source code and
"input": {
"key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",
"input": "Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1",
"text": "Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1"
},
"output": {
"outcome": [
"[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol"
]
}
}
for the Ersilia Model Hub code.
We can see that the two outputs are essentially the same, with only minor differences in stereodescriptors and locants. Similarly, we can check the other inputs using the files posted above.
Problems I ran into while running the model, both from the original source code and via the Ersilia Model Hub:
Hey @GemmaTuron, I have completed the week 2 tasks using the Smiles To IUPAC Translator model. I have documented all the issues I faced while completing the tasks and posted the results as well. Apart from this model, I also tried to run the NCATS model but was unable to set up the conda environment: it took a long time and failed with an error related to pip and the HTTP connection. I got the same error even after retrying and making sure the network connection was strong enough. I will try to set it up again and continue the task as per your instructions. Also, do I need to make any changes to my task 2 submission? Thank you
Hi @Pradnya2203
The tasks are fine; you can reach out to Masroor or Zakia, who have also been working on the NCATS model. If you are having issues, what I can suggest is following the environment.yml file manually, instead of running conda env create --prefix ./env -f environment.yml
Open the .yml file and install the dependencies manually, one by one. This will tell you which ones are causing issues (go in order, and first create a conda env with the right Python version).
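One way to keep track of that dependency-by-dependency approach is to pull the list out of the .yml file with a short stdlib-only script (no pyyaml needed). The environment.yml contents below are a made-up example, not the actual NCATS file.

```python
# Hypothetical environment.yml contents for illustration.
yml = """name: adme
dependencies:
  - python=3.8
  - rdkit
  - pip:
    - torch==1.5.0
    - chemprop
"""

def list_dependencies(text):
    """Collect entries under 'dependencies:' so each can be installed in order."""
    deps, in_deps = [], False
    for line in text.splitlines():
        if line.startswith("dependencies:"):
            in_deps = True
            continue
        if in_deps and line.lstrip().startswith("- "):
            deps.append(line.lstrip()[2:].rstrip(":"))
    return deps

print(list_dependencies(yml))
# ['python=3.8', 'rdkit', 'pip', 'torch==1.5.0', 'chemprop']
```

Installing these one at a time (conda first for the Python version, then pip for the rest) makes it obvious which dependency is the one that fails.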
Update: I was able to create the conda environment. The mistake I made before was not setting up chemprop. But app.py is giving errors:
Loading RLM graph convolutional neural network model
Traceback (most recent call last):
File "app.py", line 20, in <module>
from predictors.rlm.rlm_predictor import RLMPredictior
File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 177, in <module>
rlm_gcnn_scaler, rlm_gcnn_model, rlm_gcnn_model_version = load_gcnn_model()
File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 148, in load_gcnn_model
rlm_gcnn_scaler, _ = load_scalers(rlm_gcnn_scaler_path)
File "/home/pradnya/ncats-adme/server/./predictors/chemprop/chemprop/utils.py", line 132, in load_scalers
state = torch.load(path, map_location=lambda storage, loc: storage)
File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 585, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 755, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
After researching the error a bit, I realized it's an error within the model, so I tried to solve it by making sure chemprop runs well. It took some time, since the packages were not compatible with each other and some modules failed to install, but I fixed them all and verified that chemprop runs. However, I'm still facing the same error with app.py. I will try to fix it soon.
I think the issue is with accessing the models from the NCATS servers. On clicking any of the models I am redirected to this page, and on visiting the site mentioned I find this:
Hi @Pradnya2203 !
For the local implementation, you need to make sure you download the right model and place it in the folder manually, since the models can no longer be accessed from the server (apparently they stopped maintenance). Use the links provided in the development branch.
Hello @Pradnya2203, I found a fix for this. Download the model files manually from here:
and place them in their respective directories inside the models directory, like this: ..\ncats-adme\server\models\rlm ..\ncats-adme\server\models\pampa
then run:
python app.py
Update: I manually downloaded the model file, placed it in the right folders, and installed the right version of every single package needed, and I'm still getting the same error.
Update: I was finally able to run the ncats-adme model after a lot of struggle. I was repeatedly getting the same error which is
Traceback (most recent call last):
File "app.py", line 20, in <module>
from predictors.rlm.rlm_predictor import RLMPredictior
File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 177, in <module>
rlm_gcnn_scaler, rlm_gcnn_model, rlm_gcnn_model_version = load_gcnn_model()
File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 148, in load_gcnn_model
rlm_gcnn_scaler, _ = load_scalers(rlm_gcnn_scaler_path)
File "/home/pradnya/ncats-adme/server/./predictors/chemprop/chemprop/utils.py", line 132, in load_scalers
state = torch.load(path, map_location=lambda storage, loc: storage)
File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 585, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 755, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
I tried everything, from manually installing every package to digging into the depths of the code to find the source of the error. Finally I realized the solution was simple: an auto-downloaded corrupt file was the root cause, and just deleting it solved the problem. This might seem like a trivial issue, but it causes huge inconvenience, because the file is auto-downloaded and the error message reveals almost nothing about it; we just keep getting UnpicklingError.
After removing the corrupt file, I was able to run python app.py
but then realised that my Ubuntu partition did not have sufficient space, so I had to borrow some from Windows. Somehow a simple restart led to loss of data on Ubuntu (no idea how), so I had to set up ncats-adme again. Finally, app.py took a very long time to run, but it's now working well, and I get the following output with input.csv as input.
So this is the result I get for Human Cytosolic Stability model. Also thanks a lot @emmakodes and @GemmaTuron for helping me out with this error
The explanation of the output with SMILES as input:
mol: gives the structure of the molecule.
Tanimoto similarity: the most popular similarity measure for comparing chemical structures represented by fingerprints is the Tanimoto (or Jaccard) coefficient T. Two structures are usually considered similar if T > 0.85 (for Daylight fingerprints). In the Tanimoto formulation, A and B are the sets of fingerprint "bits" of molecule A and molecule B, and A ∩ B is the set of bits common to both. The resulting coefficient T(A,B) ranges from 0, when the fingerprints have no bits in common, to 1, when the fingerprints are identical. Thus,
T(A,B) = (A ∩ B)/(A + B - A ∩ B)
The chemical similarity problem then becomes: given molecule A, find all molecules with a Tanimoto coefficient greater than a given threshold. The higher the threshold, the more similar the matched molecules are.
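The formula above can be checked with a few lines of Python, representing each fingerprint as a set of "on" bit positions (the fingerprints here are invented for illustration, not real molecular fingerprints):

```python
def tanimoto(a, b):
    """Tanimoto coefficient T(A,B) = |A ∩ B| / (|A| + |B| - |A ∩ B|)."""
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

fp_a = {1, 4, 7, 9}  # made-up bit positions for molecule A
fp_b = {1, 4, 8}     # made-up bit positions for molecule B
print(tanimoto(fp_a, fp_b))  # 2 common bits / (4 + 3 - 2) = 0.4
```

As expected from the definition, identical fingerprints give 1.0 and disjoint fingerprints give 0.0.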
More information on Human Liver Cytosolic Stability: over the last few decades, chemists have become skilled at designing compounds that avoid cytochrome P450 (CYP450)-mediated metabolism. Typical screening assays are performed in liver microsomal fractions, and it is possible to overlook the contribution of cytosolic enzymes until much later in the drug discovery process. Few data exist on cytosolic enzyme-mediated metabolism, and no reliable tools are available to chemists to help design away from such liabilities. ML models have enabled in silico classifiers based on human cytosol stability data to facilitate identification of potential substrates during the lead optimization phase.
Good job on debugging, @Pradnya2203. Can you provide more context regarding the error? It would assist others who encounter a similar one.
Hi @Pradnya2203
Thanks. Is it the model downloaded when cloning the repo that is corrupt, or the one you download manually? Automatic download when running the model seems to work fine, which is great! We faced a similar issue with the human liver metabolism prediction model with @pauline-banye; could you check whether you can run that model as well, or whether you get the same unpickling error? If you have time to combine it with the week 3 tasks, of course.
Thanks :)
Hey @GemmaTuron ,
The manually downloaded one is not corrupt; it's the one that gets downloaded while setting up the repository. I am unable to run the human liver metabolism model in the browser after running app.py. The only error it gives is: There was an error processing your file. Please make sure you have selected a file that contains SMILES, indicate if the file contains a header and the column number containing the SMILES.
However, the same input file runs on every other model I checked; for example, it gives the following output when run with the PAMPA Permeability (pH 7.4) model.
ADME_Predictions_2023-03-15-070819.csv
Hi @Pradnya2203 !
Thanks. On the PAMPA model, I see it says PAMPA50, so could it be that you are running PAMPA 5.0, not 7.4? For the Human Liver Metabolism model, that is surprising, since it uses the exact same data loader function (from the GCNN base class). Can you paste the input file here?
Hey @pauline-banye @GemmaTuron Going into detail on the error faced, plus a solution:
On running python app.py
the checkpoints for each model get downloaded into their specific directories (say, models/rlm).
If there is an interruption (for example due to network issues, as in my case), the downloaded checkpoint file is corrupted.
This causes the UnpicklingError, and the python app.py command exits; the user is stuck unless they navigate to the corrupted file under models, remove it, and rerun the command.
I believe an issue should be opened on ncats-adme so we can implement some error handling (while loading checkpoints) to validate the checkpoint file. If it's corrupted, the model checkpoint should be re-downloaded, or an appropriate "loss of network connection" error posted.
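A minimal validation sketch for such error handling, assuming the corrupt downloads look like the ones described here: an interrupted transfer leaves an empty file, and a server error page is HTML, which begins with '<' and produces exactly the "invalid load key, '<'" UnpicklingError. The `looks_corrupt` helper is hypothetical, not part of ncats-adme.

```python
import os
import tempfile

def looks_corrupt(path):
    """Heuristic pre-check before torch.load: flag empty files and
    HTML error pages (which begin with b'<') as bad downloads."""
    if os.path.getsize(path) == 0:
        return True
    with open(path, "rb") as f:
        return f.read(1) == b"<"

# Demo with a fake "checkpoint" that is really an HTML error page.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pt") as f:
    f.write(b"<html>504 Gateway Timeout</html>")
print(looks_corrupt(f.name))  # True -> delete and re-download before loading
os.remove(f.name)
```

The loader could call this before unpickling and, on a positive result, remove the file and retry the download instead of crashing.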
I think it was pampa 7.4 only. This is the input file: input.csv
Hi @Pradnya2203
Just to be sure, can you download the PAMPA 7.4 model once more and test it? See whether we get PAMPA74 in the model column or still PAMPA50. I'll collect all these issues and write to the authors to clarify these points, thanks! For the Human Cytosolic Model, you were able to run it from the server app but not by running the app.py file, from what I understand from the comments above, right?
To close off this part:
Hey @GemmaTuron. This is the input file: input.csv. This is the prediction using the Human Cytosolic Model: ADME_Predictions_2023-03-22-224133.csv. This is the prediction using PAMPA74: ADME_Predictions_2023-03-22-224316.csv. This is the prediction using PAMPA50: ADME_Predictions_2023-03-22-224417.csv. This is the output of the Ersilia Model Hub implementation of the NCATS Human Cytosolic Model: ncats-hlcs.txt
I was able to run the app.py file but was unable to run the Human Liver Cytosol Stability model from the server app; I receive the following error: There was an error processing your file. Please make sure you have selected a file that contains SMILES, indicate if the file contains a header and the column number containing the SMILES.
when using the same input file as above.
ADMET_XGBoost
The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are important in drug discovery as they define efficacy and safety. In this work, we applied an ensemble of features, including fingerprints and descriptors, and a tree-based machine learning model, extreme gradient boosting, for accurate ADMET prediction. The model performs well in the Therapeutics Data Commons ADMET benchmark group. For 22 tasks, the model is ranked first in 18 tasks and top 3 in 21 tasks.
Accurate ADMET prediction
python=3.7 rdkit deepchem scikit-learn PyTDC xgboost mordred gensim tensorflow~=2.4 PubChemPy
https://paperswithcode.com/paper/accurate-admet-prediction-with-xgboost
https://arxiv.org/pdf/2204.07532v3.pdf
https://github.com/smu-tao-group/ADMET_XGBoost
GNU General Public License v3.0
AI-Bind
Identifying novel drug-target interactions (DTI) is a critical and rate-limiting step in drug discovery. AI-Bind is a pipeline that combines network-based sampling strategies with unsupervised pre-training, limiting the annotation imbalance and improving binding predictions for novel proteins and ligands. AI-Bind predicted drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins and the associated human proteins. These predictions are validated via docking simulations and comparison with recent experimental evidence, and the pipeline helps interpret machine learning predictions of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. Overall, AI-Bind offers a powerful high-throughput approach to identifying drug-target combinations, with the potential of becoming a powerful tool in drug discovery.
https://paperswithcode.com/paper/ai-bind-improving-binding-predictions-for
https://arxiv.org/pdf/2112.13168v5.pdf
https://github.com/chatterjeeayan/ai-bind
https://zenodo.org/record/7226641
MIT License
OpenChem
OpenChem is a deep learning toolkit for Computational Chemistry with PyTorch backend. The goal of OpenChem is to make Deep Learning models an easy-to-use tool for Computational Chemistry and Drug Design Researchers.
Modular design with a unified API; modules can be easily combined with each other. OpenChem is easy to use: new models are built with only a configuration file. Fast training with multi-GPU support. Utilities for data preprocessing. TensorBoard support.
numpy pyyaml scipy ipython mkl scikit-learn six pytest pytest-cov
Classification (binary or multi-class) Regression Multi-task (such as N binary classification tasks) Generative models
https://pubs.acs.org/doi/full/10.1021/acs.jcim.0c00971
https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.0c00971
https://github.com/Mariewelt/OpenChem
MIT License
Hi @Pradnya2203 !
Similar to OpenMM, which @samuelmaina has pointed to, OpenChem is a framework for developing models, not a model in itself, so we could not directly incorporate it into the Hub; we would use it to train models and then incorporate those into the Hub. I don't like that NVIDIA GPUs are required to run OpenChem, since most computers do not have them. But thanks for the suggestion; looking forward to your next ones!
Hi @Pradnya2203 !
Sorry, I missed the above. ADMET_XGBoost: good catch, it looks interesting, but I fail to see the model checkpoints or the data to retrain the models; is any of this available? AI-Bind: I did not know about this tool; they seem to be developing it intensively (5 updates on arXiv so far!). It looks like a promising approach. At the moment we cannot incorporate it into the Hub, because we cannot pass proteins as input, and I see in the requirements that it needs a GPU to run (we try to avoid serving models that require NVIDIA GPUs, because most people won't have access to them), but I'll keep an eye on the tool and see if we can use it!
@Pradnya2203 ,
As next steps,
Hey @GemmaTuron,
I tried to run REDIAL-2020; it was fairly easy to run, and I used their own sample dataset sample_data.csv
and got the following results: 3CL-sample_data-consensus.csv ACE2-sample_data-consensus.csv AlphaLISA-sample_data-consensus.csv CoV1-PPE_cs-sample_data-consensus.csv CoV1-PPE-sample_data-consensus.csv CPE-sample_data-consensus.csv cytotox-sample_data-consensus.csv hCYTOX-sample_data-consensus.csv MERS-PPE_cs-sample_data-consensus.csv MERS-PPE-sample_data-consensus.csv TruHit-sample_data-consensus.csv
REDIAL-2020 is an open-source, open-access machine learning suite for estimating anti-SARS-CoV-2 activities from molecular structure. By leveraging data available from NCATS, eleven categorical machine learning models are developed: CPE, cytotox, AlphaLISA, TruHit, ACE2, 3CL, CoV-PPE, CoV-PPE_cs, MERS-PPE, MERS-PPE_cs and hCYTOX. These models are exposed on the REDIAL-2020 portal, and the output of a similarity search using input data as a query is provided for every submitted molecule. The top-ten most similar molecules to the query molecule from the existing COVID-19 databases, together with associated experimental data, are displayed. This allows users to evaluate the confidence of the machine learning predictions.
I tried running ADMET-XGBoost as well; it does have the dataset available, but I fail to see any checkpoints. I tried to find them in the documentation too, but was unable to.
Week 1 - Get to know the community
Week 2 - Install and run an ML model
Week 3 - Propose new models
Week 4 - Prepare your final application