ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🐛 Bug: Convert model input file from csv to a smiles list #351

Closed paulinebanye closed 1 year ago

paulinebanye commented 1 year ago

Describe the bug.

Hi @GemmaTuron

I encountered an error on Colab while converting the model prediction output, eos6o0z.csv, to a SMILES list.

Describe the steps to reproduce the behavior

To replicate this error

Expected behavior.

Eos6o0z.csv file converted successfully

Screenshots.

Initial steps successful: [screenshot: colab1 err]

Error encountered during conversion: [screenshot: colab err]

Operating environment

Windows 10. Brave browser version 1.44.105

Additional context

No response

Malikbadmus commented 1 year ago

Hello @pauline-banye, the column name used when reading your data into Python as a SMILES list is wrong.

In smiles = df["CAN_SMILES"].tolist(), change CAN_SMILES to the name of the column that contains your SMILES strings, in this case can_smiles.

So what you should have is smiles = df["can_smiles"].tolist()
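A minimal sketch of that fix, assuming the file has a single column named can_smiles. The two SMILES strings are made-up sample rows standing in for the real file, not the contents of eml_canonical.csv:

```python
import io

import pandas as pd

# Stand-in for the real input file; the rows are illustrative sample data only.
csv_text = "can_smiles\nCCO\nc1ccccc1\n"
df = pd.read_csv(io.StringIO(csv_text))

# Column labels are case-sensitive: df["CAN_SMILES"] would raise a KeyError here.
smiles = df["can_smiles"].tolist()
print(smiles)
```

With the real file, replace io.StringIO(csv_text) with the path to the CSV.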

GemmaTuron commented 1 year ago

Thanks @Malikbadmus. Indeed, this notebook is coded for a specific input file. The problem here is that Python is case-sensitive: we are asking it to find the column "CAN_SMILES" (short for "canonical SMILES") when the column is actually named "can_smiles", it seems. You can run:

    import pandas as pd  # package for working with tables
    df = pd.read_csv(path)
    df.columns

This will output the column names.

To select a single column (in this case we only have one): df["columnname"]
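If the notebook needs to cope with input files whose header differs only in case, one option is to look the column up case-insensitively before selecting it. This is only a sketch: find_column is a hypothetical helper, not part of pandas or Ersilia, and the sample rows are made up.

```python
import io

import pandas as pd


def find_column(df, wanted):
    """Return the actual column label that matches `wanted`, ignoring case.

    Hypothetical helper for illustration; raises KeyError unless exactly
    one column matches.
    """
    matches = [c for c in df.columns if str(c).lower() == wanted.lower()]
    if len(matches) != 1:
        raise KeyError(
            f"expected exactly one column like {wanted!r}, "
            f"found {matches}; available columns: {list(df.columns)}"
        )
    return matches[0]


# Illustrative data standing in for the real CSV file.
df = pd.read_csv(io.StringIO("can_smiles\nCCO\nCC(=O)O\n"))

col = find_column(df, "CAN_SMILES")  # resolves to the real label
smiles = df[col].tolist()
print(col, smiles)
```

Raising an error when zero or several columns match keeps the lookup from silently picking the wrong column.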

Cee-tech21 commented 1 year ago

Hello @GemmaTuron, @pauline-banye, @Malikbadmus, I am not sure which file/data we need to use for the conversion to a SMILES list. Is it the eml_canonical.csv file whose path we should pass to pd.read_csv(path)? Or is there another file we should be using for this?

paulinebanye commented 1 year ago

@Cee-tech21 it's eml_canonical.csv that we need to specify as the path.