HUBioDataLab / DrugGEN

Official implementation of DrugGEN
GNU General Public License v3.0

Tensor Shape Issues #18

Open ollierodrigues opened 11 months ago

ollierodrigues commented 11 months ago

I've successfully trained a model on the 'Ligand' submodel variation. When running the model to generate SMILES output, I came across this error related to the tensor shapes:

RuntimeError: Error(s) in loading state_dict for Generator2:
    size mismatch for drug_nodes.weight: copying a param with shape torch.Size([128, 9]) from checkpoint, the shape in current model is torch.Size([128, 27]).
    size mismatch for nodes_output_layer.weight: copying a param with shape torch.Size([9, 128]) from checkpoint, the shape in current model is torch.Size([27, 128]).
    size mismatch for nodes_output_layer.bias: copying a param with shape torch.Size([9]) from checkpoint, the shape in current model is torch.Size([27]).

I can't find anywhere in the code where I can change the number of node types the model uses when generating molecules.
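To see which parameters disagree without relying on the traceback, the shapes stored in the checkpoint can be compared against the shapes of the freshly built model. The following is a minimal sketch, not DrugGEN code: `find_shape_mismatches` is a hypothetical helper, and the shape dictionaries are filled in by hand from the error message above. With PyTorch, they could be collected as described in the comments, assuming the checkpoint file holds a plain state_dict.

```python
# Hypothetical helper (not part of DrugGEN): list parameters whose saved shape
# differs from the shape in the current model.
#
# With PyTorch, the two dictionaries could be built like this, assuming the
# checkpoint file contains a plain state_dict:
#   ckpt = torch.load(path, map_location="cpu")
#   checkpoint_shapes = {k: tuple(v.shape) for k, v in ckpt.items()}
#   model_shapes = {k: tuple(v.shape) for k, v in model.state_dict().items()}

def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return (name, saved_shape, current_shape) for every shared parameter
    whose shapes differ."""
    mismatches = []
    for name in sorted(checkpoint_shapes.keys() & model_shapes.keys()):
        saved = tuple(checkpoint_shapes[name])
        current = tuple(model_shapes[name])
        if saved != current:
            mismatches.append((name, saved, current))
    return mismatches


# Shapes copied from the error message in this issue.
checkpoint_shapes = {
    "drug_nodes.weight": (128, 9),
    "nodes_output_layer.weight": (9, 128),
    "nodes_output_layer.bias": (9,),
}
model_shapes = {
    "drug_nodes.weight": (128, 27),
    "nodes_output_layer.weight": (27, 128),
    "nodes_output_layer.bias": (27,),
}

for name, saved, current in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(f"{name}: checkpoint {saved} vs current model {current}")
```

All three mismatches involve the same dimension (9 in the checkpoint, 27 in the current model), which suggests that a single size used to build the model differs between training and inference.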

atabeyunlu commented 11 months ago

Thanks for your interest! I appreciate you reporting this issue. To help me investigate further, could you please provide more details about your data and the specific commands you used?

ollierodrigues commented 11 months ago

Hi,

Thanks for getting back to me. Our dataset consists of around 40,000 ligands from the Cambridge Structural Database, all with a molecular weight above 75; ligands that are not compatible with your code were removed. This command was used to train the model:
python DrugGEN/main.py --submodel="Ligand" --mode="train" --raw_file="DrugGEN/data/chembl_train.smi" --dataset_file="chembl45_train.pt" --drug_raw_file="DrugGEN/data/akt_train.smi" --drug_dataset_file="drugs_train.pt" --max_atom=45

Our dataset was used in place of the 'akt_train.smi' dataset; the ChEMBL dataset was left unchanged. This successfully trained a model, shown here: [image: image.png]

Following this, we ran this command: python DrugGEN/main.py --submodel="Ligand" --mode="inference" --inference_model="DrugGEN/experiments/models/Ligand_Model"

Ligand_Model is the name of the folder containing the above files. This produced the following error: [image: image.png] I am unsure where in the code the tensor shape of the model is set. Any suggestions would be very helpful. Many thanks, Ollie
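Because the commands in this thread are easy to mangle when copied (stray spaces inside quoted paths or after `--`), one option is to assemble them programmatically. The sketch below is only an illustration: `build_command` is a hypothetical helper, it uses only the flags quoted in this thread, and it assumes that none of the paths are meant to contain whitespace.

```python
import shlex


def build_command(script, **flags):
    """Build an argv list such as ['python', script, '--name=value', ...].

    Hypothetical helper: rejects any value containing whitespace, on the
    assumption that the paths used here never legitimately contain spaces.
    """
    argv = ["python", script]
    for name, value in flags.items():
        value = str(value)
        if any(ch.isspace() for ch in value):
            raise ValueError(f"--{name} contains whitespace: {value!r}")
        argv.append(f"--{name}={value}")
    return argv


# Training command from this thread.
train_cmd = build_command(
    "DrugGEN/main.py",
    submodel="Ligand",
    mode="train",
    raw_file="DrugGEN/data/chembl_train.smi",
    dataset_file="chembl45_train.pt",
    drug_raw_file="DrugGEN/data/akt_train.smi",
    drug_dataset_file="drugs_train.pt",
    max_atom=45,
)

# Inference command from this thread.
inference_cmd = build_command(
    "DrugGEN/main.py",
    submodel="Ligand",
    mode="inference",
    inference_model="DrugGEN/experiments/models/Ligand_Model",
)

print(shlex.join(train_cmd))
print(shlex.join(inference_cmd))
```

The resulting lists can be passed directly to `subprocess.run`, which avoids shell quoting altogether.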

ollierodrigues commented 11 months ago

Hi,

As a follow-up to my previous message: I have also tried running a model using only your datasets, without swapping in mine. When running the command: python DrugGEN/main.py --submodel="Ligand" --mode="inference" --inference_model="DrugGEN/experiments/models/Ligand_Model"

This error appeared: [image: image.png]

The output folder was created in the experiments/inference directory, as expected, but it was empty. We expected it to contain a file of de novo molecules: [image: image.png]

Many thanks, Ollie

atabeyunlu commented 11 months ago

Hey Ollie,

I've noticed that I'm unable to view the images you shared. I'll investigate this matter further, taking into consideration the details you provided.

ollierodrigues commented 11 months ago

Hi,

Sorry about that. I'll attach the relevant images to this email and see if that helps. One is the error that occurred when using our dataset; the other is the error that occurred when using your dataset.

Many thanks, Ollie

ollierodrigues commented 11 months ago

Our_ligand_dataset_error (1).txt Your_DrugGEN_dataset_error (1).txt

atabeyunlu commented 10 months ago

Hey Ollie, I only see the text files now. Maybe you should upload the images through GitHub rather than sending them via email.

ollierodrigues commented 10 months ago

Hi,

Thanks for getting back to me. This is the image of the error from our dataset: [image: Our_Ligand_Dataset_Error]

ollierodrigues commented 10 months ago

This is the image of the error from your dataset: [image: Your_DrugGEN_Dataset_Error]

atabeyunlu commented 10 months ago

Thanks, I will be getting back to you as soon as possible.