Reproducible error using peptide_coefficient_predictor.py

Arthfael commented 2 years ago

Hi, I am trying to run the scripts on my own data. Since I am encountering an error, I also tried each of the 12 example pre-processed tsv datasets provided in the download. All give me the same error, so this cannot be coming from my data:

One of the dimensions in the output is <= 0 due to downsampling in conv2d_17. Consider increasing the input size. Received input shape [None, 40, 20, 1] which would produce output shape with a zero or negative value in a dimension.

I suspect that this has to do with my configuration; in particular, I know that the Python version can be critical. I am using Python 3.10.5. Which version was used in the paper? I tried a few other 3.5 variants but on some I could not install all requirements.

Note: This was done from R, using the reticulate package to create a Python 3.10.5 virtual environment and install all requirements listed in the two scripts I am using, namely numpy, pandas, argparse, matplotlib, seaborn, tensorflow, keras, sklearn and scipy. Because reticulate does not let me run scripts with arguments, I made a copy of the relevant scripts and edited it so that the inputs, outputs and, where necessary, relevant parameters are set to the correct paths/values within the script itself. The datasets were successfully one-hot-encoded this way. I then called the edited peptide_coefficient_predictor.py script on each results file, using default parameters except for n_runs and seq_length.

justin-a-sanders commented 2 years ago

I think your python versions are fine given you were able to install all the dependencies. Looking at the R script you sent, it looks like you are not passing in the --filter_size argument to peptide_coefficient_predictor.py. The default value for filter_size when one isn't provided is 103 (a mistake we should correct) while your sequence length is only 40, so the two are incompatible. Try passing in a filter size of 3 instead and seeing if that resolves the issue.

In general, you can refer to train_models.sh for some example calls to peptide_coefficient_predictor.py using our recommended parameters.
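The mismatch between a filter size of 103 and a sequence length of 40 can be checked with simple arithmetic. The sketch below is not taken from Pepper: it assumes a stride-1 convolution with Keras-style "valid" padding along the sequence dimension, and `conv_output_length` is a hypothetical helper name used only for illustration.

```python
def conv_output_length(input_length, filter_size, stride=1, padding="valid"):
    """Length of one spatial dimension after a convolution (Keras-style rules)."""
    if padding == "same":
        # "same" padding keeps the length, apart from the effect of the stride.
        return -(-input_length // stride)  # ceiling division
    # "valid" padding: the filter must fit entirely inside the input.
    return (input_length - filter_size) // stride + 1


seq_length = 40  # sequence length reported in the error message

for filter_size in (103, 3):
    out = conv_output_length(seq_length, filter_size)
    status = "ok" if out > 0 else "invalid: filter is longer than the sequence"
    print(f"filter_size={filter_size} -> output length {out} ({status})")
```

Under these assumptions, any filter size larger than the sequence length gives a non-positive output length, which is the condition the error message complains about.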

Arthfael commented 2 years ago

Dear Justin,

Thank you very much for your help! This seems to have done the trick. On closer inspection I can see that the edits I had to make to run peptide_coefficient_predictor.py through reticulate (because afaik reticulate can only run standalone python scripts, not python scripts with arguments) were incomplete. In case this is of interest to anyone, I have attached my fixed R script for testing Pepper, and an example of the edited python script it creates. All arguments are now removed and the corresponding objects are instead assigned explicitly within the script (so I also removed all subsequent references to args.). After making these edits (including using 3 instead of 103 for filter size), the python script appears to work through reticulate on your example datasets, and I will now test on my own data...

About the train_models.sh script: I had looked at it, but did not know what to make of it because I could not find out what the different arguments meant, so left them to their default values. Is there some kind of guide somewhere explaining the meaning of filter_size, n_filters, n_layers, n_nodes, dropout, learning_rate, batch, epochs, early_stopping and random_run, their relationship and the implications of changing any of them? There is no "help" field for these arguments in the script, and neither the readme nor the github appear to describe what they do.

Kind regards,

Armel
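As a possible alternative to removing the argument handling described above, a script's parser can be given an explicit list of arguments, so the `args.` references stay in place. This is only a sketch: `build_parser` and `parse` are hypothetical names, the parser is a stand-in rather than the one in peptide_coefficient_predictor.py, and apart from the filter size default of 103 mentioned in this thread the default values are placeholders.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the real parser; only argument names
    # mentioned in this thread are used, and the defaults for
    # --seq_length and --n_runs are placeholders.
    parser = argparse.ArgumentParser()
    parser.add_argument("--filter_size", type=int, default=103)
    parser.add_argument("--seq_length", type=int, default=40)
    parser.add_argument("--n_runs", type=int, default=1)
    return parser


def parse(argv=None):
    # argv=None makes argparse read sys.argv[1:], as on the command line;
    # passing a list makes it parse that list instead.
    return build_parser().parse_args(argv)


# Values are supplied in code, so no command-line arguments are needed.
args = parse(["--filter_size", "3", "--seq_length", "40"])
print(args.filter_size, args.seq_length, args.n_runs)
```

With this pattern the rest of the script can keep using `args.filter_size` and similar attributes unchanged, and only the single call to `parse` needs editing per run.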

wsnoble commented 2 years ago

You are right: the parameters are defined in this script (https://github.com/Noble-Lab/Pepper/blob/main/peptide_coefficient_predictor.py), but many had no "help" field defined. I have added them now. However, a detailed discussion of how to tune the hyperparameters of a deep neural network is outside the scope of Pepper documentation.
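For readers unfamiliar with argparse "help" fields, the sketch below shows how they are typically declared and displayed. It is illustrative only: the help wording and the default values are assumptions written for this example, not the text that was added to the Pepper script.

```python
import argparse

# Illustrative parser; help strings and defaults are placeholders,
# not the ones used in peptide_coefficient_predictor.py.
parser = argparse.ArgumentParser(
    description="Example of documenting hyperparameters with help fields."
)
parser.add_argument(
    "--filter_size", type=int, default=3,
    help="width of each convolutional filter (default: %(default)s)",
)
parser.add_argument(
    "--dropout", type=float, default=0.25,
    help="fraction of units dropped during training (default: %(default)s)",
)
parser.add_argument(
    "--learning_rate", type=float, default=0.001,
    help="step size used by the optimizer (default: %(default)s)",
)

# format_help() returns the same text that the -h/--help flag prints.
help_text = parser.format_help()
print(help_text)
```

Running a script that defines its arguments this way with `-h` or `--help` prints each argument alongside its description.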

Arthfael commented 2 years ago

Fair enough, I guess I have some learning to do.

