ahof1704 opened this issue 3 months ago
Hi, I added the CV models trained with 20 features inside the `models` directory.
Thank you for your interest.
Great. Thanks!
I also noticed you are calling `load_dataset`, but it doesn't seem to be defined anywhere. I assumed you were using the `dataset` package, so I installed it; however, the arguments are different, so I am not sure. Could you clarify where I can get your `load_dataset` function?
Also, you are loading the dataset from `features`, which is empty. Could you make those files available as well?
Please let me know if there is a more straightforward way to run inference with your trained model.
Thanks
Hi. Thank you for pointing out the problem. I added an import of the `training.py` script, where `load_dataset` is defined. To use the model for inference, as you correctly suggest, you need the feature files. Since these files are too big for GitHub, I didn't upload them, but you can compute them with the notebook `0_Feature_engineering.ipynb`. Please create the correct Python environment with the given requirements file before running it; Python version 3.10 should be fine.
Great. I managed to run inference with your model on the sample data. Now, to run inference on new sequences, I guess the steps are:

1. Create a FASTA file with my sequences.
2. Run the feature extraction following the notebook `0_Feature_engineering.ipynb`.
3. Load the trained model and run inference.

Is that all, or do I need to do anything else?
Thanks a lot for the support
Fortunately, it will be simpler than that. You only need to substitute (or modify) the pandas dataframe `peptide_sequences`: insert your peptide sequences in the `Sequence` column of that dataframe. The `ID` column is not used in the optimization loop, so you can ignore it (but don't drop it).
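If it helps, the substitution might look like the minimal pandas sketch below. The sequences and IDs are placeholders, and the initial dataframe is a mock standing in for the `peptide_sequences` built in the repo's notebooks — it only illustrates the expected column layout:

```python
import pandas as pd

# Mock of the repo's peptide_sequences dataframe (placeholder data,
# shown only to illustrate the expected columns).
peptide_sequences = pd.DataFrame({
    "ID": ["sample_1", "sample_2"],          # kept, but ignored by the optimization loop
    "Sequence": ["ACDEFGHIK", "LMNPQRSTV"],  # sample peptide sequences
})

# Substitute your own sequences, keeping both columns present.
my_sequences = ["GAVLIPFMW", "STCYNQWDE", "RHKDESTNC"]  # hypothetical peptides
peptide_sequences = pd.DataFrame({
    "ID": [f"my_seq_{i}" for i in range(len(my_sequences))],  # any labels work
    "Sequence": my_sequences,
})
```

After this, the rest of the pipeline should run unchanged, since only the `Sequence` column is read downstream.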
Would it be possible to get access to the larger model (the one trained with all 375 features)?
Thanks!!
Hi, could you make your trained model available for inference?
Thanks