gabribg88 / LightCPPgen


Access to the trained model #1

Open ahof1704 opened 3 months ago

ahof1704 commented 3 months ago

Hi, could you make your trained model available for inference?

Thanks

gabribg88 commented 3 months ago

Hi, I added the CV models trained with 20 features to the models directory.

Thank you for your interest.

ahof1704 commented 3 months ago

Great. Thanks!

I also noticed that you call load_dataset, but it doesn't seem to be defined anywhere. I assumed it came from the dataset package, so I installed it, but the arguments are different, so I am not sure. Could you clarify where I can get your load_dataset function?

Also, you are loading the dataset from features, which is empty. Could you make those files available as well?

Please let me know if there is a more straightforward way to run inference with your trained model.

Thanks

gabribg88 commented 3 months ago

Hi. Thank you for pointing out the problem. I added the import of the training.py script, where load_dataset is defined. To use the model for inference, as you correctly suggest, you need the feature files. Since these files are too big for GitHub, I didn't upload them, but you can compute them with the notebook 0_Feature_engineering.ipynb. Please create a Python environment from the provided requirements file before computing them. Python version 3.10 should be fine.

ahof1704 commented 3 months ago

Great. I managed to run inference with your model on the sample data. Now, to run inference on new sequences, I guess the steps are: 0) create a FASTA file with my sequences; 1) run the feature extraction following the notebook "0_Feature_engineering"; 2) load the trained model and run inference.

Is that all or do I need to do anything else?

Thanks a lot for the support

gabribg88 commented 3 months ago

Fortunately, it will be simpler than that. You only need to replace (or modify) the pandas dataframe peptide_sequences. In particular, insert your peptide sequences into the Sequence column of that dataframe. The ID column is not used in the optimization loop, so you can ignore it (but don't drop it).
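
A minimal sketch of that substitution, for anyone starting from a FASTA file. Only the dataframe name peptide_sequences and the column names ID and Sequence come from this thread. The helper names read_fasta and to_peptide_dataframe, the placeholder sequences, and the choice to use the first token of each FASTA header as the ID are assumptions made for illustration, not code from the repository.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA-formatted lines into a list of {"ID", "Sequence"} records.

    Hypothetical helper: the ID is taken as the first whitespace-delimited
    token of each header line, and multi-line sequences are concatenated.
    """
    records = []
    current_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current_id is not None:
                records.append({"ID": current_id, "Sequence": "".join(chunks)})
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line.upper())
    # Flush the last record in the file.
    if current_id is not None:
        records.append({"ID": current_id, "Sequence": "".join(chunks)})
    return records


def to_peptide_dataframe(records):
    """Build a dataframe with the ID and Sequence columns named in this thread.

    pandas is imported here so that the parser above works without it.
    """
    import pandas as pd

    return pd.DataFrame(records, columns=["ID", "Sequence"])


# Placeholder input; in practice pass an open file, e.g. open("my_peptides.fasta").
example_fasta = StringIO(">pep1 example\nRRRRRRRR\n>pep2\nGRKKRRQ\nRRRPPQ\n")
records = read_fasta(example_fasta)
```

With pandas installed, `peptide_sequences = to_peptide_dataframe(records)` would then stand in for the dataframe of the same name in the notebook. The ID column is kept, as advised above, even though the optimization loop does not use it.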

ahof1704 commented 3 months ago

Would it be possible to get access to the larger model (the one trained with all 375 features)?

Thanks!!