Arcadia-Science / peptigate

Peptigate ("peptide" + "investigate") predicts bioactive peptides from transcriptome assemblies or sets of proteins.
MIT License

update snakefile to train model for plmutils instead of rnasamba #20

Closed: taylorreiter closed this 7 months ago

taylorreiter commented 8 months ago

PR checklist

This PR builds a plmutils model using short sequences from Ensembl model organisms. It removes the RNAsamba model build because we outperform that tool, which this PR also tests: we get 28% accuracy 🎉. While that isn't good, it's better than RNAsamba on this data set.

Validation metrics

```
Auc_roc: 0.82
Accuracy: 0.75
Precision: 0.78
Recall: 0.67
Mcc: 0.50
Num_true_positive: 7343.00
Num_false_positive: 2130.00
Num_true_negative: 10187.00
Num_false_negative: 3690.00
Num_positive: 11033.00
Num_negative: 12317.00
Model saved to 'outputs/models/plmutils/2_model/'
```
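As a sanity check, the threshold-based validation metrics can be recomputed from the four confusion-matrix counts. This is a minimal sketch using the standard definitions of accuracy, precision, recall, and Matthews correlation coefficient (MCC). The helper name `binary_metrics` is hypothetical and is not part of plmutils. AUC-ROC needs per-sequence scores, so it can't be recovered from the counts alone.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; not a plmutils function.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # fraction of predicted positives that are correct
    recall = tp / (tp + fn)  # fraction of actual positives that are recovered
    # Matthews correlation coefficient: balanced even when classes differ in size
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
        "num_positive": tp + fn,  # actual positives
        "num_negative": tn + fp,  # actual negatives
    }


# Counts taken from the validation output above
metrics = binary_metrics(tp=7343, fp=2130, tn=10187, fn=3690)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the recomputed values agree with the reported accuracy, precision, recall, MCC, and class totals.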

The point of this snakefile is only to document the model build process. I don't expect many people to rerun it (unlike peptigate itself, which we expect to be run many times).

It also breaks up some comment lines to respect the 100-character line limit 🙃