Noble-Lab / casanovo

De Novo Mass Spectrometry Peptide Sequencing with a Transformer Model
https://casanovo.readthedocs.io
Apache License 2.0

About the training data of CasaNovo published in ICML 2022 #189

Closed wuwo007 closed 1 year ago

wuwo007 commented 1 year ago

Dear author, I noticed that in the ICML 2022 paper you used the nine-species benchmark data set and evaluation framework first introduced by DeepNovo, with a leave-one-out cross-validation setup. However, I do not understand how you collected the PSMs for the eight training species. For example, according to Table S2 in the DeepNovo paper, the number of PSMs in the MGF files excluding human is 1397544, but in your shared files in f.MSV000081382/peak/DeepNovo/HighResolution/data/cross.9high_80k.exclude_human, the number of PSMs when excluding human is 555054. Could you please help me resolve this discrepancy? Thanks.

bittremieux commented 1 year ago

Dataset MSV000081382 was uploaded by the DeepNovo authors, not by us, so I can't comment on the specific contents of all files. We used the same annotated MGF files as DeepNovo, which are available in the aforementioned MassIVE dataset, and performed the leave-one-species-out data splitting ourselves.
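
For anyone who wants to check the PSM counts themselves, here is a minimal sketch of how annotated spectra could be counted and how a leave-one-species-out split can be enumerated. This is not the code used for the paper. It assumes each annotated spectrum sits between `BEGIN IONS` and `END IONS` and carries its peptide in a `SEQ=` line; the function names and the species labels other than human are hypothetical placeholders.

```python
from typing import Iterable, List, Tuple


def count_annotated_spectra(lines: Iterable[str]) -> int:
    """Count spectra that carry a non-empty SEQ= annotation.

    Assumption: spectra are delimited by BEGIN IONS / END IONS and the
    peptide label is stored in a SEQ= header line.
    """
    count = 0
    in_spectrum = False
    has_seq = False
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            in_spectrum, has_seq = True, False
        elif line == "END IONS":
            if in_spectrum and has_seq:
                count += 1
            in_spectrum = False
        elif in_spectrum and line.startswith("SEQ=") and len(line) > 4:
            has_seq = True
    return count


def leave_one_species_out(species: List[str]) -> List[Tuple[str, List[str]]]:
    """Return (held_out, training_species) pairs, one per species."""
    return [(held, [s for s in species if s != held]) for held in species]


# Tiny in-memory example; a real file would be read with open(path).
demo_mgf = """BEGIN IONS
TITLE=demo_1
PEPMASS=500.25
CHARGE=2+
SEQ=PEPTIDEK
100.0 1.0
END IONS
BEGIN IONS
TITLE=demo_2
PEPMASS=600.30
CHARGE=2+
150.0 2.0
END IONS
"""
n_annotated = count_annotated_spectra(demo_mgf.splitlines())

# Placeholder species labels; only "human" is named in this thread.
splits = leave_one_species_out(["human", "species_b", "species_c"])
```

Running the counter on each per-species MGF file and summing over the training species of a split gives a number that can be compared against both Table S2 and the shared cross-validation files.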