compomics / ms2pip

MS²PIP: Fast and accurate peptide spectrum prediction for multiple fragmentation methods, instruments, and labeling techniques.
https://ms2pip.readthedocs.io
Apache License 2.0

fasta2speclib spectronaut format does not include a ProteinId column #35

Closed fburic closed 4 years ago

fburic commented 4 years ago

The spec_id column is not used in the final spectronaut_df dataframe that gets written.

This causes issues with some tools, such as OpenSwathWorkflow, that seem to require a protein ID column. In general, it would be nice to have that information anyway.

My quick fix (to my copy of the code) is on line 469 of ms2pipc/ms2pip/ms2pip_tools/spectrum_output.py:

    spectronaut_df = spectronaut_df[['spec_id'] + peptide_cols + fragment_cols]

Though I would consider extracting the ID part and saving it as a column with a standard name, e.g. something like:

    spectronaut_df = spectronaut_df.assign(ProteinId = spectronaut_df['spec_id'].apply(lambda s: s.split('_')[0]))
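
To make the idea concrete, here is a minimal sketch of the extraction step in plain Python. It assumes that spec_id starts with the protein ID followed by an underscore, as the split above implies. The helper name protein_id_from_spec_id, the sample rows, and the peptide field are hypothetical and not taken from the ms2pip code.

```python
def protein_id_from_spec_id(spec_id: str, sep: str = "_") -> str:
    """Return the text before the first separator in a spec_id.

    Assumption: spec_id starts with the protein ID, followed by `sep`.
    A protein ID that itself contains `sep` would be truncated.
    """
    return spec_id.split(sep, 1)[0]


# Hypothetical rows standing in for spectronaut_df; the values are made up.
rows = [
    {"spec_id": "P12345_001_2", "peptide": "ACDEFGHIK"},
    {"spec_id": "Q67890_014_3", "peptide": "LMNPQR"},
]

# Build new rows with an added ProteinId field, leaving the originals untouched.
rows_with_protein = [
    {**row, "ProteinId": protein_id_from_spec_id(row["spec_id"])} for row in rows
]
```

On the actual dataframe, the same helper could presumably be passed to spectronaut_df['spec_id'].map(...) inside assign. Note that protein IDs containing underscores would need a different splitting rule.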

RalfG commented 4 years ago

Hi @fburic, From version 3.5.0 onwards, the ProteinId column is written to the Spectronaut CSV files. Can you verify that this works on your side? Thanks!

fburic commented 4 years ago

Hi @RalfG, Great! Yes, it works for me (ver. 3.5.1). Thank you!