MannLabs / alphapeptdeep

Deep learning framework for proteomics
Apache License 2.0

Issues when running transfer_learn() #97

Closed mhamaneh closed 11 months ago

mhamaneh commented 1 year ago

I am trying to fine-tune peptdeep using the function transfer_learn(), following the example given here. I changed the paths to the MS and PSM files, set mgr_settings['transfer']['psm_type'] to 'maxquant', and set mgr_settings['transfer']['ms_file_type'] to 'thermo'. When I run transfer_learn() I run into the following two issues. Could you please help me with them?

1) If the raw file name given in the MaxQuant file evidence.txt contains uppercase letters, the program does not read the raw file at all and crashes with the following error message:

Traceback (most recent call last):
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/pipeline_api.py", line 205, in transfer_learn
    psm_df, frag_df = match_psms()
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/pipeline_api.py", line 133, in match_psms
    return concat_precursor_fragment_dataframes(
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/conda/envs/alpha/lib/python3.8/site-packages/alphabase/peptide/fragment.py", line 367, in concat_precursor_fragment_dataframes
    pd.concat(precursor_df_list, ignore_index=True),
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/conda/envs/alpha/lib/python3.8/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/conda/envs/alpha/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 368, in concat
    op = _Concatenator(
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/conda/envs/alpha/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 425, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
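A possible stopgap for this first issue is to lowercase the raw file names in a copy of evidence.txt before passing it to transfer_learn(). The sketch below is only an illustration: it assumes a tab-separated file with a "Raw file" column, the helper name lowercase_raw_file_column is hypothetical, and the sample row is made up. It does not touch peptdeep itself.

```python
import csv
import io


def lowercase_raw_file_column(src, dst, column="Raw file"):
    """Copy a tab-separated table from src to dst, lowercasing one column.

    Hypothetical helper; `column` is assumed to hold the raw file names.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[column] = row[column].lower()
        writer.writerow(row)


# Tiny in-memory stand-in for evidence.txt (values are made up).
example = "Sequence\tRaw file\tMS/MS scan number\nPEPTIDEK\tHeLa_Run01\t1234\n"
out = io.StringIO()
lowercase_raw_file_column(io.StringIO(example), out)
result = out.getvalue()
print(result)
```

For a real file, the same function could be called with two handles opened with newline="" (one for reading, one for writing), so the original evidence.txt stays untouched.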

2) I was able to work around the previous issue by changing the raw file names in the MaxQuant file to lowercase, but then I hit a second error. It seems that GetMSOrderForScanNum() requires a built-in Python int as input but receives a numpy.int64. Here is the error message:

Python.Runtime.PythonException: an integer is required

The above exception was the direct cause of the following exception:

System.ArgumentException: an integer is required in method ThermoFisher.CommonCore.Data.Interfaces.IScanEvent GetScanEventForScanNumber(Int32) ---> Python.Runtime.PythonException: an integer is required --- End of inner exception stack trace ---

The above exception was the direct cause of the following exception:

System.AggregateException: One or more errors occurred. (an integer is required in method ThermoFisher.CommonCore.Data.Interfaces.IScanEvent GetScanEventForScanNumber(Int32))
 ---> System.ArgumentException: an integer is required in method ThermoFisher.CommonCore.Data.Interfaces.IScanEvent GetScanEventForScanNumber(Int32)
 ---> Python.Runtime.PythonException: an integer is required
   --- End of inner exception stack trace ---
   --- End of inner exception stack trace ---
 ---> (Inner Exception #0) System.ArgumentException: an integer is required in method ThermoFisher.CommonCore.Data.Interfaces.IScanEvent GetScanEventForScanNumber(Int32)
 ---> Python.Runtime.PythonException: an integer is required
   --- End of inner exception stack trace ---<---

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/pipeline_api.py", line 205, in transfer_learn
    psm_df, frag_df = match_psms()
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/pipeline_api.py", line 122, in match_psms
    ) = match_one_raw(
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/rescore/feature_extractor.py", line 41, in match_one_raw
    ) = match.match_ms2_one_raw(
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/mass_spec/match.py", line 282, in match_ms2_one_raw
    ms2_reader.load(ms2_file)
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/mass_spec/ms_reader.py", line 335, in load
    ms_order = rawfile.GetMSOrderForScanNum(i)
  File "/gpfs/gsfs9/users/qmbp_ms/Mehdi/alphapeptdeep/peptdeep/legacy/thermo_raw/pyrawfilereader.py", line 487, in GetMSOrderForScanNum
    return IScanEventBase(self.source.GetScanEventForScanNumber(scanNumber)).MSOrder
TypeError: No method matches given arguments for IRawDataPlus.GetScanEventForScanNumber: (<class 'numpy.int64'>)
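The final TypeError suggests that the .NET binding accepts only a built-in Python int, so converting the scan number before the call may be enough. The sketch below illustrates the idea without the Thermo libraries or numpy. FakeInt64 and strict_get_ms_order are hypothetical stand-ins for numpy.int64 and for the strict binding, and to_native_int is an illustrative helper, not part of peptdeep.

```python
import operator


class FakeInt64:
    """Stand-in for numpy.int64: integer-like, but not a built-in int.

    Like NumPy integer scalars, it implements __index__.
    """

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


def strict_get_ms_order(scan_number):
    """Hypothetical stand-in for a binding that rejects non-int arguments."""
    if type(scan_number) is not int:
        raise TypeError(
            f"No method matches given arguments: ({type(scan_number)})"
        )
    # Dummy rule for the demo only: even scan numbers are reported as MS2.
    return 2 if scan_number % 2 == 0 else 1


def to_native_int(value):
    """Convert any integer-like object to a built-in int, losslessly."""
    return operator.index(value)


scan = FakeInt64(42)

try:
    strict_get_ms_order(scan)  # mirrors the failing call in the traceback
    rejected = False
except TypeError:
    rejected = True

ms_order = strict_get_ms_order(to_native_int(scan))  # succeeds after conversion
print(rejected, ms_order)
```

Applied to the loop shown in the traceback, the analogous change would presumably be to pass int(i) instead of i to rawfile.GetMSOrderForScanNum(); that is an untested suggestion, not a confirmed fix.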

Best regards, Mehdi

jalew188 commented 1 year ago

This is indeed a bug. We are developing a better RAW reader that may fix this issue. In the meantime, if you search the DDA data with MaxQuant and generate a spectral library with another tool, you can run transfer learning on the spectral library file without needing the RAW files.

mhamaneh commented 1 year ago

Thank you very much for the prompt response. Could you please explain a little bit more? What format should the spectral library file have, and how does transfer_learn() access the file?

jalew188 commented 11 months ago

A general TSV spectral library, generated for example by Spectronaut or MSFragger, is supported.