MannLabs / alphapeptdeep

Deep learning framework for proteomics
Apache License 2.0

problem of reading pFind.spectra #74

Closed ShadoWyyyy closed 1 year ago

ShadoWyyyy commented 1 year ago

Describe the bug: Hello, I am a student at Zhejiang University. After reading the paper, I wanted to use the tool to improve DDA identification of HLA. However, when I used its transfer function to fine-tune the model with pFind PSMs, I ran into the following problem.

[Screenshot 2023-01-03 20 43 16]

Here is the head of my pFind spectra file:

[Screenshot 2023-01-03 20 50 35]

Here are my "transfer" parameters in settings.yaml:

[Screenshot 2023-01-03 20 44 55]

I am sure I installed it successfully in a conda virtual environment, because I can use the library function and it runs well.

Any ideas on how to deal with this would be much appreciated.

jalew188 commented 1 year ago

Would you please share the log file (peptdeep_rescore.log) from the output_folder?

Did you use the installer, pip, or the GitHub code?

jalew188 commented 1 year ago

It may also be because pythonnet is not installed. The log file also shows whether pythonnet is installed. If it is not, see https://github.com/MannLabs/alphapeptdeep/tree/development#pip to install it.
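A quick way to check for pythonnet without reading the log is to ask the active Python environment directly. This is only a sketch using the standard library: it assumes pythonnet is distributed under the name "pythonnet" and exposes the top-level module "clr". The helper name pythonnet_status is made up for this example, and a different package providing a module called "clr" would also make the first value True.

```python
import importlib.util
from importlib import metadata


def pythonnet_status():
    """Return (clr_importable, pythonnet_version) for the current environment."""
    # Assumption: pythonnet exposes its runtime bridge as the module "clr".
    clr_importable = importlib.util.find_spec("clr") is not None
    try:
        # Assumption: the installed distribution is named "pythonnet".
        version = metadata.version("pythonnet")
    except metadata.PackageNotFoundError:
        version = None
    return clr_importable, version
```

Run it inside the same conda environment that runs peptdeep, for example with print(pythonnet_status()). A version of None suggests the package is missing from that environment.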

jalew188 commented 1 year ago

@ShadoWyyyy Is the problem solved?

ShadoWyyyy commented 1 year ago

Sorry, something was wrong with my VPN, so I lost my connection for a few days. Here is my rescore log; the program seems to be interrupted almost immediately: peptdeep_rescore.log

I followed the Developer installation instructions and created a conda virtual environment. I checked the installed packages, and pythonnet is there: conda_list_install_package.txt

Here is the full screenshot of the CLI when I execute the command:

[Screenshot 2023-01-09 15 53 38]

I couldn't upload settings.yaml because GitHub doesn't seem to support this file type.

Oh, I changed the extension to txt and now it can be uploaded: settings0103.yaml.txt

ShadoWyyyy commented 1 year ago

I can also run the library function; here is the library log: peptdeep_library.log

Maybe the problem is related to the format of my pFind.spectra file? pFind.spectra.txt

jalew188 commented 1 year ago

This is a bug in the pipeline API. I will create a new release after fixing it.

jalew188 commented 1 year ago

Sorry, it seems that you are using arm64 (M1 chip) for MacOS. PeptDeep does not support it yet...
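To confirm whether a given Python interpreter is running natively on Apple Silicon, the standard platform module is usually enough. The sketch below is an illustration, not part of PeptDeep; the function name is_apple_silicon is hypothetical. Note that an x86_64 Python running under Rosetta typically reports "x86_64", so this only detects a native arm64 interpreter.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True when the interpreter runs natively on macOS arm64."""
    # The arguments exist only to make the check testable; by default
    # the values come from the running interpreter.
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"
```

Calling is_apple_silicon() with no arguments inspects the current machine.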

jalew188 commented 1 year ago

Could you try the latest version? https://github.com/MannLabs/alphapeptdeep/releases/tag/1.0.0 I don't know if it works, though.

ShadoWyyyy commented 1 year ago

Sorry, it seems that you are using arm64 (M1 chip) for MacOS. PeptDeep does not support it yet...

OK, I may switch to another operating system later and try again. Thank you.

ShadoWyyyy commented 1 year ago

Hello, I am now using Windows, but I still can't use PeptDeep for HLA DDA rescoring. This time I installed with "pip install peptdeep", and everything is OK until I run the peptdeep rescore function. Part of the error log is the same as what I reported earlier in this issue; the screenshot is below:

[image]

The full program log: peptdeep_rescore.log
The settings.yaml: 20230203_rescore_settings.txt
The pFind.spectra: 20221028_MHC_pFind.spectra.txt

I really want to see how this software performs when rescoring the identification results of a traditional search engine, so if you need more information to find the problem, I can send you the data by email for testing. Thanks.

jalew188 commented 1 year ago

Could you please test pFind's output MGF file instead of the RAW file?

...
    ms_file_type: mgf
    ms_file_type_choices:
    - alphapept_hdf
    - thermo_raw
    - mgf
    - mzml
    ms_files:
    - E:\peptdeep\raw\20221028_MHC_HCDFT.mgf
    - E:\peptdeep\raw\20221111_HCC_387_HCDFT.mgf
    - E:\peptdeep\raw\20221111_HCC_387_HCDFT.mgf

ShadoWyyyy commented 1 year ago

I changed the RAW files to MGF files; here is the rescore log: peptdeep_rescore.log

It looks different from before, and the progress bar looks like this:

[image]

jalew188 commented 1 year ago

I will check my own data then

ShadoWyyyy commented 1 year ago

I will check my own data then

OK, looking forward to your reply and a solution.

jalew188 commented 1 year ago

I think I know what the issue is. Would you please rename xxx_HCDFT.mgf to xxx.mgf? It should work this time.
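For a folder with many runs, the rename suggested above can be scripted. The following is a minimal sketch, assuming every affected file ends in "_HCDFT.mgf" and sits directly in one folder; the function name strip_hcdft_suffix and the dry_run flag are made up for this example. It skips a file if the target name already exists, so nothing is overwritten.

```python
from pathlib import Path


def strip_hcdft_suffix(folder, suffix="_HCDFT", dry_run=True):
    """Rename <name>_HCDFT.mgf to <name>.mgf; return the (old, new) pairs."""
    renamed = []
    for old in sorted(Path(folder).glob(f"*{suffix}.mgf")):
        base = old.stem[: -len(suffix)]  # file name without suffix and extension
        if not base:
            continue  # skip a file named exactly "_HCDFT.mgf"
        new = old.with_name(base + ".mgf")
        if new.exists():
            continue  # never overwrite an existing file
        if not dry_run:
            old.rename(new)
        renamed.append((old, new))
    return renamed
```

Calling it with dry_run=True (the default) only lists what would change; pass dry_run=False to rename. The ms_files entries in settings.yaml then need to point at the new names.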

jalew188 commented 1 year ago

I will support xxx_HCDFT.mgf in the next release.

ShadoWyyyy commented 1 year ago

It works: the MGF files are read successfully and transfer learning runs, but the process ends during the Percolator iteration:

2023-02-06 09:04:53> Require fine-tuning models ...
2023-02-06 09:04:53> Preparing for fine-tuning ...
2023-02-06 09:05:04> Fine-tuning ...
2023-02-06 09:05:04> 1898 PSMs for RT model training/transfer learning
2023-02-06 09:06:29> 3198 PSMs for MS2 model training/transfer learning
2023-02-06 09:08:15> Extracting peptdeep features for 19711 PSMs with multiprocessing ...
2023-02-06 09:09:02> Finished feature extraction with multiprocessing
2023-02-06 09:09:02> [PERC] 3198 target PSMs at 0.01 psm-level FDR
2023-02-06 09:09:02> [PERC] Iteration 1 of Percolator ...
2023-02-06 09:09:02> Traceback (most recent call last):
  File "C:\Users\DELL.conda\envs\peptdeep\lib\site-packages\peptdeep\pipeline_api.py", line 379, in rescore
    psm_df = percolator.re_score(psm_df)
  File "C:\Users\DELL.conda\envs\peptdeep\lib\site-packages\peptdeep\rescore\percolator.py", line 460, in re_score
    df = self._cv_score(df)
  File "C:\Users\DELL.conda\envs\peptdeep\lib\site-packages\peptdeep\rescore\percolator.py", line 375, in _cv_score
    self._train(train_t_df, df_decoy)
  File "C:\Users\DELL.conda\envs\peptdeep\lib\site-packages\peptdeep\rescore\percolator.py", line 314, in _train
    train_df[self.feature_list].values,
  File "C:\Users\DELL.conda\envs\peptdeep\lib\site-packages\pandas\core\frame.py", line 3813, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "C:\Users\DELL.conda\envs\peptdeep\lib\site-packages\pandas\core\indexes\base.py", line 6070, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "C:\Users\DELL.conda\envs\peptdeep\lib\site-packages\pandas\core\indexes\base.py", line 6133, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['raw_score'] not in index"

I don't understand why raw_score is not in the index, because I didn't change anything here:

[image]

Or should I configure something here in settings.yaml?
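The KeyError in the traceback means the feature list requested a column (raw_score) that the PSM table does not contain. A generic way to diagnose this kind of mismatch is to compare the requested features against the available columns before indexing. The sketch below is plain Python and is not PeptDeep's code; the function name split_features and all column names other than raw_score are hypothetical.

```python
def split_features(feature_list, available_columns):
    """Split requested features into (present, missing), keeping their order."""
    available = set(available_columns)
    present = [name for name in feature_list if name in available]
    missing = [name for name in feature_list if name not in available]
    return present, missing


# Hypothetical column names, for illustration only.
requested = ["score", "raw_score", "rt_delta"]
columns = ["sequence", "score", "rt_delta"]
present, missing = split_features(requested, columns)
print("usable features:", present)
print("missing features:", missing)
```

With a pandas DataFrame, list(df.columns) can be passed as the second argument to see which configured score columns are absent.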

jalew188 commented 1 year ago

Just set it as:

pfind: []

raw_score will be ignored

ShadoWyyyy commented 1 year ago

Just set it as:

pfind: []

raw_score will be ignored

Yes, I noticed that and tried again, but the same error persists. The program still seems to complain about the missing raw_score even after I deleted it from the settings.yaml file: peptdeep_rescore.log

jalew188 commented 1 year ago

I see: other_score_columns is loaded before the yaml file is loaded. Please go to alphapept_folder/peptdeep/constants/default_settings.yaml and change the pfind term to pfind: {}.

I will also provide a Jupyter notebook. I just tested the notebook and it worked for the RA957 dataset: HLA_RA957_DDA_run_AD.zip

------ edited ------- For the notebook, you don't need to modify the yaml files.
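For a pip or conda install, the default_settings.yaml mentioned above lives inside site-packages, which can be hard to find by hand. The sketch below looks it up with the standard library. It assumes the file sits at constants/default_settings.yaml inside the installed peptdeep package, as the path in the comment above suggests; the function name find_default_settings is made up for this example. It only locates the file and does not edit it.

```python
import importlib.util
from pathlib import Path


def find_default_settings(package="peptdeep"):
    """Return the path to constants/default_settings.yaml, or None if not found."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        # Assumed layout: <package dir>/constants/default_settings.yaml
        candidate = Path(location) / "constants" / "default_settings.yaml"
        if candidate.is_file():
            return candidate
    return None
```

Running print(find_default_settings()) in the peptdeep environment shows the file to open in a text editor; None means the package or the file was not found at the assumed location.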

ShadoWyyyy commented 1 year ago

I see, loading other_score_columns happens before loading the yaml file. please go to alphapept_folder/peptdeep/constants/default_settings.yaml, and modify the pfind term as pfind: {}.

I will also provide a jupyter notebook. I just tested the notebook and it worked for the RA957 dataset. HLA_RA957_DDA_run_AD.zip

------ edited ------- For the notebook, you don't need to modify yaml files

OK, thanks. I will follow the instructions.