MannLabs / alphapeptdeep

Deep learning framework for proteomics
Apache License 2.0

Use refined model that contains user-defined mod to generate a predicted library #129

Closed cctsou closed 6 months ago

cctsou commented 6 months ago
A follow-up question: I was able to get the refined model, and I am trying to use it to predict a new library from a FASTA file. However, I encountered the following error, which looks like it has something to do with the user-defined mod. Any clue?
[PeptDeep] Starting a new job 'C:\Users\Chih-ChiangTsou/peptdeep/tasks/queue\peptdeep_library_2024-01-06--12-47-56.749038.yaml'...
[PeptDeep] Predicting library ...
2024-01-06 12:47:59> [PeptDeep] Running library task ...
2024-01-06 12:47:59> Input files (fasta): ['D:/fasta/test.fasta']
2024-01-06 12:47:59> Platform information:
2024-01-06 12:47:59> system        - Windows
2024-01-06 12:47:59> release       - 10
2024-01-06 12:47:59> version       - 10.0.22631
2024-01-06 12:47:59> machine       - AMD64
2024-01-06 12:47:59> processor     - Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
2024-01-06 12:47:59> cpu count     - 16
2024-01-06 12:47:59> ram           - 56.2/79.7 Gb (available/total)
2024-01-06 12:47:59>
2024-01-06 12:47:59> Python information:
2024-01-06 12:47:59> alphabase        - 1.2.0
2024-01-06 12:47:59> alpharaw         - 0.4.0
2024-01-06 12:47:59> biopython        -
2024-01-06 12:47:59> click            - 8.1.7
2024-01-06 12:47:59> lxml             - 4.9.4
2024-01-06 12:47:59> numba            - 0.58.1
2024-01-06 12:47:59> numpy            - 1.26.2
2024-01-06 12:47:59> pandas           - 2.1.4
2024-01-06 12:47:59> peptdeep         - 1.1.3
2024-01-06 12:47:59> psutil           - 5.9.7
2024-01-06 12:47:59> pyteomics        - 4.6.3
2024-01-06 12:47:59> python           - 3.9.18
2024-01-06 12:47:59> scikit-learn     - 1.3.2
2024-01-06 12:47:59> streamlit        - 1.29.0
2024-01-06 12:47:59> streamlit-aggrid -
2024-01-06 12:47:59> torch            - 2.1.2
2024-01-06 12:47:59> tqdm             - 4.66.1
2024-01-06 12:47:59> transformers     - 4.36.2
2024-01-06 12:47:59>
2024-01-06 12:48:01> Using external ms2 model: 'C:/Users/Chih-ChiangTsou/peptdeep/refined_models/ms2.pth'
2024-01-06 12:48:01> Using external rt model: 'C:/Users/Chih-ChiangTsou/peptdeep/refined_models/rt.pth'
2024-01-06 12:48:01> Using external ccs model: 'C:/Users/Chih-ChiangTsou/peptdeep/refined_models/ccs.pth'
2024-01-06 12:48:01> xxx/library.tsv does not exist, use default IRT_PEPTIDE_DF to translate irt
2024-01-06 12:48:01> Generating the spectral library ...
2024-01-06 12:48:01> Loaded 17865 precursors.
2024-01-06 12:48:01> Predicting RT/IM/MS2 for 16892 precursors ...
2024-01-06 12:48:01> Using multiprocessing with 16 processes ...
2024-01-06 12:48:01> Predicting rt,mobility,ms2 ...
  0%|                                                                                           | 0/31 [00:15<?, ?it/s]
2024-01-06 12:48:18> multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "multiprocessing\pool.py", line 125, in worker
  File "peptdeep\pretrained_models.py", line 914, in _predict_func_for_mp
    return self.predict_all(
  File "peptdeep\pretrained_models.py", line 1084, in predict_all
    self.predict_rt(precursor_df,
  File "peptdeep\pretrained_models.py", line 877, in predict_rt
    df = self.rt_model.predict(precursor_df,
  File "peptdeep\model\model_interface.py", line 388, in predict
    features = self._get_features_from_batch_df(
  File "peptdeep\model\rt.py", line 161, in _get_features_from_batch_df
    self._get_mod_features(batch_df)
  File "peptdeep\model\model_interface.py", line 812, in _get_mod_features
    get_batch_mod_feature(batch_df)
  File "peptdeep\model\featurize.py", line 86, in get_batch_mod_feature
    mod_features_list = batch_df.mods.str.split(';').apply(
  File "pandas\core\series.py", line 4757, in apply
    return SeriesApply(
  File "pandas\core\apply.py", line 1209, in apply
    return self.apply_standard()
  File "pandas\core\apply.py", line 1289, in apply_standard
    mapped = obj._map_values(
  File "pandas\core\base.py", line 921, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
  File "pandas\core\algorithms.py", line 1814, in map_array
    return lib.map_infer(values, mapper, convert=convert)
  File "lib.pyx", line 2926, in pandas._libs.lib.map_infer
  File "peptdeep\model\featurize.py", line 87, in <lambda>
    lambda mod_names: [
  File "peptdeep\model\featurize.py", line 88, in <listcomp>
    MOD_TO_FEATURE[mod] for mod in mod_names
KeyError: 'IADTB@C'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "peptdeep\pipeline_api.py", line 416, in generate_library
    lib_maker.make_library(lib_settings['infiles'])
  File "peptdeep\spec_lib\library_factory.py", line 105, in make_library
    self._predict()
  File "peptdeep\spec_lib\library_factory.py", line 68, in _predict
    self.spec_lib.predict_all()
  File "peptdeep\spec_lib\predict_lib.py", line 121, in predict_all
    res = self.model_manager.predict_all(
  File "peptdeep\pretrained_models.py", line 1127, in predict_all
    return self.predict_all_mp(
  File "peptdeep\pretrained_models.py", line 964, in predict_all_mp
    for ret_dict in process_bar(
  File "peptdeep\utils.py", line 27, in process_bar
    for i,iter in enumerate(iterator):
  File "multiprocessing\pool.py", line 870, in next
KeyError: 'IADTB@C'

'IADTB@C'

Originally posted by @cctsou in https://github.com/MannLabs/alphapeptdeep/issues/125#issuecomment-1879782515

jalew188 commented 6 months ago

@cctsou You have to check two things here:

  1. You need to add IADTB@C into user-defined mod again when predicting the library;
  2. In the yaml settings, set translate_mod_to_unimod_id to false, or in the GUI -> library, uncheck Translate to Unimod ids, because IADTB@C does not have a Unimod ID yet.

I just successfully generated an IADTB@C library. With translate_mod_to_unimod_id == true, I got a KeyError:

File "/Users/wenfengzeng/workspace/alphabase/alphabase/spectral_library/translate.py", line 50, in <listcomp>
    mods = [translate_mod_dict[mod] for mod in mods]
KeyError: 'IADTB@C'
Traceback (most recent call last):

With translate_mod_to_unimod_id == false, I got the tsv library:

_NQLTKC[IADTB]R_    2   0.3223378658294678  1.0305768296490587  NQLTKCR 579.82423233085 xxx xx  1   y   803.4556    1.0 1   4   noloss
_NQLTKC[IADTB]R_    2   0.3223378658294678  1.0305768296490587  NQLTKCR 579.82423233085 xxx xx  1   b   243.10878   0.81772846  1   2   noloss
_NQLTKC[IADTB]R_    2   0.3223378658294678  1.0305768296490587  NQLTKCR 579.82423233085 xxx xx  1   y   916.5397    0.76442915  1   5   noloss
_NQLTKC[IADTB]R_    2   0.3223378658294678  1.0305768296490587  NQLTKCR 579.82423233085 xxx xx  1   y   702.40796   0.4161628   1   3   noloss
_NQLTKC[IADTB]R_    2   0.3223378658294678  1.0305768296490587  NQLTKCR 579.82423233085 xxx xx  1   y   458.77347   0.39316234  2   5   noloss
_NQLTKC[IADTB]R_    2   0.3223378658294678  1.0305768296490587  NQLTKCR 579.82423233085 xxx xx  1   y   574.313 0.34266013  1   2   noloss
_NQLTKC[IADTB]R_    2   0.3223378658294678  1.0305768296490587  NQLTKCR 579.82423233085 xxx xx  1   b   356.19284   0.23460487  1   3   noloss

Let me know if it works. Otherwise, please share the yaml settings file with me again so that I can debug on my side.
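To illustrate why the translation step fails for a mod without a Unimod ID, here is a minimal sketch. The dictionary below is a hypothetical, truncated stand-in, not the actual table used by alphabase, and the helper names are made up for illustration. It only shows the general pattern visible in the traceback: a plain dictionary lookup raises KeyError for any mod name that has no entry.

```python
# Hypothetical, heavily truncated stand-in for the translation table.
# The real table and its value format in alphabase may differ.
TRANSLATE_MOD_DICT = {
    "Carbamidomethyl@C": "UniMod:4",
    "Oxidation@M": "UniMod:35",
}


def translate_mods(mods, mapping=TRANSLATE_MOD_DICT):
    """Strict translation: raises KeyError on any mod without an entry."""
    return [mapping[mod] for mod in mods]


def untranslatable_mods(mods, mapping=TRANSLATE_MOD_DICT):
    """List the mods that would make the strict translation fail."""
    return [mod for mod in mods if mod not in mapping]


# A mods string in the 'name@site;name@site' form seen in the traceback.
peptide_mods = "Carbamidomethyl@C;IADTB@C".split(";")

missing = untranslatable_mods(peptide_mods)
if missing:
    # A user-defined mod such as IADTB@C ends up here.
    print("Cannot translate to Unimod IDs:", missing)
else:
    print(translate_mods(peptide_mods))
```

Checking for missing entries before translating makes the cause explicit, which is why turning the translation off is the practical fix for a user-defined mod.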

cctsou commented 6 months ago

Hi @jalew188, thanks. I did have the user-defined mod set, and translate_mod_to_unimod_id was also set to false. The KeyError message looks a little different from yours.

Here are my yaml file and the FASTA I used. Thanks again!!

test.fasta.txt peptdeep_library_2024-01-06--12-37-57.155932.yaml.txt

jalew188 commented 6 months ago

@cctsou I got it. This is a multiprocessing issue: user-defined mods are not propagated to the subprocesses. Debugging it will take some time.

As a workaround, you can run on a GPU machine, which disables multiprocessing-based prediction.
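The general mechanism can be sketched as follows. This is not peptdeep's code: the registry, the child script, and the helper name are all hypothetical, and a separate interpreter started with subprocess stands in for a worker process. On Windows, worker processes start from a fresh interpreter, so anything registered only in the parent's memory is missing in the worker unless it is handed over explicitly.

```python
import json
import subprocess
import sys

# Hypothetical child script. It rebuilds a module-level registry from scratch,
# the way a freshly started worker would, then adds whatever the parent passes in.
CHILD_SOURCE = """
import json, sys
MOD_TO_FEATURE = {"Carbamidomethyl@C": [1.0]}   # built-in mods only
MOD_TO_FEATURE.update(json.loads(sys.argv[1]))  # mods handed over by the parent
print(json.dumps(sorted(MOD_TO_FEATURE)))
"""


def known_mods_in_child(extra_mods):
    """Start a fresh interpreter and report which mods it knows about."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE, json.dumps(extra_mods)],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# The parent registers a user-defined mod, but only in its own memory.
parent_registry = {"Carbamidomethyl@C": [1.0]}
parent_registry["IADTB@C"] = [0.5]

# Nothing handed over: the child does not know IADTB@C.
without_handover = known_mods_in_child({})
# Explicit handover: the child can register the mod itself.
with_handover = known_mods_in_child({"IADTB@C": [0.5]})

print("without handover:", without_handover)
print("with handover:   ", with_handover)
```

A fix along these lines would pass the user-defined mods to each worker at start-up. How v1.1.4 actually solves it is not stated in this thread.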

jalew188 commented 6 months ago

Multiprocessing issues will be fixed in the next patch: v1.1.4

cctsou commented 6 months ago

> Multiprocessing issues will be fixed in the next patch: v1.1.4

Thanks again for the diligent work over the weekend.

cctsou commented 6 months ago

> Multiprocessing issues will be fixed in the next patch: v1.1.4

Hi @jalew188, I am halfway through generating a large spectral library from a whole human proteome FASTA, and an error was thrown while writing the spectral library in TSV format. Here is the log so far. I see a predict.speclib.hdf file of about 7 GB, but the tsv file is empty. Could you take a look and advise? Thank you again.

2024-01-07 10:52:25> [PeptDeep] Running library task ...
2024-01-07 10:52:25> Input files (fasta): ['D:/fasta/uniprot_swissprot_20200903_can_iso_with_LOH_altAlleles_Ver3_20230808_fragpipe.fasta']
2024-01-07 10:52:25> Platform information:
2024-01-07 10:52:25> system        - Windows
2024-01-07 10:52:25> release       - 10
2024-01-07 10:52:25> version       - 10.0.22631
2024-01-07 10:52:25> machine       - AMD64
2024-01-07 10:52:25> processor     - Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
2024-01-07 10:52:25> cpu count     - 16
2024-01-07 10:52:25> ram           - 57.3/79.7 Gb (available/total)
2024-01-07 10:52:25>
2024-01-07 10:52:25> Python information:
2024-01-07 10:52:25> alphabase        - 1.2.0
2024-01-07 10:52:25> alpharaw         - 0.4.0
2024-01-07 10:52:25> biopython        - 1.82
2024-01-07 10:52:25> click            - 8.1.7
2024-01-07 10:52:25> lxml             - 5.0.0
2024-01-07 10:52:25> numba            - 0.58.1
2024-01-07 10:52:25> numpy            - 1.22.3
2024-01-07 10:52:25> pandas           - 1.4.2
2024-01-07 10:52:25> peptdeep         - 1.1.4
2024-01-07 10:52:25> psutil           - 5.9.7
2024-01-07 10:52:25> pyteomics        - 4.6.3
2024-01-07 10:52:25> python           - 3.10.4
2024-01-07 10:52:25> scikit-learn     - 1.3.2
2024-01-07 10:52:25> streamlit        - 1.29.0
2024-01-07 10:52:25> streamlit-aggrid - 0.3.4.post3
2024-01-07 10:52:25> torch            - 2.1.2
2024-01-07 10:52:25> tqdm             - 4.66.1
2024-01-07 10:52:25> transformers     - 4.36.2
2024-01-07 10:52:25>
2024-01-07 10:52:27> Using external ms2 model: 'C:/Users/Chih-ChiangTsou/peptdeep/refined_models/ms2.pth'
2024-01-07 10:52:27> Using external rt model: 'C:/Users/Chih-ChiangTsou/peptdeep/refined_models/rt.pth'
2024-01-07 10:52:27> Using external ccs model: 'C:/Users/Chih-ChiangTsou/peptdeep/refined_models/ccs.pth'
2024-01-07 10:52:27> xxx/library.tsv does not exist, use default IRT_PEPTIDE_DF to translate irt
2024-01-07 10:52:27> Generating the spectral library ...
2024-01-07 10:55:21> Loaded 23934810 precursors.
2024-01-07 10:56:23> Predicting RT/IM/MS2 for 23031874 precursors ...
2024-01-07 10:56:23> Using multiprocessing with 16 processes ...
2024-01-07 10:56:23> Predicting rt,mobility,ms2 ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 245/245 [4:58:30<00:00, 73.10s/it]
2024-01-07 15:55:09> End predicting RT/IM/MS2
2024-01-07 15:55:09> Predicting the spectral library with 23031874 precursors and 1549.15M fragments used 16.7805 GB memory
2024-01-07 15:55:09> Saving HDF library to C:/Users/Chih-ChiangTsou/peptdeep/spec_libs\predict.speclib.hdf ...
2024-01-07 15:57:32> Translating to C:/Users/Chih-ChiangTsou/peptdeep/spec_libs\predict.speclib.tsv for DiaNN/Spectronaut...
  0%|▋                                                                                                                                                                        | 1/231 [00:19<1:15:09, 19.61s/it]Process WritingProcess-26:
Traceback (most recent call last):
  File "C:\Users\Chih-ChiangTsou\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\Users\Chih-ChiangTsou\AppData\Local\Programs\Python\Python310\lib\site-packages\alphabase\spectral_library\translate.py", line 391, in run
    df.to_csv(self.tsv, header=(batch==0), sep="\t", mode="a", index=False, lineterminator="\n")
TypeError: NDFrame.to_csv() got an unexpected keyword argument 'lineterminator'
  5%|████████                                                                                                                                                                | 11/231 [03:22<1:05:36, 17.89s/it]

jalew188 commented 6 months ago

Upgrading pandas to v1.5.x or later will fix this exception. The lineterminator argument was named line_terminator in pandas v1.4.x and earlier.
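If upgrading is not an option, a caller can tolerate both spellings of the argument. The sketch below is a hypothetical helper, not part of alphabase or pandas. It tries the newer keyword first and falls back to the older one only when the TypeError is about that keyword. The two small classes are stand-ins that mimic the old and new to_csv signatures so the example runs without pandas installed.

```python
def to_csv_compat(frame, path, line_end="\n", **kwargs):
    """Call frame.to_csv with whichever line-terminator keyword it accepts."""
    try:
        return frame.to_csv(path, lineterminator=line_end, **kwargs)
    except TypeError as err:
        # Re-raise anything that is not the keyword mismatch from the traceback.
        if "lineterminator" not in str(err):
            raise
        return frame.to_csv(path, line_terminator=line_end, **kwargs)


class OldStyleFrame:
    """Stand-in for a DataFrame from pandas v1.4.x and earlier."""

    def to_csv(self, path, sep=",", line_terminator="\n"):
        return ("old", path, sep, line_terminator)


class NewStyleFrame:
    """Stand-in for a DataFrame from pandas v1.5.x and later."""

    def to_csv(self, path, sep=",", lineterminator="\n"):
        return ("new", path, sep, lineterminator)


old_result = to_csv_compat(OldStyleFrame(), "lib.tsv", sep="\t")
new_result = to_csv_compat(NewStyleFrame(), "lib.tsv", sep="\t")
print(old_result)
print(new_result)
```

With a real DataFrame the same helper would be called as to_csv_compat(df, path, sep="\t", index=False). Upgrading pandas remains the simpler fix.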

jalew188 commented 6 months ago

Dear @cctsou Thanks a lot for reporting these issues to me, I really appreciate it:)

cctsou commented 6 months ago

> Dear @cctsou Thanks a lot for reporting these issues to me, I really appreciate it:)

Thank you for your patience and for helping us resolve it, very much appreciated!! I was able to get a library, and I am now running DIA-NN to see whether it improves results compared with DIA-NN's own predicted library.