cctsou closed this issue 6 months ago
Hi @cctsou, in the yaml settings,
psm_modification_mapping:
  IADTB@C:
  - (IADTB)
  Oxidation@M:
  - (UniMod:35)
  Acetyl@Protein_N-term:
  - (UniMod:1)
must be changed to
psm_modification_mapping:
  IADTB@C:
  - C(IADTB)
  Oxidation@M:
  - M(UniMod:35)
  Acetyl@Protein_N-term:
  - _(UniMod:1)
Let me know if this addresses your issue :)
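For readers following along, here is a minimal Python sketch of why each pattern needs the residue letter (or `_` for the N-terminus) in front of the bracketed label. It is not how alphabase or peptdeep actually parse sequences: `parse_modified_sequence`, the site numbering, and the example sequence are hypothetical, and only the mapping entries are taken from the yaml above.

```python
# Mapping copied from the corrected yaml: internal name -> patterns in the input file.
MAPPING = {
    "IADTB@C": ["C(IADTB)"],
    "Oxidation@M": ["M(UniMod:35)"],
    "Acetyl@Protein_N-term": ["_(UniMod:1)"],
}


def parse_modified_sequence(mod_seq, mapping):
    """Split a modified sequence into (bare sequence, mod names, mod sites).

    Hypothetical helper for illustration only. In this sketch, site 0 means
    the N-terminus and residues are numbered from 1.
    """
    pattern_to_name = {p: name for name, pats in mapping.items() for p in pats}
    residues, mods, sites = [], [], []
    i = 0
    while i < len(mod_seq):
        char = mod_seq[i]
        if i + 1 < len(mod_seq) and mod_seq[i + 1] == "(":
            close = mod_seq.index(")", i + 1)
            token = mod_seq[i:close + 1]  # residue (or '_') plus the label
            if token not in pattern_to_name:
                raise ValueError(f"unknown modification: {token}")
            mods.append(pattern_to_name[token])
            if char == "_":
                sites.append(0)
            else:
                residues.append(char)
                sites.append(len(residues))
            i = close + 1
        else:
            if char != "_":  # skip bare terminus markers
                residues.append(char)
            i += 1
    return "".join(residues), mods, sites


result = parse_modified_sequence("_(UniMod:1)AC(IADTB)M(UniMod:35)K_", MAPPING)
print(result)
```

With the original patterns such as `(IADTB)`, the token `C(IADTB)` would not be found in the lookup, and the sketch raises `ValueError` instead of returning a result.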
Hi @jalew188, thanks. I did try that first, but those entries were still treated as unknown modifications. Any suggestions? Since our IADTB mod has the same mass as UniMod:2062, I also tried replacing every C(IADTB) string with C(UniMod:2062) in the spec tsv and changed the yaml accordingly, but still no luck.
@cctsou This is indeed a bug. I have fixed it in v1.1.3.
It's working, thanks a lot!!
A follow-up question: I was able to get the refined model, and I am now trying to use it to predict a new library from a FASTA file. However, I ran into the following error, which looks related to the user-defined mod. Any clue?
[PeptDeep] Starting a new job 'C:\Users\Chih-ChiangTsou/peptdeep/tasks/queue\peptdeep_library_2024-01-06--12-47-56.749038.yaml'...
[PeptDeep] Predicting library ...
2024-01-06 12:47:59> [PeptDeep] Running library task ...
2024-01-06 12:47:59> Input files (fasta): ['D:/fasta/test.fasta']
2024-01-06 12:47:59> Platform information:
2024-01-06 12:47:59> system - Windows
2024-01-06 12:47:59> release - 10
2024-01-06 12:47:59> version - 10.0.22631
2024-01-06 12:47:59> machine - AMD64
2024-01-06 12:47:59> processor - Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
2024-01-06 12:47:59> cpu count - 16
2024-01-06 12:47:59> ram - 56.2/79.7 Gb (available/total)
2024-01-06 12:47:59>
2024-01-06 12:47:59> Python information:
2024-01-06 12:47:59> alphabase - 1.2.0
2024-01-06 12:47:59> alpharaw - 0.4.0
2024-01-06 12:47:59> biopython -
2024-01-06 12:47:59> click - 8.1.7
2024-01-06 12:47:59> lxml - 4.9.4
2024-01-06 12:47:59> numba - 0.58.1
2024-01-06 12:47:59> numpy - 1.26.2
2024-01-06 12:47:59> pandas - 2.1.4
2024-01-06 12:47:59> peptdeep - 1.1.3
2024-01-06 12:47:59> psutil - 5.9.7
2024-01-06 12:47:59> pyteomics - 4.6.3
2024-01-06 12:47:59> python - 3.9.18
2024-01-06 12:47:59> scikit-learn - 1.3.2
2024-01-06 12:47:59> streamlit - 1.29.0
2024-01-06 12:47:59> streamlit-aggrid -
2024-01-06 12:47:59> torch - 2.1.2
2024-01-06 12:47:59> tqdm - 4.66.1
2024-01-06 12:47:59> transformers - 4.36.2
2024-01-06 12:47:59>
2024-01-06 12:48:01> Using external ms2 model: 'C:/Users/Chih-ChiangTsou/peptdeep/refined_models/ms2.pth'
2024-01-06 12:48:01> Using external rt model: 'C:/Users/Chih-ChiangTsou/peptdeep/refined_models/rt.pth'
2024-01-06 12:48:01> Using external ccs model: 'C:/Users/Chih-ChiangTsou/peptdeep/refined_models/ccs.pth'
2024-01-06 12:48:01> xxx/library.tsv does not exist, use default IRT_PEPTIDE_DF to translate irt
2024-01-06 12:48:01> Generating the spectral library ...
2024-01-06 12:48:01> Loaded 17865 precursors.
2024-01-06 12:48:01> Predicting RT/IM/MS2 for 16892 precursors ...
2024-01-06 12:48:01> Using multiprocessing with 16 processes ...
2024-01-06 12:48:01> Predicting rt,mobility,ms2 ...
0%| | 0/31 [00:15<?, ?it/s]
2024-01-06 12:48:18> multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "multiprocessing\pool.py", line 125, in worker
File "peptdeep\pretrained_models.py", line 914, in _predict_func_for_mp
return self.predict_all(
File "peptdeep\pretrained_models.py", line 1084, in predict_all
self.predict_rt(precursor_df,
File "peptdeep\pretrained_models.py", line 877, in predict_rt
df = self.rt_model.predict(precursor_df,
File "peptdeep\model\model_interface.py", line 388, in predict
features = self._get_features_from_batch_df(
File "peptdeep\model\rt.py", line 161, in _get_features_from_batch_df
self._get_mod_features(batch_df)
File "peptdeep\model\model_interface.py", line 812, in _get_mod_features
get_batch_mod_feature(batch_df)
File "peptdeep\model\featurize.py", line 86, in get_batch_mod_feature
mod_features_list = batch_df.mods.str.split(';').apply(
File "pandas\core\series.py", line 4757, in apply
return SeriesApply(
File "pandas\core\apply.py", line 1209, in apply
return self.apply_standard()
File "pandas\core\apply.py", line 1289, in apply_standard
mapped = obj._map_values(
File "pandas\core\base.py", line 921, in _map_values
return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
File "pandas\core\algorithms.py", line 1814, in map_array
return lib.map_infer(values, mapper, convert=convert)
File "lib.pyx", line 2926, in pandas._libs.lib.map_infer
File "peptdeep\model\featurize.py", line 87, in <lambda>
lambda mod_names: [
File "peptdeep\model\featurize.py", line 88, in <listcomp>
MOD_TO_FEATURE[mod] for mod in mod_names
KeyError: 'IADTB@C'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "peptdeep\pipeline_api.py", line 416, in generate_library
lib_maker.make_library(lib_settings['infiles'])
File "peptdeep\spec_lib\library_factory.py", line 105, in make_library
self._predict()
File "peptdeep\spec_lib\library_factory.py", line 68, in _predict
self.spec_lib.predict_all()
File "peptdeep\spec_lib\predict_lib.py", line 121, in predict_all
res = self.model_manager.predict_all(
File "peptdeep\pretrained_models.py", line 1127, in predict_all
return self.predict_all_mp(
File "peptdeep\pretrained_models.py", line 964, in predict_all_mp
for ret_dict in process_bar(
File "peptdeep\utils.py", line 27, in process_bar
for i,iter in enumerate(iterator):
File "multiprocessing\pool.py", line 870, in next
KeyError: 'IADTB@C'
'IADTB@C'
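The traceback ends in a plain dictionary lookup, `MOD_TO_FEATURE[mod]`, inside a worker process, so the name `IADTB@C` was not a key of that table when the lookup ran. A rough way to catch this before starting a long prediction is to compare the names in the `mods` column against the known keys. The sketch below is only an illustration: `MOD_TO_FEATURE` here is a small stand-in dict (not peptdeep's table), and `find_unregistered_mods` is a hypothetical helper.

```python
# Stand-in for the lookup table named in the traceback. The real table lives
# in peptdeep and is not reproduced here; the values are placeholders.
MOD_TO_FEATURE = {
    "Oxidation@M": [0.0, 1.0],
    "Acetyl@Protein_N-term": [1.0, 0.0],
}


def find_unregistered_mods(mods_column, known_mods):
    """Return the sorted modification names that are missing from known_mods.

    Each entry of mods_column is a ';'-separated string such as
    'IADTB@C;Oxidation@M'; an empty string means an unmodified peptide.
    """
    missing = set()
    for mods in mods_column:
        for mod in mods.split(";"):
            if mod and mod not in known_mods:
                missing.add(mod)
    return sorted(missing)


# Made-up column values that mimic the ';'-separated format split in the traceback.
mods_column = [
    "",
    "Oxidation@M",
    "IADTB@C;Oxidation@M",
    "Acetyl@Protein_N-term;IADTB@C",
]
missing = find_unregistered_mods(mods_column, MOD_TO_FEATURE)
print(missing)  # ['IADTB@C']
```

A non-empty result means at least one precursor would trigger the same `KeyError` during featurization.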
Hi,
I am trying to retrain the MS2 model using a spectral library file (tsv format) that contains a custom modification on cysteine.
Here is the output I got. It looks like most of the spectra were skipped because of the unknown modification. Could you help me figure out what I did wrong? I have attached the yaml file and a trimmed version of the library tsv file. Thank you very much in advance.
peptdeep_transfer_2024-01-05--11-10-14.400893.yaml.txt NCIH_KRAS_GPF_Library_1_plus_2.report-lib_trimmed.tsv.txt
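When spectra are skipped for unknown modifications, it can help to first list every modification token that actually occurs in the library, so that the yaml mapping can be checked against them. The sketch below assumes a tab-separated file with a modified-sequence column. The column name `ModifiedPeptide`, the sample rows, and the helper `count_mod_tokens` are all assumptions for illustration and are not taken from the attached files.

```python
import csv
import io
import re
from collections import Counter

# Made-up rows for illustration; the column name is an assumption.
SAMPLE_TSV = (
    "ModifiedPeptide\tPrecursorCharge\n"
    "_AC(IADTB)DK_\t2\n"
    "_(UniMod:1)M(UniMod:35)AC(IADTB)R_\t3\n"
    "_PEPTIDEK_\t2\n"
)

# A residue letter or '_' followed by a parenthesised label, e.g. 'C(IADTB)'.
TOKEN_RE = re.compile(r"[A-Z_]\([^)]*\)")


def count_mod_tokens(handle, column="ModifiedPeptide"):
    """Count each distinct modification token found in the given column."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        counts.update(TOKEN_RE.findall(row[column]))
    return counts


counts = count_mod_tokens(io.StringIO(SAMPLE_TSV))
for token, n in sorted(counts.items()):
    print(token, n)
```

To run it on a real file, the `io.StringIO(...)` argument can be replaced with an opened file handle, for example `open(path, newline="")`. Each printed token should then appear as a pattern under some entry of `psm_modification_mapping`.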