Closed WeiqiangChen closed 2 years ago
Dear WeiqiangChen,
That is definitely possible! However, in the example you posted the column name is wrong: the column "modification" should be "modifications". Otherwise DeepLC looks for a column that does not exist.
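A minimal sketch of the header fix, using only the Python standard library. The helper name `fix_header` is hypothetical and not part of DeepLC; the sketch assumes, based on the traceback in this thread, that `seq` and `modifications` are the columns DeepLC reads.

```python
import csv
import io

def fix_header(csv_text: str) -> str:
    """Return the CSV text with a 'modification' header renamed to 'modifications'."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = [
        "modifications" if name.strip() == "modification" else name.strip()
        for name in rows[0]
    ]
    # Assumption: these are the two columns DeepLC looks up (see the traceback).
    for required in ("seq", "modifications"):
        if required not in header:
            raise ValueError(f"missing required column: {required!r}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])  # data rows are passed through unchanged
    return out.getvalue()

broken = "seq,modification,tr\nAARPLVTVYDEK,1|Acetyl,4367.64\nAHIVQTHK,1|Acetyl,1314.48\n"
fixed = fix_header(broken)
print(fixed.splitlines()[0])
```

Editing the header by hand in a text editor works just as well; the check is mainly useful when many files are generated by a script.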
If you are interested in retraining a model definitely also keep an eye on this repo: https://github.com/RobbinBouwmeester/DeepLCRetrainer
I will soon release that code (with a GUI), which should make retraining/transfer learning easier.
Hope that helped!
Kind regards,
Robbin
Dear Robbin,
Thanks for the reply. DeepLC works now. I trained it with unmodified peptides from the MaxQuant evidence.txt, using group_by(modified_sequence) and slice_max(order_by = intensity, n = 1) to get the apex retention time for each modified sequence. The average(predicted_tr - tr) is 7.9 min.

seq,modifications,tr
AAAESIQMR,8|Oxidation,1353
AASVGPTMR,8|Oxidation,1264.26
ADLEMQIESLK,5|Oxidation,5267.34
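The group-by and slice_max step described above can be sketched in plain Python as follows. The dictionary keys (`modified_sequence`, `intensity`, `retention_time`) and the example rows are made up for illustration; they are not the actual evidence.txt column names or values.

```python
from typing import Dict, Iterable

def apex_per_sequence(rows: Iterable[dict]) -> Dict[str, dict]:
    """Keep, for each modified sequence, the row with the highest intensity."""
    best: Dict[str, dict] = {}
    for row in rows:
        key = row["modified_sequence"]
        # Strict '>' keeps the first row seen when intensities tie.
        if key not in best or row["intensity"] > best[key]["intensity"]:
            best[key] = row
    return best

# Hypothetical evidence rows, for illustration only.
evidence = [
    {"modified_sequence": "PEPTIDEA", "intensity": 1.0e6, "retention_time": 22.10},
    {"modified_sequence": "PEPTIDEA", "intensity": 5.0e6, "retention_time": 22.55},
    {"modified_sequence": "PEPTIDEB", "intensity": 2.0e6, "retention_time": 21.07},
]
apex = apex_per_sequence(evidence)
for sequence, row in apex.items():
    print(sequence, row["retention_time"])
```

Taking the most intense evidence row is one way to approximate the apex; other choices, such as an intensity-weighted mean, are possible and are not what this sketch does.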
I also tested acetylated peptides; the average(predicted_tr - tr) is now -20 min.

seq,modifications,tr
AARPLVTVYDEK,1|Acetyl,4367.64
ADFDTNPTSLYSIK,1|Acetyl,7029
AHIVQTHK,1|Acetyl,1314.48
AQHPLVQR,1|Acetyl,1989.42
Did I make some mistakes here?
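The average(predicted_tr - tr) figures quoted above are mean signed errors, so a negative value means the predictions are on average earlier than the observed retention times. A rough sketch of computing that number per modification type is shown here. It assumes the modifications field alternates position and name separated by "|", as in the CSV examples in this thread; the function name and all numbers in the example are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def mean_signed_error(records):
    """records: iterable of (modifications, observed_tr, predicted_tr) tuples."""
    errors_by_mod = defaultdict(list)
    for modifications, observed, predicted in records:
        # "8|Oxidation" -> ["Oxidation"]; an empty field means unmodified.
        names = modifications.split("|")[1::2] if modifications else []
        label = "+".join(sorted(set(names))) or "unmodified"
        errors_by_mod[label].append(predicted - observed)
    return {label: mean(errors) for label, errors in errors_by_mod.items()}

# Invented values, chosen only to make the arithmetic easy to follow.
records = [
    ("8|Oxidation", 100.0, 110.0),
    ("5|Oxidation", 200.0, 206.0),
    ("1|Acetyl", 300.0, 280.0),
    ("", 400.0, 401.0),
]
summary = mean_signed_error(records)
print(summary)
```

Reporting the error separately per modification makes it easier to see whether a bias is specific to one modification or affects the whole run.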
It could be that the current models you use are not able to extrapolate to your modifications. Could you try these models:
https://github.com/RobbinBouwmeester/DeepLCModels/blob/main/full_hc_mod_deeplc_train_filtered_1fd8363d9af9dcad3be7553c39396960.hdf5
https://github.com/RobbinBouwmeester/DeepLCModels/blob/main/full_hc_mod_deeplc_train_filtered_8c22d89667368f2f02ad996469ba157e.hdf5
https://github.com/RobbinBouwmeester/DeepLCModels/blob/main/full_hc_mod_deeplc_train_filtered_cb975cfdd4105f97efa0b3afffe075cc.hdf5
Model 2 (the file ending in 469ba157e.hdf5) gave the best predictions for oxidized peptides when trained with unmodified peptides.
All three models gave poor predictions for acetylated peptides when trained with unmodified peptides.
I see... Now, are these N-terminal acetylated peptides? Officially we do not support terminal modifications. Although you can include them at position 0 or 1 (it will default to the side chain of the first amino acid), this is likely to be suboptimal...
Yes. These are protein N-terminal acetylations.
OK, that is likely to be the problem. In that case I would recommend retraining a model with many acetylated termini, so that it is "force fit" into the current DeepLC. Feel free to contact me via e-mail (robbin.bouwmeester[at]ugent.be) about the details. Can I close this issue now?
I am using the installed Windows DeepLC application. Training with unmodified peptides from the MaxQuant evidence.txt and testing with unmodified peptides gives good results. However, I get the following error when I train with unmodified peptides and test with modified peptides.
Traceback (most recent call last):
  File "pandas\core\indexes\base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
    cpdef get_loc(self, object val):
  File "pandas\_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
    return self.mapping.get_item(val)
  File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'modifications'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "deeplc\gui.py", line 38, in <module>
    start_gui()
  File "gooey\python_bindings\gooey_decorator.py", line 134, in <lambda>
    return lambda *args, **kwargs: func(*args, **kwargs)
  File "deeplc\gui.py", line 35, in start_gui
    main(gui=True)
  File "deeplc\__main__.py", line 65, in main
    run(vars(argu))
  File "deeplc\__main__.py", line 155, in run
    preds = dlc.make_preds(seq_df=df_pred)
  File "deeplc\deeplc.py", line 862, in make_preds
    temp_preds = self.make_preds_core(
  File "deeplc\deeplc.py", line 455, in make_preds_core
    seq_df["idents"] = seq_df["seq"] + "|" + seq_df["modifications"]
  File "pandas\core\frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "pandas\core\indexes\base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'modifications'
My training CSV:

seq,modifications,tr
ISDAGEVVAIAR,,4013.04
ATMQNLNDR,,1882.4399999999998
TTTTTTTVVTQK,,1673.8799999999999
.......

The modified peptide CSV:

seq,modification,tr
AARPLVTVYDEK,1|Acetyl,4367.64
ADFDTNPTSLYSIK,1|Acetyl,7029
AHIVQTHK,1|Acetyl,1314.48

Another modified peptide CSV also gave the same error:

seq,modification,tr
AAAESIQMR,8|Oxidation,1353
AASVGPTMR,8|Oxidation,1264.26
ADLEMQIESLK,5|Oxidation,5267.34
Is it possible to train deepLC using unmodified peptides and test with modified peptides?