Open gureann opened 2 years ago
Hi, I am seeing similar issues. I do not have any decoy and target peptides that share a sequence, yet I still see this happening. Do you know how to fix it?
Hi @shubham1637, looking back at this issue again, I think the main problem was that different genes were assigned to the same peptide sequence, and the decoys in my first description were only one way to reach it.
That means if peptideA has GeneI in some rows and GeneII in other rows, this leads to duplicated transition IDs, since the join over the tables in the PQP file also uses the gene column. The different genes are kept, and all other values, which are identical, are repeated.
If you get the same error as in the second code block when running OSW, and the error in the third code block when running TargetedFileConverter, I think you should have a look at the genes (or protein groups?) in your tsv or pqp file.
Hope this helps.
Best, Ronghui
You're right. I removed the gene column altogether and it doesn't throw the error anymore. Thanks!
Best, Shubham
For anyone who ends up here:
The main problem in this issue is caused by the assay library file itself, and I think it should be fixed by us users rather than being an issue for the developers. So I would like to close this issue.
If you hit the duplicated transition ID error, please check that only one unique gene and one unique protein are assigned to each peptide, rather than two or more different ones appearing in different rows.
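The check described above can be sketched in a few lines of Python. This is only a minimal sketch, not part of OpenMS: the function name `find_ambiguous_peptides` is made up for illustration, and the column names `PeptideSequence`, `GeneName` and `ProteinId` are assumptions about the tsv assay library that may need adjusting to your file.

```python
import csv
import io
from collections import defaultdict


def find_ambiguous_peptides(rows, peptide_col="PeptideSequence",
                            group_cols=("GeneName", "ProteinId")):
    """Return {peptide: {column: sorted distinct values}} for peptides that
    carry more than one distinct value in any of group_cols.

    Column names are assumptions; adjust them to your library."""
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for col in group_cols:
            seen[row[peptide_col]][col].add(row.get(col, ""))
    ambiguous = {}
    for peptide, cols in seen.items():
        bad = {c: sorted(v) for c, v in cols.items() if len(v) > 1}
        if bad:
            ambiguous[peptide] = bad
    return ambiguous


# Tiny in-memory example with made-up values. For a real library, read the
# file instead, e.g. open("library.tsv", newline="") (hypothetical path).
example_tsv = (
    "PeptideSequence\tProteinId\tGeneName\n"
    "PEPTIDEA\tP1\tGeneI\n"
    "PEPTIDEA\tP1\tGeneII\n"
    "PEPTIDEB\tP2\tGeneIII\n"
)
rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
ambiguous = find_ambiguous_peptides(rows)
for peptide, cols in ambiguous.items():
    print(peptide, cols)
```

Any peptide that is reported here has more than one gene or protein assigned across its rows and is a candidate for the duplicated transition ID error.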
It seems this issue still persists
1) Part of the issue could come from the SQL select query here: https://github.com/OpenMS/OpenMS/blob/develop/src/openms/source/ANALYSIS/OPENSWATH/TransitionPQPFile.cpp which could lead to duplicated entries when you have 1:n mappings of peptides to proteins / genes. We should address this.
2) We should also address the issue of decoy peptides with the same sequence.
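To illustrate how a 1:n mapping can duplicate entries in a join, here is a small sketch using Python's `sqlite3` module. The two tables `transition_tbl` and `peptide_gene` are a hypothetical, heavily simplified schema, not the real PQP schema and not the query in TransitionPQPFile.cpp. The second query shows one possible way to keep IDs unique by aggregating genes per transition; whether that is the right fix for OpenMS is an open question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical simplified schema: two transitions of one peptide, and that
# peptide mapped to two genes (the 1:n case).
con.executescript("""
CREATE TABLE transition_tbl (id INTEGER, peptide_id INTEGER);
CREATE TABLE peptide_gene (peptide_id INTEGER, gene TEXT);
INSERT INTO transition_tbl VALUES (1, 10), (2, 10);
INSERT INTO peptide_gene VALUES (10, 'GeneI'), (10, 'GeneII');
""")

# A plain join yields one row per (transition, gene) pair, so every
# transition ID shows up once per gene.
joined = con.execute("""
    SELECT t.id, g.gene
    FROM transition_tbl t
    JOIN peptide_gene g ON g.peptide_id = t.peptide_id
    ORDER BY t.id, g.gene
""").fetchall()
print(joined)

# Aggregating the genes per transition returns one row per transition ID.
grouped = con.execute("""
    SELECT t.id, GROUP_CONCAT(g.gene, ';')
    FROM transition_tbl t
    JOIN peptide_gene g ON g.peptide_id = t.peptide_id
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()
print(grouped)
```

The plain join returns four rows for two transitions, which matches the duplicated transition IDs described in this thread; the grouped query returns two rows, each carrying both genes in one field.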
Hi @hroest ,
I'm using OpenSwath to analyze DIA data acquired on a QEHF, and an error occurred when I ran OSW with the generated pqp file. The error was caused by some decoy peptides that ended up with the same sequence although they were derived from different target peptide sequences (which also belonged to different genes), and this led to duplicated transition IDs at the pqp reading step, during aggregation of the gene table
If the input mzml was converted from the Thermo raw file by msconvert without peak picking, no exception is raised, and OSW just stops when searching, like this
When the input mzml was converted with peak picking, the error is an invalid ID
If I run TargetedFileConverter again, from pqp to tsv, the error is raised correctly, in the checking step after reading the database and generating the TargetedExperiment
The file attached below (extracted from the pqp file) contains all transitions with duplicated IDs after running DecoyGenerator. example_of_same_seq_in_diff_genes.txt
Two kinds of decoy peptides with the same sequence:
Peptide FVQDLSK belongs to Q91ZJ5;DECOY_P52196, in which DECOY_P52196 has the original protein ID P52196 with the peptide FQLVDSR. The gene names of these two are Ugp2 and Tst.
Peptide YLDLLQK belongs to the protein group DECOY_Q0KK55;DECOY_Q6PHN7, and the original sequences are YLLDLLR and YLLQLLR, with one AA difference, belonging to Tmem164 and Kndc1 respectively (after shuffling, the proteins are combined but the genes are kept individually).
Currently I directly drop decoy peptides that have the same sequence as a target, as well as identical decoy peptide sequences that belong to different genes, while the assay file is still in tsv format before converting to pqp, and it works fine now
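The filtering step I describe can be sketched roughly as follows. This is a simplified illustration rather than my exact script: the function name `drop_conflicting_decoys` is made up, the column names `PeptideSequence`, `GeneName` and `Decoy` (with `1` marking decoys) are assumptions about the tsv, and the last example row (`SLDVQFK`, `GeneX`) is invented to show a decoy that is kept.

```python
def drop_conflicting_decoys(rows, seq_col="PeptideSequence",
                            gene_col="GeneName", decoy_col="Decoy"):
    """Drop decoy rows whose sequence equals a target sequence, or whose
    sequence occurs among decoys of more than one gene. Targets are kept."""
    def is_decoy(row):
        return str(row[decoy_col]) == "1"

    target_seqs = {r[seq_col] for r in rows if not is_decoy(r)}
    decoy_genes = {}
    for r in rows:
        if is_decoy(r):
            decoy_genes.setdefault(r[seq_col], set()).add(r[gene_col])

    kept = []
    for r in rows:
        if is_decoy(r):
            seq = r[seq_col]
            if seq in target_seqs or len(decoy_genes[seq]) > 1:
                continue  # conflicting decoy, drop it
        kept.append(r)
    return kept


# Rows modelled on the two cases above, reduced to three columns.
rows = [
    {"PeptideSequence": "FVQDLSK", "GeneName": "Ugp2", "Decoy": "0"},
    {"PeptideSequence": "FVQDLSK", "GeneName": "Tst", "Decoy": "1"},
    {"PeptideSequence": "YLDLLQK", "GeneName": "Tmem164", "Decoy": "1"},
    {"PeptideSequence": "YLDLLQK", "GeneName": "Kndc1", "Decoy": "1"},
    {"PeptideSequence": "SLDVQFK", "GeneName": "GeneX", "Decoy": "1"},  # invented
]
kept = drop_conflicting_decoys(rows)
for r in kept:
    print(r["PeptideSequence"], r["GeneName"], r["Decoy"])
```

Note that this drops every row of a conflicting decoy, so the affected targets lose their decoy counterparts; for the rare cases described here that seemed acceptable to me.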
Maybe this case is rare, since it requires both that genes are assigned and that randomly generated decoy peptides end up with identical sequences. I'd suggest an optional parameter to control whether decoys are allowed to be identical to targets, or whether they are simply filtered out. A checking step for the pqp file in OSW would also be great, like the one in TargetedFileConverter, to find invalid items before the next step.
Best regards, Ronghui