Nesvilab / TMT-Integrator

A tool that integrates channel abundances from multiple TMT samples and exports a general report for downstream analysis.
http://tmt-integrator.nesvilab.org

TMTintegrator cannot find "custom contaminant" and stops analysis #22

Open jjGG opened 2 years ago

jjGG commented 2 years ago

Dear Developers,

I would like to understand why TMTintegrator has issues with the list of contaminants that we usually concatenate to all our databases: https://fgcz-proteomics.uzh.ch/fasta/fgcz_contaminants2022_20220511.fasta

If I use the "download fasta" option, everything works fine and I get the TMTintegrator reports. If I use my own fasta file (a proper uniprot database concatenated with our fgcz_contaminants2022), FragPipe runs until the TMTintegrator step (v3.3.3) and then throws this error:

TMT-Integrator v3.3.3
UpdateColumns--- 0.14370 min.
Start to process GroupBy=0
Error: could not find protein ID zz|Y-FGCZCont00298| P41361 SWISS-PROT:P41361 (Bos taurus) Antithrombin-III precursor in database. Stopping analysis
Process 'TmtIntegrator' finished, exit code: 0

If I delete this particular protein from my fasta file, it throws the same error at the next identified FGCZ contaminant.

All other reports are generated fine. I already tried several ways of adapting the header or the accession of our contaminants, but I always get this error from TMTintegrator.

Can you see what TMTintegrator does not like about our fasta headers?

Best regards

jonas

prvst commented 2 years ago

Hi @jjGG. Several programs in our pipeline rely on pattern matching to locate things like the protein ID, the gene symbol, and the description. The main reason we ask people to use known patterns from NCBI or UniProt is that they have well-documented formats, so it's easy for us to know where the information is. A custom pattern can make things difficult for this software, because we have no way to predict how you write that information, especially if you are changing the position of certain elements or reorganizing them. I can't tell what the problem is just by looking at your header, but you'll likely have issues with other headers too, and, even worse, you might be susceptible to silent errors if one of the programs mixes up two or more FASTA entries. So even though we can track down the problem and let you fix it, I strongly suggest that you adapt your FASTA file to a common format, like the one UniProt uses (https://www.uniprot.org/help/fasta-headers).
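To illustrate the kind of pattern matching described in this reply, here is a minimal Python sketch. The regular expression is an assumption modelled on the UniProt header layout (`db|accession|entry_name description`); it is not the pattern that TMT-Integrator or Philosopher actually uses, and `parse_header` is a hypothetical helper. The custom header is the one from the error message in the first post.

```python
import re

# Assumed UniProt-like layout: >db|ACCESSION|ENTRY_NAME description
# Illustration only; not the pattern used by any FragPipe tool.
UNIPROT_LIKE = re.compile(r"^>(sp|tr)\|([^|\s]+)\|(\S+)(?:\s+(.*))?$")


def parse_header(line):
    """Return (db, accession, entry_name, description), or None if no match."""
    match = UNIPROT_LIKE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    db, accession, entry_name, description = match.groups()
    return db, accession, entry_name, description or ""


standard = ">sp|P01008|ANT3_HUMAN Antithrombin-III OS=Homo sapiens OX=9606 GN=SERPINC1 PE=1 SV=1"
custom = ">zz|Y-FGCZCont00298| P41361 SWISS-PROT:P41361 (Bos taurus) Antithrombin-III precursor"

print(parse_header(standard))  # a 4-tuple starting with 'sp', 'P01008'
print(parse_header(custom))    # None: 'zz' tag and an empty entry-name field
```

A parser written this strictly fails on the custom header for two reasons: the `zz` database tag is not one it expects, and the third field is empty because a space follows the second `|`. Whether the real tools fail for the same reasons is not confirmed in this thread.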

jjGG commented 2 years ago

Hello @prvst, thanks for your swift reply. I partially agree that things are easier (especially from a software development point of view) when they are standardised or harmonised.

On the other hand, as you may know, there are so many different resources for AA fastas that it is really difficult to get everyone on the same page. Also, uniprot is (depending on the organism) NOT always the best resource to start with (e.g. Arabidopsis, and other model organisms with resources like Flybase and Wormpeps). Researchers in these areas are usually much more familiar with their community resource. Also, coming from a sequencing experiment, you may generate AA sequences and search these, and then you will usually NOT have uniprot headers. Or, in other cases, you want a "custom sequence" added to a uniprot fasta because you have a recombinant protein in your extracts. In my case, it is important that we can have our contaminant list in our local uniprot databases, and we even try to keep the "accession" relatively close to a uniprot accession (e.g. zz|Y-FGCZCont0001|Name). We are aware, and I think this is a really critical point, that you may sometimes even "lose" a protein, without any error message, if the fasta header is not formatted the right way. For example, we had issues with previous versions when the protein accession was alone on the header line (>MyProteinX\r): the protein would not show up in any result file! (It took us a while to figure this out.)

Meanwhile (after quite some testing!) I have the TMTintegrator results again. My last change to the fasta db was replacing zz|FGCZ... with sp|FGCZ...

I assume that TMTintegrator wants it in this form! Could someone confirm this, and consider whether this should be kept, since all tools before it were "less sensitive"?

Best regards jonas
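The header change described in this comment can be sketched in a few lines of Python. This is only an illustration of the zz| to sp| replacement, not a recommended fix; `retag_headers` is a hypothetical helper and the two sequences are made-up placeholders. The sketch also strips trailing carriage returns, since a stray `\r` on the header line is mentioned above as a source of trouble.

```python
def retag_headers(lines, old_prefix="zz|", new_prefix="sp|"):
    """Yield FASTA lines with the database tag of matching headers replaced."""
    for line in lines:
        line = line.rstrip("\r\n")  # also drops stray carriage returns
        if line.startswith(">" + old_prefix):
            line = ">" + new_prefix + line[1 + len(old_prefix):]
        yield line


# Placeholder records; the sequences are invented for the example.
fasta = [
    ">zz|Y-FGCZCont0001|Name\r\n",
    "MKTAYIAKQR\n",
    ">sp|P01008|ANT3_HUMAN Antithrombin-III\n",
    "MYSNVIGTVT\n",
]

retagged = list(retag_headers(fasta))
print("\n".join(retagged))
```

Only header lines that start with the old tag are touched; sequence lines and headers with other tags pass through unchanged.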

huiyinc commented 2 years ago

Hi Jonas,

Thanks for your efforts in adapting the fasta files for TMT-Integrator. We have received some users' feedback regarding the customized fasta header issue, and have updated TMT-Integrator to resolve the issue. Could you please try TMT-Integrator v4.0.2 (link https://github.com/Nesvilab/TMT-Integrator/releases/tag/4.0.2) and see if it solves the problem?

Best,

Huiyin

-- Hui-Yin Chang, 張彙音 Assistant Professor Department of Biomedical Sciences and Engineering National Central University, Taiwan

jjGG commented 2 years ago

Hello Huiyin,

Thanks a lot for your email. I confirm that the new TMTintegrator 4.0.2 works with our custom contaminant headers, and I get all the expected reports and outputs. A quick check shows that I have almost 20% fewer phospho-peptides (I only checked abundance_single-site_MD.tsv). I will do another quick check to see whether I messed up the parameter settings, but I don't think so. Is there anything different in the "new" TMTintegrator with respect to filtering or stringency?

Best regards jonas

jjGG commented 2 years ago

Hello Huiyin,

I double checked - all fragger.params and TMT-integrator-conf.yml are identical (apart from the fasta). Any idea where the discrepancy is coming from? 20% is quite a difference.

best regards jonas

huiyinc commented 2 years ago

Hi Jonas,

Can you please send me the ratio_single-site_MD files generated by TMT-Integrator v3.3.3 and v4.0.2? I will check the files and reply to you as soon as I figure out the reason. Thanks.

Huiyin

jjGG commented 2 years ago

Hello Huiyin,

Please find attached a zip file with fragger.params, TMT-conf-yaml and two TMTintegrator outputs. Best regards & thanks for looking into this. jonas

Attachment: troubleshoot_forHuiyin.zip (https://github.com/Nesvilab/TMT-Integrator/files/9654866/troubleshoot_forHuiyin.zip)

huiyinc commented 2 years ago

Hi Jonas,

According to your parameter files, two different fasta files were used (fgcz_9606_reviewed_cnl_d_20220429.fasta and mod_fgcz_9606_reviewed_cnl_d_20220429.fasta). So, it is expected that the PSM tables and TMT-Integrator reports might be different. I think you might have to first check the PSM tables. Can you please tell me what the differences between the two fasta files are? Thanks.

Huiyin

jjGG commented 2 years ago

Hei Huiyin,

Yes - correct. The "mod_fgcz..." file is the fasta file in which I tried making some changes to our list of contaminant protein sequences. In one attempt, one single "previously identified" protein was deleted from this fasta. All the other changes are to the Accession or Description lines. All the human uniprot entries are identical.

I assume this cannot cause a difference of up to 20% on "all levels". What should I check in the psm tables?
Any other idea what I could test?

best regards jonas

anesvi commented 2 years ago

Yes, please check PSM.tsv to see whether the input to TMT-I is the same (maybe philosopher removed some peptides before that? Or they are not in the database?)
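One way to carry out this check is to compare the spectrum identifiers in the two PSM tables. The sketch below is a minimal Python example under stated assumptions: the tables are tab-separated, and the identifier column is named `Spectrum` (adjust the `column` argument if your psm.tsv differs). The two in-memory tables are tiny made-up stand-ins for the real files, and `spectrum_ids` is a hypothetical helper.

```python
import csv
import io


def spectrum_ids(handle, column="Spectrum"):
    """Collect the values of one column from a tab-separated PSM table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[column] for row in reader}


# Tiny made-up tables standing in for the psm.tsv of each run.
run_a = io.StringIO(
    "Spectrum\tPeptide\tProtein\n"
    "f1.00001.00001.2\tPEPTIDEK\tsp|P1|A_HUMAN\n"
    "f1.00002.00002.3\tSAMPLER\tsp|P2|B_HUMAN\n"
    "f1.00003.00003.2\tANOTHERK\tsp|P3|C_HUMAN\n"
)
run_b = io.StringIO(
    "Spectrum\tPeptide\tProtein\n"
    "f1.00001.00001.2\tPEPTIDEK\tsp|P1|A_HUMAN\n"
    "f1.00003.00003.2\tANOTHERK\tsp|P3|C_HUMAN\n"
    "f1.00004.00004.2\tEXTRAK\tsp|P4|D_HUMAN\n"
)

ids_a = spectrum_ids(run_a)
ids_b = spectrum_ids(run_b)
only_a = sorted(ids_a - ids_b)  # PSMs present only in the first run
only_b = sorted(ids_b - ids_a)  # PSMs present only in the second run

print(f"run A: {len(ids_a)} PSMs, run B: {len(ids_b)} PSMs")
print("only in A:", only_a)
print("only in B:", only_b)
```

With real files, replace the `io.StringIO` objects with `open(path, newline="")`. Looking up a handful of the spectra that appear in only one run is usually a quicker route to the cause than comparing totals.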

jjGG commented 2 years ago

Hello FragPipe Team and Alexey,

I see there is already a difference in the PSM.tsv tables. So I assume that this is less a TMT-Integrator issue than a general "philosopher" issue?

I would expect/accept a "small" difference because in one fasta one of the identified contaminants is deleted (now w/ new TMT-I identified by 2 psms).

But I see quite a different number of lines in psm.tsv (FP18, TMT-I 3.3.3 = 53677 psms vs FP18, TMT-I 3.3.4 = 47050 psms). This is no longer 20% but closer to 10%, which is still quite a difference, and it is unclear to me how this happens by only changing the fasta headers. One explanation I can see is that contaminants labeled as sp|FGCZCont..| (as previously, with the old TMT-I) are taken into account for fdr filtering and/or mass correction (as they may look like regular proteins from the organism under investigation), while if labeled as zz|FGCZCont..| they are filtered out for fdr filtering and/or mass filtering, and therefore the "thresholds" might change?

@anesvi: only 1 protein is deleted in the database where all zz|FGCZCont are labeled as sp|FGCZCont. The one missing protein is only identified w/ 2 psms. Why would philosopher remove so many peptides?

best regards jonas

prvst commented 2 years ago

When inspecting the two log files from your runs, I noticed that they have different numbers of identifications. This is from the files before filtering:

oldTMTintegrator_FP18

INFO[17:10:32] 1+ Charge profile                             decoy=0 target=0
INFO[17:10:32] 2+ Charge profile                             decoy=778 target=13116
INFO[17:10:32] 3+ Charge profile                             decoy=1863 target=29261
INFO[17:10:32] 4+ Charge profile                             decoy=1685 target=13825
INFO[17:10:32] 5+ Charge profile                             decoy=883 target=3477
INFO[17:10:32] 6+ Charge profile                             decoy=284 target=623
INFO[17:10:32] Database search results                       ions=31965 peptides=21594 psms=65795

TMTintegrator402_FP18

INFO[10:31:31] 1+ Charge profile                             decoy=0 target=0
INFO[10:31:31] 2+ Charge profile                             decoy=256 target=10605
INFO[10:31:31] 3+ Charge profile                             decoy=396 target=24275
INFO[10:31:31] 4+ Charge profile                             decoy=207 target=10802
INFO[10:31:31] 5+ Charge profile                             decoy=76 target=2280
INFO[10:31:31] 6+ Charge profile                             decoy=30 target=338
INFO[10:31:32] Database search results                       ions=21837 peptides=13391 psms=49265

Please correct me if I'm wrong, but I had the impression from your details above that the only difference was one protein that you removed. Could you confirm that you are using the same parameters or input files?
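The two log excerpts above can be summarised with a short Python sketch that adds up the decoy and target counts per run. The regular expression is an assumption based only on the lines quoted in this comment, and `decoy_target_totals` is a hypothetical helper; the numbers are copied from the logs.

```python
import re

# Matches lines such as "2+ Charge profile   decoy=778 target=13116".
CHARGE_LINE = re.compile(r"Charge profile\s+decoy=(\d+)\s+target=(\d+)")


def decoy_target_totals(log_text):
    """Sum decoy and target counts over all charge-profile lines."""
    decoys = targets = 0
    for decoy, target in CHARGE_LINE.findall(log_text):
        decoys += int(decoy)
        targets += int(target)
    return decoys, targets


old_log = """
INFO[17:10:32] 1+ Charge profile    decoy=0 target=0
INFO[17:10:32] 2+ Charge profile    decoy=778 target=13116
INFO[17:10:32] 3+ Charge profile    decoy=1863 target=29261
INFO[17:10:32] 4+ Charge profile    decoy=1685 target=13825
INFO[17:10:32] 5+ Charge profile    decoy=883 target=3477
INFO[17:10:32] 6+ Charge profile    decoy=284 target=623
"""
new_log = """
INFO[10:31:31] 1+ Charge profile    decoy=0 target=0
INFO[10:31:31] 2+ Charge profile    decoy=256 target=10605
INFO[10:31:31] 3+ Charge profile    decoy=396 target=24275
INFO[10:31:31] 4+ Charge profile    decoy=207 target=10802
INFO[10:31:31] 5+ Charge profile    decoy=76 target=2280
INFO[10:31:31] 6+ Charge profile    decoy=30 target=338
"""

for name, log in (("oldTMTintegrator_FP18", old_log), ("TMTintegrator402_FP18", new_log)):
    decoys, targets = decoy_target_totals(log)
    total = decoys + targets
    print(f"{name}: decoys={decoys} targets={targets} total={total} "
          f"decoy share={decoys / total:.1%}")
```

The per-charge counts add up to the reported `psms` values (65795 and 49265). The decoy share is about 8.3% in the first run and about 2.0% in the second, so the two runs already differ substantially at this pre-filter stage.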

anesvi commented 2 years ago

Sorry, you already sent log files

Felipe please forward the logs to me and Fengchao to take a look too

prvst commented 2 years ago

@anesvi see here https://github.com/Nesvilab/TMT-Integrator/issues/22#issuecomment-1259350073

jjGG commented 2 years ago

Hello Felipe,

Yes - I confirm that I used (or at least intended to use) identical parameters and the same input files.

I loaded the TMT-16-phospho workflow, adjusted the fasta, changed the method in "QuantIsobaric" to TMT-18, loaded my annotation file, and redirected the output to a new folder!

Meanwhile I did another test with the downloaded uniprot again (w/ the new TMT-I) and I get yet another number of PSMs. I am even more confused now. My next test is a rerun with the "old" modified fasta file to see if I get the "high numbers" back!

My only explanation would be that when I label my contaminants as sp|, they are suddenly included in the mass calibration and decoy filtering step, thereby lowering the acceptance thresholds?

best regards - jonas

anesvi commented 2 years ago

Hi Jonas, Any update on this on your side? I would be surprised if removing one protein changed the mass calibration/decoy filter enough to give a big difference in the number of PSMs. Best, Alexey

anesvi commented 2 years ago

Need to assign to philosopher/Felipe to investigate

anesvi commented 2 years ago

Yes, removal of one protein should not change things

You can find examples of PSMs that are missing and see why. Sending the log files for both searches would help

huiyinc commented 2 years ago

Hi Jonas,

One possible solution is to check the number of PSMs in the two PSM tables. I don't know how different they are. Or maybe you can send me the files, and I can take a quick look for you.

Best,

Huiyin

jjGG commented 2 years ago

Hello everyone,

Thanks a lot for coming back to me on this. I am still pretty puzzled. Find attached a zip with the 2 psm-tables and the fragger.params files as well as the 2 fasta files that I used.

So here are again the differences between the 2 searches:

fgcz9606_newTMTi:

-> this means we mark our contaminants with zz|Y-FGCZContanyNumber| bla bla text -> also, we usually use REV for decoy proteins!

modifiedDB (or ModifiedDB):

My only explanation would be: somehow my zz-proteins are treated differently and probably not used for decoy filtering! --> while if I have the sp label in front, my contaminants (some of which are of course identified) change the fdr filters in such a way that suddenly we have many more accepted psms.

do you have another idea?

best regards jonas

huiyinc commented 1 year ago

Hi Jonas,

Is the issue solved? Thanks.

Huiyin

jjGG commented 1 year ago

Hello Huiyin,

Thanks again for coming back to me and having a look at it. While TMT-integrator is now running successfully and also quantifying my reporter channels, there is still an issue with the number of IDs (see above): the number of identified proteins (and of course psms) is still quite different depending on the "accessions" of my contaminant proteins.

I am not sure how this is possible. My only explanation would be: somehow my zz-contaminant-proteins are taken differently and probably not used for decoy filtering! --> while if I have the sp label in front, my contaminants (some of which are of course identified) change the fdr filters in such a way that suddenly we have many more accepted psms.

do you have another idea?

anesvi commented 1 year ago

Since FDR filtering is done by Philosopher this must be Philosopher filter related, not TMT-Integrator.

Felipe, can you check the previous discussion and see if we can understand this “somehow my zz-contaminant-proteins are taken differently and probably not used for decoy filtering!”

Thanks Alexey

prvst commented 1 year ago

Changing sp| to zz| causes differences in numbers because if the PSM maps to a decoy protein, and has an alternative protein classified as reviewed, the program will swap their position. When you change sp to zz, you prevent that from happening. I also suggest tagging your contaminants with contam_ because it goes with what we do here.

anesvi commented 1 year ago

“Changing sp| to zz| causes differences in numbers because if the PSM maps to a decoy protein, and has an alternative protein classified as reviewed, the program will swap their position. When you change sp to zz, you prevent that from happening.”

Felipe, how often do you think it would happen? I think just a few PSMs in the dataset. I think Jonas is getting a big difference, not just a few PSMs

prvst commented 1 year ago

Since the program checks every PSM, it really depends on how many targets you have as an alternative to decoys. It will also reflect this in the number of PSMs because the program will only keep PSMs with identified proteins, which can range from anything between 1 PSM to potentially thousands.
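The swap described in the last two replies can be sketched as follows. This is a minimal Python illustration of the stated rule only, not Philosopher's actual implementation: `promote_reviewed` is a hypothetical helper, the `rev_` decoy prefix and the `sp|` test for "reviewed" are assumptions, and the protein names are invented.

```python
def promote_reviewed(primary, alternatives, decoy_prefix="rev_", reviewed_prefix="sp|"):
    """If the primary protein is a decoy and a reviewed target is among the
    alternatives, swap the two; otherwise leave the assignment unchanged."""
    if not primary.startswith(decoy_prefix):
        return primary, alternatives
    for i, candidate in enumerate(alternatives):
        if candidate.startswith(reviewed_prefix):
            swapped = alternatives[:i] + [primary] + alternatives[i + 1:]
            return candidate, swapped
    return primary, alternatives


decoy = "rev_sp|Q00000|EXAMPLE_HUMAN"  # invented decoy entry

# Contaminant tagged sp|: it counts as reviewed, so it replaces the decoy.
with_sp = promote_reviewed(decoy, ["sp|FGCZCont0001|Name"])

# Same contaminant tagged zz|: no reviewed alternative, the PSM stays a decoy.
with_zz = promote_reviewed(decoy, ["zz|Y-FGCZCont0001|Name"])

print(with_sp)
print(with_zz)
```

Under these assumptions, the same PSM counts as a target hit when the contaminant carries the `sp|` tag and as a decoy hit when it carries `zz|`. How many PSMs are affected in a real dataset depends on how often a decoy shares a peptide with such an alternative, which this thread leaves open.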