Nesvilab / IonQuant

A label free quantification tool.

Missing intensities and incorrect 'combined' compilation #19

Closed m-a-valach closed 3 years ago

m-a-valach commented 3 years ago

I have run philosopher (in the pipeline mode) and have the folder structure as required. There seem to be two independent problems:

  1. At the level of protein.tsv, peptide.tsv, etc., in the individual directories, some proteins have multiple peptide ions (i.e., values >0 in Total Peptide Ions, etc.), but no associated spectral counts nor intensities.
  2. Even for protein entries, for which all seems to be OK at the level of protein.tsv, peptide.tsv, etc., in the individual directories, IonQuant seems to ignore Intensity columns (but not Spectral count columns) of most entries when compiling the 'combined_protein.tsv', etc. (i.e., the output of the --multidir option). I can generate a combined output myself by merging the tables, but it would naturally be preferable if IonQuant did it (correctly).

fcyu commented 3 years ago

  1. The proteins listed in protein.tsv come from all experiments. Proteins that were not identified in the current experiment therefore show 0 spectral counts and intensities.
  2. What do you mean by "IonQuant seems to ignore Intensity columns (but not Spectral count columns) of most entries when compiling the 'combined_protein.tsv'"? Could you elaborate?

Best,

Fengchao

m-a-valach commented 3 years ago

  1. Thanks a lot for the clarification.
  2. Let's say that I have a bait protein 'BAIT' in an AP-MS experiment, so in the folder BAIT_1, protein.tsv is expected to contain a non-zero value in any Intensity column. This is the case, so no problem there. But in 'combined_protein.tsv', this bait protein has non-zero values only in the Spectral count columns and zeros in the Intensity columns, which should not be the case.

fcyu commented 3 years ago

It sounds like this is due to the parameter settings. Can you set --minions 1, --proteinquant 1, --minexps 1?

Best,

Fengchao

m-a-valach commented 3 years ago

Thanks for the tip. The top-N quantification seems to work: all intensities show up in 'combined_protein.tsv'. Still, it would be great to have the option to use the maxLFQ algorithm. I just noticed that in your paper on IonQuant (doi: 10.1074/mcp.tir120.002048), there is this explanation: "Each protein’s intensity is the summed intensity of top n ions identified in t percentage of all experiments, where n and t are parameters with default values of 3 and 50%, respectively". If I understand it correctly, this is taken care of in the current version by the --minions 1, --proteinquant 1, --minexps 1 options. Does the same automatically apply to the maxLFQ algorithm (i.e., --proteinquant 2)? Or, in other words, do only ions that occur in several experiments get compiled into the table 'combined_protein.tsv'? This might explain my problem, because the 'BAIT' that I mentioned previously was detected in only a single sample out of the several that I ran together in philosopher.

fcyu commented 3 years ago

> Still, it would be great to have the option to use the maxLFQ algorithm.

You can use maxLFQ if your proteins exist in more than one experiment/sample.

> I just noticed that in your paper on IonQuant (doi: 10.1074/mcp.tir120.002048), there is this explanation: "Each protein’s intensity is the summed intensity of top n ions identified in t percentage of all experiments, where n and t are parameters with default values of 3 and 50%, respectively". If I understand it correctly, this is taken care of in the current version by the --minions 1, --proteinquant 1, --minexps 1 options.

Setting --proteinquant 1 makes IonQuant use the top-N algorithm, which is the one you quoted. If using --proteinquant 2, the algorithm is different. Please check another paper (https://www.mcponline.org/article/S1535-9476(21)00050-5/fulltext) for details.
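The top-N rule quoted from the paper can be sketched in a few lines. This is only an illustration of one reading of that sentence (an ion is eligible if it has a non-zero intensity in at least a fraction t of all experiments, and the top n eligible ions are summed per experiment). It is not IonQuant's code; the function name, the data layout, and the ion labels are hypothetical, and the sketch does not try to reproduce how the command-line options map onto n and t.

```python
def top_n_intensity(ions, experiments, n=3, t=0.5):
    """Sum the top-n eligible ion intensities per experiment for ONE protein.

    ions: {ion_name: {experiment: intensity}} (hypothetical layout)
    experiments: list of ALL experiments in the run
    n, t: defaults taken from the paper's sentence (3 and 50%)
    """
    min_hits = t * len(experiments)
    # Keep only ions seen (intensity > 0) in enough experiments.
    eligible = [
        per_exp for per_exp in ions.values()
        if sum(1 for e in experiments if per_exp.get(e, 0.0) > 0) >= min_hits
    ]
    result = {}
    for e in experiments:
        values = sorted((per_exp.get(e, 0.0) for per_exp in eligible), reverse=True)
        result[e] = sum(values[:n])
    return result


# A bait detected in a single sample out of four (made-up numbers).
experiments = ["BAIT_1", "CTRL_1", "CTRL_2", "CTRL_3"]
bait_ions = {"PEPA+2": {"BAIT_1": 1e6}, "PEPB+2": {"BAIT_1": 2e6}}

strict = top_n_intensity(bait_ions, experiments)            # t = 50%
relaxed = top_n_intensity(bait_ions, experiments, t=0.25)   # one experiment suffices
```

With the 50% threshold no ion of the bait is eligible, so every experiment gets 0; once a single experiment is enough, BAIT_1 receives the summed intensity of its ions. That matches the behaviour described in this thread.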

> Or, in other words, do only ions that occur in several experiments get compiled into the table 'combined_protein.tsv'? This might explain my problem because the 'BAIT' that I mentioned previously was only detected in a single sample out of several that I ran in philosopher together.

MaxLFQ needs to calculate log-ratios for a peptide between pairs of experiments/samples. If a protein is found in only one experiment, no log-ratio can be calculated for it, and the protein gets 0 intensity. So, in your case, you should use the top-N algorithm or increase the number of experiments/samples containing your bait protein.
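The point about missing log-ratios can be illustrated with a small sketch of the pairwise step only: for every pair of experiments, take the median log2 ratio over the peptides quantified in both. The later steps of MaxLFQ (solving for per-sample abundances from these ratios) are omitted, and nothing here is IonQuant's actual implementation; the function name and data layout are hypothetical.

```python
import math
from itertools import combinations
from statistics import median


def pairwise_log_ratios(peptides, experiments):
    """Median log2 ratio per experiment pair for ONE protein.

    peptides: {peptide: {experiment: intensity}} (hypothetical layout)
    Returns {(exp_a, exp_b): median log2(intensity_a / intensity_b)},
    with an entry only for pairs that share at least one quantified peptide.
    """
    ratios = {}
    for a, b in combinations(experiments, 2):
        logs = [
            math.log2(per_exp[a] / per_exp[b])
            for per_exp in peptides.values()
            if per_exp.get(a, 0) > 0 and per_exp.get(b, 0) > 0
        ]
        if logs:
            ratios[(a, b)] = median(logs)
    return ratios


experiments = ["BAIT_1", "CTRL_1", "CTRL_2"]

# Bait seen in one experiment only: no pair shares a peptide.
bait = {"PEPA": {"BAIT_1": 1e6}, "PEPB": {"BAIT_1": 2e6}}
bait_ratios = pairwise_log_ratios(bait, experiments)

# Prey seen in two experiments: one ratio is available.
prey = {"PEPC": {"BAIT_1": 4e6, "CTRL_1": 1e6}}
prey_ratios = pairwise_log_ratios(prey, experiments)
```

For the bait the result is empty, so there is nothing from which a MaxLFQ abundance could be derived, whereas the prey yields a single log2 ratio of 2 between BAIT_1 and CTRL_1.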

Best,

Fengchao

m-a-valach commented 3 years ago

Thank you for all the explanations and suggestions. I am aware of the differences between the two algorithms, but did not realize that about maxLFQ... In any case, my issue seems to have been solved.

tobiasko commented 1 year ago

Hi @fcyu,

I also came across such a case and was wondering:

Would it be possible to change the default behaviour of IonQuant with respect to such edge cases where log-ratios can't be calculated by the maxLFQ algorithm?

Currently, it seems that the implementation silently uses zero as the abundance estimate for protein x if it is present in only a single sample. Maybe a better approach would be to report a warning to stdout/stderr (not sure how you currently do this) and maybe use something like NA (not available) for the protein abundance matrix. That would help to spot these cases more easily. To me, a zero always means something like "the AUC calculation gave zero", but that is clearly not the case here.

I was also wondering whether the default parameters for the top-N calculation (minexp=2) are a good choice, because using --maxlfq 0 will again result in a zero intensity with the defaults. Only setting minexp=1 results in a non-zero quant value. At least, I ran into this issue.

Best, Tobi

fcyu commented 1 year ago

Hi @tobiasko ,

Thanks for your suggestions.

> maybe use something like NA (not available) for the protein abundance matrix. That would help to spot these cases more easily. To me, a zero always means something like "the AUC calculation gave zero", but that is clearly not the case here.

In the very first version of IonQuant (at that time, it was called IMQuant), it used <blank> not 0 for the proteins that failed in quantification. I also agree that it should not use 0 because zero means that the intensity is measured and equals zero. But, unfortunately, someone complained that they couldn't deal with <blank> in their tool and asked me to change it to 0. Since it has been out there for a while and users have already gotten used to it, I think we should not change it for now.

If you want to know which proteins have non-zero intensities but failed in MaxLFQ, you can compare the Intensity column, which comes from the top-N algorithm, with the MaxLFQ Intensity column. For those proteins, you will see a non-zero Intensity but a zero MaxLFQ Intensity.
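That comparison is easy to script. The following is a minimal sketch, assuming the combined table has one "<sample> Intensity" and one "<sample> MaxLFQ Intensity" column per sample plus a "Protein ID" column. These header names and the example rows are assumptions for illustration and should be checked against the actual file.

```python
import csv
import io


def failed_in_maxlfq(rows, sample, id_column="Protein ID"):
    """Return IDs with non-zero top-N intensity but zero MaxLFQ intensity.

    rows: iterable of dicts, e.g. from csv.DictReader(..., delimiter="\t")
    sample: sample name used as the column prefix (assumed header layout)
    """
    flagged = []
    for row in rows:
        top_n = float(row[f"{sample} Intensity"] or 0)
        maxlfq = float(row[f"{sample} MaxLFQ Intensity"] or 0)
        if top_n > 0 and maxlfq == 0:
            flagged.append(row[id_column])
    return flagged


# Made-up two-row table standing in for combined_protein.tsv.
example = io.StringIO(
    "Protein ID\tBAIT_1 Intensity\tBAIT_1 MaxLFQ Intensity\n"
    "BAIT\t3000000\t0\n"
    "PREY\t500000\t450000\n"
)
rows = list(csv.DictReader(example, delimiter="\t"))
flagged = failed_in_maxlfq(rows, "BAIT_1")
```

To run it on a real file, replace the `io.StringIO` object with `open("combined_protein.tsv", newline="")`. Note that the check only helps if the top-N settings allow a non-zero Intensity for the protein in the first place.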

> I was also wondering whether the default parameters for the top-N calculation (minexp=2) are a good choice, because using --maxlfq 0 will again result in a zero intensity with the defaults. Only setting minexp=1 results in a non-zero quant value. At least, I ran into this issue.

Yes, this is a good point. In the latest FragPipe, it always uses --minexp 1. The default settings also have --tp 0 --minfreq 0. I will change the default values in the IonQuant standalone version.

Best,

Fengchao

tobiasko commented 1 year ago

Thanks for your fast reply! I can't really understand why people complain about such things, but I guess this boils down to the inability to parse text output properly. Thanks for changing the defaults in the future. We will set our input parameters accordingly to prevent this problem.

tobiasko commented 1 year ago

But what about a warning message to stderr if the log-ratio calculation fails?

fcyu commented 1 year ago

Do you mean printing a warning message whenever a protein fails in MaxLFQ? There would be many warning messages. Would that be too many?

Best,

Fengchao

tobiasko commented 1 year ago

Yes. I guess not, if these messages go to the right channels. We always collect stdout and stderr in a text file.
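Until something like this exists in the tool itself, the idea can be approximated in a downstream script. The sketch below is a hypothetical post-processing step, not an IonQuant feature: it writes one warning per affected protein plus a summary line to stderr and substitutes NA for the zero. The function name and input layout are assumptions.

```python
import io
import sys


def mask_failed_maxlfq(values, stream=sys.stderr):
    """Replace zero MaxLFQ values with "NA" where top-N is non-zero.

    values: {protein: (top_n_intensity, maxlfq_intensity)} (hypothetical layout)
    Warnings go to `stream` (stderr by default) so stdout stays clean.
    """
    masked = {}
    failed = 0
    for protein, (top_n, maxlfq) in values.items():
        if top_n > 0 and maxlfq == 0:
            print(f"WARNING: no MaxLFQ estimate for {protein}; reporting NA", file=stream)
            masked[protein] = "NA"
            failed += 1
        else:
            masked[protein] = maxlfq
    if failed:
        print(f"WARNING: {failed} protein(s) without a MaxLFQ estimate", file=stream)
    return masked


# Capture the warnings in a buffer here; a real run would leave stream=sys.stderr.
log = io.StringIO()
masked = mask_failed_maxlfq({"BAIT": (3e6, 0.0), "PREY": (5e5, 4.5e5)}, stream=log)
```

The closing summary line keeps the information usable even when the per-protein warnings are numerous, which addresses the concern about too many messages.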