Status: Closed (m-a-valach closed this issue 3 years ago)
Best,
Fengchao
`BAIT_1`, protein.tsv is expected to contain a non-zero value in any `Intensity` column. This is the case, so no problem there. But in the 'combined_protein.tsv', this bait protein only has non-zero values in `Spectral count` columns, but zero in `Intensity` columns, which should not be the case.
It sounds like this is due to the parameter settings. Can you set `--minions 1`, `--proteinquant 1`, `--minexps 1`?
Best,
Fengchao
Thanks for the tip. The top-N quantification seems to work: all intensities show up in the 'combined_protein.tsv'. Still, it would be great to have the option to use the maxLFQ algorithm.
I just noticed that in your paper on IonQuant (doi: 10.1074/mcp.tir120.002048), there is this explanation: "Each protein’s intensity is the summed intensity of top n ions identified in t percentage of all experiments, where n and t are parameters with default values of 3 and 50%, respectively". If I understand it correctly, this is taken care of in the current version by the `--minions 1`, `--proteinquant 1`, `--minexps 1` options. Does the same automatically apply to the maxLFQ algorithm (i.e., `--proteinquant 2`)? Or, in other words, do only ions that occur in several experiments get compiled into the table 'combined_protein.tsv'? This might explain my problem, because the 'BAIT' that I mentioned previously was only detected in a single sample out of several that I ran in `philosopher` together.
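One way to read the quoted top-N sentence is sketched below. This is only an illustration of that sentence, not IonQuant's implementation: the function name, the argument names, and the dictionary layout are all hypothetical. It shows why a bait seen in a single sample can end up with zero intensity when `t` is 50%.

```python
def top_n_protein_intensity(ions, experiments, n=3, t=0.5):
    """Sum the top-n ion intensities per experiment for ONE protein.

    ions: {ion_name: {experiment: intensity}} (hypothetical layout).
    An ion is only used if it was identified (intensity > 0) in at
    least a fraction t of all experiments.
    """
    min_exps = t * len(experiments)
    eligible = [
        per_exp for per_exp in ions.values()
        if sum(1 for e in experiments if per_exp.get(e, 0.0) > 0) >= min_exps
    ]
    result = {}
    for e in experiments:
        values = sorted((per_exp.get(e, 0.0) for per_exp in eligible), reverse=True)
        result[e] = sum(values[:n])
    return result


experiments = ["BAIT_1", "CTRL_1", "CTRL_2", "CTRL_3"]
# Bait protein whose ions were only identified in one sample.
bait_ions = {"ionA": {"BAIT_1": 100.0}, "ionB": {"BAIT_1": 50.0}}

strict = top_n_protein_intensity(bait_ions, experiments)            # t = 50%
relaxed = top_n_protein_intensity(bait_ions, experiments, t=0.25)   # 1 of 4 is enough
```

With the 50% threshold no ion qualifies, so every experiment gets zero; once a single experiment is enough, the bait's intensity in `BAIT_1` is the sum of its two ions.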
> Still, it would be great to have the option to use the maxLFQ algorithm.
You can use maxLFQ if your proteins exist in more than one experiment/sample.
> I just noticed that in your paper on IonQuant (doi: 10.1074/mcp.tir120.002048), there is this explanation: "Each protein’s intensity is the summed intensity of top n ions identified in t percentage of all experiments, where n and t are parameters with default values of 3 and 50%, respectively". If I understand it correctly, this is taken care of in the current version by the `--minions 1`, `--proteinquant 1`, `--minexps 1` options
Setting `--proteinquant 1` makes IonQuant use the top-N algorithm, which is the one you quoted. With `--proteinquant 2`, the algorithm is different. Please check another paper (https://www.mcponline.org/article/S1535-9476(21)00050-5/fulltext) for details.
> Or, in other words, do only ions that occur in several experiments get compiled into the table 'combined_protein.tsv'? This might explain my problem because the 'BAIT' that I mentioned previously was only detected in a single sample out of several that I ran in philosopher together.
MaxLFQ needs to calculate log-ratios for a peptide between any two experiments/samples. If a protein exists in only one experiment, there is no log-ratio for it, and that protein gets 0 intensity. So, in your case, you should use the top-N algorithm or increase the number of experiments/samples containing your bait protein.
Best,
Fengchao
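The MaxLFQ point above can be made concrete with a small sketch of the pairwise log-ratio step only. It is a simplified illustration under assumed names and data layout, not IonQuant's code, and it leaves out everything MaxLFQ does after collecting the ratios.

```python
import math
from itertools import combinations


def pairwise_log_ratios(peptides, experiments):
    """Collect log2 ratios per experiment pair for ONE protein.

    peptides: {peptide: {experiment: intensity}} (hypothetical layout).
    A ratio exists only if the same peptide has a non-zero intensity
    in both experiments of the pair.
    """
    ratios = {}
    for a, b in combinations(experiments, 2):
        values = [
            math.log2(per_exp[a] / per_exp[b])
            for per_exp in peptides.values()
            if per_exp.get(a, 0) > 0 and per_exp.get(b, 0) > 0
        ]
        if values:
            ratios[(a, b)] = values
    return ratios


experiments = ["S1", "S2", "S3"]
# Protein observed in two samples: one ratio can be formed.
shared = pairwise_log_ratios({"pep1": {"S1": 200.0, "S2": 100.0}}, experiments)
# Bait observed in a single sample: no pair qualifies, so nothing to solve.
bait_only = pairwise_log_ratios({"pep1": {"S1": 200.0}}, experiments)
```

For the bait, `bait_only` is empty, which matches the description that such a protein has no log-ratio and is reported with 0 intensity.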
Thank you for all the explanations and suggestions. I am aware of the differences between the two algorithms, but did not realize that about maxLFQ... In any case, my issue seems to have been solved.
Hi @fcyu,
I also came across such a case and was wondering:
Would it be possible to change the default behaviour of IonQuant with respect to such edge cases where log-ratios can't be calculated by the maxLFQ algorithm?
Currently, it seems that the implementation silently uses zero as the abundance estimate for protein x if it is only present in a single sample. Maybe a better approach would be to write a warning to stdout/stderr (not sure how you currently do this) and maybe use something like NA (not available) in the protein abundance matrix. That would help to spot these cases more easily. For me, zero always means something like: the AUC calculation gave zero. But that is clearly not the case here.
I was also wondering whether the default parameters for the topN calculation (minexp=2) are a good choice, because using --maxlfq 0 will again result in a zero intensity under the defaults. Only setting minexp=1 results in a non-zero quant value. At least, I ran into this issue.
Best, Tobi
Hi @tobiasko,
Thanks for your suggestions.
> maybe use something like NA (not available) for the protein abundance matrix. That would help to spot these cases more easily. zero for me is always something like: The AUC calculation gave zero. But that is clearly not the case here.
In the very first version of IonQuant (at that time, it was called IMQuant), it used `<blank>`, not `0`, for the proteins that failed in quantification. I also agree that it should not use `0`, because zero means that the intensity was measured and equals zero. But, unfortunately, someone complained that they couldn't deal with `<blank>` in their tool and asked me to change it to `0`. Since it has been out there for a while and users have already gotten used to it, I think we should not change it for now.
If you want to know which proteins have non-zero intensities but failed in MaxLFQ, you can compare the `Intensity` column, which comes from the top-N algorithm, with the `MaxLFQ Intensity` column. For those proteins, you will see a non-zero `Intensity` but a zero `MaxLFQ Intensity`.
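The column comparison described above could be scripted roughly as follows. This is a sketch under assumptions: the per-sample column names (`<sample> Intensity`, `<sample> MaxLFQ Intensity`), the `Protein` column, and the function name are guesses for illustration, so adjust them to the headers in your own 'combined_protein.tsv'.

```python
import csv
import io


def failed_maxlfq(tsv_text, protein_col="Protein"):
    """Return (protein, sample) pairs with non-zero top-N intensity
    but zero MaxLFQ intensity. Column naming is an assumption."""
    suffix = " MaxLFQ Intensity"
    failed = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        for col, value in row.items():
            if col is None or not col.endswith(suffix):
                continue
            sample = col[: -len(suffix)]
            top_n = float(row.get(f"{sample} Intensity") or 0)
            maxlfq = float(value or 0)
            if top_n > 0 and maxlfq == 0:
                failed.append((row[protein_col], sample))
    return failed


# Tiny made-up table, only to show the expected input shape.
example_tsv = (
    "Protein\tS1 Intensity\tS1 MaxLFQ Intensity\tS2 Intensity\tS2 MaxLFQ Intensity\n"
    "BAIT\t1500.0\t0\t0\t0\n"
    "CTRL\t900.0\t880.0\t450.0\t440.0\n"
)
flagged = failed_maxlfq(example_tsv)
```

For a real file, read its contents into a string (or adapt the function to take a file handle) and inspect the returned pairs.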
> I was also wondering if the default parameters for the topN calculation (minexp=2) are a good choice. Because using --maxlfq 0 will again result in a zero intensity according to the defaults. Only setting minexp=1 results in a non zero quant value. I at least run into this issue.
Yes, this is a good point. The latest FragPipe always uses `--minexp 1`. The default settings also have `--tp 0 --minfreq 0`. I will change the default values in the IonQuant standalone version.
Best,
Fengchao
Thanks for your fast reply! I can't really understand why people complain about such things, but I guess this boils down to the inability to parse text output properly. Thanks for changing the defaults in the future. We will set our input parameters accordingly to prevent this problem.
But what about a warning message to std error if log ratio calc. fails?
Do you mean printing a warning message whenever a protein fails in MaxLFQ? There would be many warning messages. Would that be too many?
Best,
Fengchao
Yes. And I guess not, if these messages go to the right channels. We always collect stdout and stderr in a text file.
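One possible compromise between "no warning" and "one warning per protein" is a single summary line on stderr. The sketch below is hypothetical: the function name, the message wording, and the `max_listed` cutoff are made up for illustration and do not reflect anything IonQuant prints.

```python
import io
import sys


def warn_failed_quant(failed, stream=sys.stderr, max_listed=5):
    """Write ONE summary warning for all proteins without a MaxLFQ value."""
    if not failed:
        return
    listed = ", ".join(failed[:max_listed])
    more = len(failed) - max_listed
    tail = f" (and {more} more)" if more > 0 else ""
    print(
        f"WARNING: MaxLFQ produced no intensity for {len(failed)} protein(s): "
        f"{listed}{tail}",
        file=stream,
    )


# Demo: capture the message in a buffer instead of the real stderr.
buffer = io.StringIO()
warn_failed_quant(["P1", "P2", "P3"], stream=buffer, max_listed=2)
message = buffer.getvalue()
```

Because the warning goes to a separate stream, a pipeline that already collects stdout and stderr in a text file would pick it up without any change to the result tables.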
I have run `philosopher` (in the pipeline mode) and have the folder structure as required. There seem to be two independent problems:
1. `Total Peptide Ions`, etc.), but no associated spectral counts nor intensities.
2. `IonQuant` seems to ignore `Intensity` columns (but not `Spectral count` columns) of most entries when compiling the 'combined_protein.tsv', etc. (i.e., the output of the `--multidir` option). I can generate a combined output myself by merging the tables, but it would naturally be preferable if IonQuant did it (correctly).