Nesvilab / IonQuant

A label free quantification tool.

Feature request: Do not match ions with different CV when processing FAIMS data #42

Closed hollenstein closed 1 year ago

hollenstein commented 1 year ago

Dear developers,

Is your feature request related to a problem? The issue is that the ions with different compensation voltages (CV) are currently being paired in the combined_ion table, which is not optimal for FAIMS data analysis.

Description of the issue: As I understand it, the current workflow is as follows. After match between runs (MBR), ions are matched based on the modified sequence plus charge. Then, for each run, the CV with the highest intensity is selected, and this is what gets reported in the combined_ion table.
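The workflow as described above can be illustrated with a minimal sketch. This is not IonQuant's actual code; the tuple layout, the function name `apex_cv_table`, and all values are hypothetical and only mirror the description in this comment.

```python
from collections import defaultdict

# Hypothetical ion features: (run, modified_sequence, charge, cv, intensity).
# The values are invented for illustration only.
features = [
    ("run1", "PEPTIDEK", 2, -45, 1.0e6),
    ("run1", "PEPTIDEK", 2, -65, 4.0e5),
    ("run2", "PEPTIDEK", 2, -45, 5.0e5),
    ("run2", "PEPTIDEK", 2, -65, 6.0e5),
]


def apex_cv_table(features):
    """Key rows by (sequence, charge); per run keep the CV with the highest intensity."""
    table = defaultdict(dict)
    for run, seq, charge, cv, intensity in features:
        best = table[(seq, charge)].get(run)
        if best is None or intensity > best[1]:
            table[(seq, charge)][run] = (cv, intensity)
    return dict(table)


combined = apex_cv_table(features)
for key, per_run in combined.items():
    print(key, per_run)
```

In this toy input the single row for PEPTIDEK/2+ ends up holding the CV -45 intensity for run1 and the CV -65 intensity for run2, which is the kind of mixed-CV pairing this request is about.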

This seems not ideal for Orbitrap with FAIMS, as there are only a few CVs that are "far apart". How an ion distributes into the different CVs is quite stable, at least on the same system. Combining the ion quantification from different CVs results in technical artefacts, which can be seen in the example below. The quantification precision of the ions with unequal CV is reduced drastically, and they make up about 20% of all ions, so I think it might be really worthwhile to look into this.

[Image] Here I've used the pre-release version of FragPipe 19.2 and processed two technical replicates of the same sample. I've used all entries from the combined_ion table that had quantified values for both replicates. Then I split the data into entries that had the same Apex CV in both replicates and entries that did not. To ensure that there is no intensity-dependent bias on the quantification precision, I've had a look at the average intensity distribution of the two groups, which seems to be almost identical. The numbers of data points per group are: Equal CV ~38000, Unequal CV ~9000.
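A comparison of this kind could be sketched roughly as follows. The row layout and the toy intensities are assumptions made for illustration; they are not taken from the data behind the plot and do not reproduce its result. The sketch only shows the grouping step and one possible precision summary (the median absolute log2 ratio between replicates).

```python
import math
from statistics import median

# Hypothetical rows: (ion, apex CV rep1, intensity rep1, apex CV rep2, intensity rep2).
# All numbers are toy values, not real measurements.
rows = [
    ("A/2", -45, 1000.0, -45, 1000.0),
    ("B/2", -45, 2000.0, -45, 1000.0),
    ("C/3", -65, 4000.0, -45, 1000.0),
    ("D/2", -45, 1000.0, -65, 8000.0),
]


def abs_log2_ratios_by_group(rows):
    """Split ions by equal/unequal apex CV and collect |log2(rep1/rep2)| per group."""
    groups = {"equal_cv": [], "unequal_cv": []}
    for _ion, cv1, i1, cv2, i2 in rows:
        if i1 <= 0 or i2 <= 0:
            continue  # keep only ions quantified in both replicates
        name = "equal_cv" if cv1 == cv2 else "unequal_cv"
        groups[name].append(abs(math.log2(i1 / i2)))
    return groups


groups = abs_log2_ratios_by_group(rows)
summary = {name: median(vals) for name, vals in groups.items() if vals}
print(summary)
```

For technical replicates the ratios should scatter around zero, so a larger median in one group points to lower quantification precision in that group.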

Additionally, from manually looking at some examples I have the feeling that MBR is not performed if the same peptide+charge has already been identified in all runs, even if the identification is from different CVs. But maybe I am wrong here, because from the published description I would have thought that MBR is performed independently for each CV?

From the paper "IonQuant Enables Accurate and Sensitive Label-Free Quantification With FDR-Controlled Match-Between-Runs": "IonQuant can automatically detect if the data were acquired using FAIMS. If FAIMS was used, IonQuant builds separate spectral indexes corresponding to each compensation voltage. Then, peak tracing, ion detection, and ion transfer are performed within each compensation voltage."

Suggested solution: My suggested solution should be fairly simple to implement: consider Peptide+charge+CV for pairing ions between runs in the combined_ion table, i.e. make Peptide+charge+CV the unique ID for the table. And, if this is not already done, also use Peptide+charge+CV for deciding if an ID needs to be transferred to another run during MBR.
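As a rough illustration of the proposed key, here is a minimal sketch that uses (sequence, charge, CV) as the row ID. The input layout, the function name `per_cv_table`, and the values are hypothetical; how IonQuant stores ions internally is not known here.

```python
from collections import defaultdict

# Hypothetical ion features: (run, modified_sequence, charge, cv, intensity).
features = [
    ("run1", "PEPTIDEK", 2, -45, 1.0e6),
    ("run1", "PEPTIDEK", 2, -65, 4.0e5),
    ("run2", "PEPTIDEK", 2, -45, 5.0e5),
    ("run2", "PEPTIDEK", 2, -65, 6.0e5),
]


def per_cv_table(features):
    """Key rows by (sequence, charge, CV) so intensities are only paired within one CV."""
    table = defaultdict(dict)
    for run, seq, charge, cv, intensity in features:
        key = (seq, charge, cv)
        # If a run has several features for the same key, keep the most intense one.
        table[key][run] = max(intensity, table[key].get(run, 0.0))
    return dict(table)


per_cv = per_cv_table(features)
for key, per_run in sorted(per_cv.items()):
    print(key, per_run)
```

With this key, each row compares intensities from the same CV across runs; a CV that was not quantified in a given run simply has no entry for that run in its row.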

Limitations: This solution only makes sense for Orbitrap+FAIMS (or whenever ion mobility is used in a similar way), so there could be a switch in the IonQuant settings with an on/off/auto option, where auto detects whether FAIMS data is being analyzed. When switched on, the CV is also used to determine the unique ID for the combined_ion table.

This solution might not be optimal if samples were analysed on multiple different LC-MS systems and then searched together with FragPipe. In that case it might be better to use peptide+charge as unique entries and select the apex CV intensity, although in this situation the MaxLFQ aggregation approach might not be the best option anyway.

Best regards and thanks for having a look at this suggestion, David Hollenstein

fcyu commented 1 year ago

Hi David,

Thank you very much for your investigation and suggestions. They are all very helpful. Please find my comments in the following.

> Additionally, from manually looking at some examples I have the feeling that MBR is not performed if the same peptide+charge has already been identified in all runs, even if the identification is from different CVs. But maybe I am wrong here, because from the published description I would have thought that MBR is performed independently for each CV?

MBR treats peptide+charge+CV as a unique ID. A peptide+charge with CV1 will not be transferred to another peptide+charge with CV2. If you find an example where MBR mistakenly transfers a peptide+charge+CV to a different CV, could you share the data with us? It must be a bug and we should fix it.

> My suggested solution should be fairly simple to implement: consider Peptide+charge+CV for pairing ions between runs in the combined_ion table, i.e. make Peptide+charge+CV the unique ID for the table. And, if this is not already done, also use Peptide+charge+CV for deciding if an ID needs to be transferred to another run during MBR.

Thanks for your suggestion. We had actually discussed another solution: "split" an experiment into multiple experiment+CV entries. FAIMS is basically like fractionation, not like the ion mobility separation in timsTOF. We did not do that because 1) it would result in many more missing values; 2) it would make downstream analysis more difficult, because people would probably still need to "combine" columns from the same experiment, and then the question is how to combine them while avoiding the issue shown in your plot; 3) it would make the combined_modified_peptide and combined_peptide tables complicated. Would we also need to do the same "splitting" for modified peptides and peptides?

"Splitting" peptide+charge into peptide+charge+CV is another solution. But there are still issues, such as more missing values and harder downstream analysis.

We will discuss internally, but any suggestions are welcome.

Best,

Fengchao

hollenstein commented 1 year ago

Hi Fengchao,

> If you find an example where MBR mistakenly transfers a peptide+charge+CV to a different CV, could you share the data with us? It must be a bug and we should fix it.

No, I didn't look for this in the data. I was just speculating about how the algorithm works.

> "Splitting" peptide+charge into peptide+charge+CV is another solution. But there are still issues, such as more missing values and harder downstream analysis.

I had a look at several of the middle and high intensity ions with unequal CV. For the examples I looked at, I could extract very nice XICs, for both CVs and in both replicates. So I have the feeling it's not the absence of ion traces from the rawfiles that would cause lots of missing values when only pairing ions of the same CV.

I am not sure what the exact reason is why different CVs are reported in the ions tables. Maybe the quality of one of the ion features with the same CV did not pass the FDR threshold, or the IDs are not always transferred during MBR (although this seems unlikely), or there is a completely different reason.

Why do you think it would be hard for downstream analysis? I assume you would just treat it like the charge state. For the peptide and modified peptide tables, "Charges" are reported in one column, so you could add "Compensation voltages" or a charge + CV combination. Although I am not sure if that would really be necessary.
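To make the idea of treating the CV like the charge state concrete, here is a small sketch of a peptide-level summary that lists charges and CVs in one column each. The "Compensation voltages" column is only the name floated in this comment, not an existing output column, and the input keys are hypothetical.

```python
from collections import defaultdict

# Hypothetical ion-level keys: (modified peptide, charge, CV).
ion_keys = [
    ("PEPTIDEK", 2, -45),
    ("PEPTIDEK", 2, -65),
    ("PEPTIDEK", 3, -45),
    ("ANOTHERR", 2, -65),
]


def peptide_summary(ion_keys):
    """Collapse ion-level keys to one row per peptide, listing charges and CVs."""
    charges = defaultdict(set)
    cvs = defaultdict(set)
    for peptide, charge, cv in ion_keys:
        charges[peptide].add(charge)
        cvs[peptide].add(cv)
    return {
        peptide: {
            "Charges": ",".join(str(z) for z in sorted(charges[peptide])),
            "Compensation voltages": ",".join(str(v) for v in sorted(cvs[peptide])),
        }
        for peptide in charges
    }


summary = peptide_summary(ion_keys)
for peptide, columns in summary.items():
    print(peptide, columns)
```

This keeps one row per peptide, so the peptide-level tables would not gain redundant rows; how the intensities themselves should be aggregated across CVs is left open here.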

> We will discuss internally, but any suggestions are welcome.

I had some ideas on how to possibly improve the coverage; however, it depends on the actual cause of identical ions (z+CV) not being quantified in all runs. Since I don't know the details of how IonQuant operates and how it is implemented, it's hard to tell what would work. If you are interested, we could also have a video call to discuss this. It might be a bit easier for me, as otherwise I need to write a lot of explanations which could turn out to be completely irrelevant because I didn't understand IonQuant properly =)

We frequently perform quantitative analysis of PTMs for purified proteins at our facility. Currently, we are also switching to FragPipe for this kind of analysis. However, for PTMs the issue of mismatched CVs can be much more severe than for proteins, because often you only have one or two different ions for the quantification. Hence, I would also be happy to help with testing adjustments you make to IonQuant, if that would be useful for you.

Best, David

fcyu commented 1 year ago

> I had a look at several of the middle and high intensity ions with unequal CV. For the examples I looked at, I could extract very nice XICs, for both CVs and in both replicates. So I have the feeling it's not the absence of ion traces from the rawfiles that would cause lots of missing values when only pairing ions of the same CV.

> I am not sure what the exact reason is why different CVs are reported in the ions tables. Maybe the quality of one of the ion features with the same CV did not pass the FDR threshold, or the IDs are not always transferred during MBR (although this seems unlikely), or there is a completely different reason.

The reason could be from several steps. As long as there is an ion with another CV having better quality (e.g., mass error, RT error, intensity), IonQuant will pick that one.

> Why do you think it would be hard for downstream analysis? I assume you would just treat it like the charge state. For the peptide and modified peptide tables, "Charges" are reported in one column, so you could add "Compensation voltages" or a charge + CV combination. Although I am not sure if that would really be necessary.

Suppose I put the ions with different CVs in different rows. Then what about modified peptides and peptides? If they are split in the same way, there will be redundant peptides in the table.

> If you are interested, we could also have a video call to discuss this. It might be a bit easier for me, as otherwise I need to write a lot of explanations which could turn out to be completely irrelevant because I didn't understand IonQuant properly =)

Sure. Please contact me by email at yufe AT umich.edu

Best,

Fengchao

fcyu commented 1 year ago

Hi @hollenstein ,

I have implemented the new index for Ion (modified peptide + charge + CV). The results from your data look good. [Image: Untitled-2]

If you want to test the new pre-release version, please let me know.

Thanks again for your valuable testing and suggestions.

Best,

Fengchao

hollenstein commented 1 year ago

Hi Fengchao,

Many thanks again for implementing this feature, and I am glad that I could help. I would indeed like to test the new release, although it might take a while until I find the time to run the tests I have in mind.

I believe I've told you that I've observed a better performance with this benchmark dataset when splitting the rawfiles into different CVs. I looked at the data again and the difference of significant proteins (LIMMA) is actually quite striking. I am curious to see if there is still a difference in performance with the new version of IonQuant.

Best, David

fcyu commented 1 year ago

Hi David,

Sounds good. I have sent it to your email.

Best,

Fengchao