PNNL-Comp-Mass-Spec / Informed-Proteomics

Top down / bottom up, MS/MS analysis tool for DDA and DIA mass spectrometry data

Is there any way to batch export XIC data points (time & intensity) of all PrSMs? #23

Closed wingkinlui closed 4 years ago

wingkinlui commented 4 years ago

I am currently trying to create some customized XICs, which requires the retention times and intensities of all the precursors in all the PrSMs. Is there any way to batch export these data? I know I can view the XIC of a PrSM in LCMSSpectator, but it does not let me export the data points. Even if it did, I would have to do it one PrSM at a time. I think the required information may be stored in the pbf file, but I do not know how to parse it.

alchemistmatt commented 4 years ago

I have a hunch that MASIC will work for this: https://github.com/PNNL-Comp-Mass-Spec/MASIC/releases

MASIC has both a GUI program (MASIC.exe) and a console (command line) version (MASIC_Console.exe). I suggest you use the GUI to define the search options, including the custom list of m/z values to search for, then save a parameter file. If you need to batch search a bunch of data files, use the console version along with that parameter file.

The key settings you need to obtain what (I think) you're looking for are:

In the output directory, you'll find a file named DatasetName_SICdata.txt with data like this:

Dataset ParentIonIndex FragScanIndex ParentIonMZ Scan MZ Intensity
706504 422 0 649.85 1 649.8513916 32274803.13
706504 422 0 649.85 2 0 392109.1563
706504 422 0 649.85 23 649.8520426 43235615.08
706504 422 0 649.85 36 649.8518717 41466742.36
706504 422 0 649.85 57 649.8519911 40678635.07
706504 422 0 649.85 78 649.8519188 41572934.91
706504 422 0 649.85 99 649.8518002 45188889.83
706504 422 0 649.85 120 649.8519408 38351127.46
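For downstream plotting, the rows above can be grouped into one chromatogram per parent ion. The following is a minimal sketch, assuming the real _SICdata.txt file is tab-delimited with a header row using the column names shown above; the function name read_sic_data is made up for illustration. Note that the file reports scan numbers, so converting to retention time needs a separate scan-to-time lookup.

```python
import csv
import io
from collections import defaultdict


def read_sic_data(handle):
    """Group (scan, intensity) points by ParentIonIndex.

    Assumes a tab-delimited file with a header row containing at least
    the ParentIonIndex, Scan, and Intensity columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    chromatograms = defaultdict(list)
    for row in reader:
        key = int(row["ParentIonIndex"])
        chromatograms[key].append((int(row["Scan"]), float(row["Intensity"])))
    return dict(chromatograms)


# First three data rows from the example above, rebuilt with tab separators.
sample = (
    "Dataset\tParentIonIndex\tFragScanIndex\tParentIonMZ\tScan\tMZ\tIntensity\n"
    "706504\t422\t0\t649.85\t1\t649.8513916\t32274803.13\n"
    "706504\t422\t0\t649.85\t2\t0\t392109.1563\n"
    "706504\t422\t0\t649.85\t23\t649.8520426\t43235615.08\n"
)
xics = read_sic_data(io.StringIO(sample))
print(xics[422])
```

For a real file, pass an open file handle (opened with newline="") instead of the io.StringIO sample.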

See attached for an example parameter file that you can load into the MASIC GUI to adjust. CustomSICs_WriteDetailedSICData.xml.txt

To use this file, save it to your local computer, but rename it from CustomSICs_WriteDetailedSICData.xml.txt to CustomSICs_WriteDetailedSICData.xml

For more information on searching for custom m/z values, see the MASIC Readme, visible at https://github.com/PNNL-Comp-Mass-Spec/MASIC/blob/master/Readme.md

wingkinlui commented 4 years ago

alchemistmatt:

Thank you so much for your reply.

My situation is a bit complicated. I am working on intact histone proteoforms, and the same target can have multiple charge states and thus multiple m/z values. I am also trying to study the elution and separation of the proteoforms. Therefore, what I actually need are the retention times and respective intensities of the deconvoluted (de-charged and de-isotoped) precursor masses.
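To illustrate why one proteoform maps to several m/z values: for a neutral mass M and charge z (positive mode, protonated ions), m/z = (M + z * 1.007276) / z. A small sketch follows; the mass of 11300.0 Da and the charge range are arbitrary example values, not taken from any dataset in this thread.

```python
PROTON_MASS = 1.007276  # Da, approximate mass of a proton


def mz_for_charges(neutral_mass, charges):
    """Return {charge: m/z} for a protonated ion of the given neutral mass."""
    return {z: (neutral_mass + z * PROTON_MASS) / z for z in charges}


# Hypothetical neutral mass and charge range, for illustration only.
targets = mz_for_charges(11300.0, range(8, 13))
for z, mz in targets.items():
    print(f"{z}+\t{mz:.4f}")
```

This gives the m/z that corresponds to whichever mass is passed in. For intact proteins the monoisotopic peak is usually weak, so a list of m/z values for extraction would more sensibly be built from the most abundant isotope mass.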

In hindsight, it was wrong of me to use 'XIC' and 'precursors' in my question. I should have said 'proteoform chromatogram' and 'deconvoluted precursors'. The data I actually need are ['deconvoluted precursor mass', 'retention time', 'intensity']. I am trying to generate a proteoform elution heatmap, something like the 'feature map' generated by ProMex, but one that also shows the change in intensity.

That is why I wanted to access the deconvoluted MS1 spectra from the Informed-Proteomics workflow. I originally thought they were stored in the .pbf file. Yet, apart from lacking the knowledge and tools to parse it, I believe the .pbf file just holds the raw, un-deconvoluted spectra in binary format. I have looked at all the other output files generated by the workflow, but I cannot find the deconvoluted spectra.

Without the deconvoluted spectra, one way to construct the proteoform chromatograms is to use the 'most abundant isotope Mz' reported by MSPathFinderT or the 'RepMz' reported by ProMex to run MASIC (or just use an in-house script to search through the raw spectra). But that would let me consider only one type of precursor ion for each target.

Another way is to use a different deconvolution tool to obtain the deconvoluted MS1 spectra, then use the precursor masses reported by MSPathFinderT to search through those spectra. I have tried this with TopFD, and here is the 'proteoform heatmap' that I am aiming to create: FA1-10ACN70_1_HeatMap
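The mass-matching step described here can be sketched roughly as follows. This is an illustration only, not the script I actually used: it assumes the deconvoluted peaks have already been loaded into (retention_time, mass, intensity) tuples, and the function name, the 10 ppm tolerance, and the sample values are all made up.

```python
from collections import defaultdict


def proteoform_chromatogram(peaks, target_mass, tol_ppm=10.0):
    """Sum intensities of deconvoluted peaks matching target_mass, per retention time.

    peaks: iterable of (retention_time, mass, intensity) tuples.
    Returns a list of (retention_time, summed_intensity) sorted by time.
    """
    tol = target_mass * tol_ppm / 1e6  # absolute tolerance in Da
    summed = defaultdict(float)
    for rt, mass, intensity in peaks:
        if abs(mass - target_mass) <= tol:
            summed[rt] += intensity
    return sorted(summed.items())


# Made-up deconvoluted peaks: (retention time in min, mass in Da, intensity).
peaks = [
    (10.0, 11300.01, 5.0e5),
    (10.0, 11314.02, 2.0e5),
    (10.5, 11299.98, 8.0e5),
    (10.5, 11300.05, 1.0e5),
    (11.0, 11342.10, 3.0e5),
]
print(proteoform_chromatogram(peaks, 11300.0))
```

Repeating this for every precursor mass gives the ['deconvoluted precursor mass', 'retention time', 'intensity'] triples needed for the heatmap.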

But I think it would be better to use data generated and reported by the same workflow (i.e. Informed-Proteomics all the way), rather than to introduce data from a different workflow into my analysis (i.e. precursor masses from Informed-Proteomics, intensity vs. time data from TopFD).