Analysis of profile data MSPathFinder

WinkelsK commented 2 years ago

Hi all, I have acquired profile data (MS1 and MS2) on a Thermo instrument and have now tested the following two MSPathFinder pipelines:

Use the RAW file directly as input for PBF generation with PbfGen and for ProMex deconvolution.
Convert the RAW file to mzML with MSConvert (with peak picking), then use that mzML file as input for PbfGen and ProMex.

Both workflows run successfully and give a similar number of identifications. BUT the overlap of the identified proteoforms is only 45% (comparing sequences). I am very unsure which results I can trust. Looking forward to your feedback! Cheers, Konrad
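The thread does not say exactly how the 45% overlap was computed, and the number depends on the definition used. A minimal Python sketch, assuming each run's results have already been reduced to a list of sequence strings (whether modifications are part of the string is your choice), computing two common definitions side by side:

```python
def overlap_stats(seqs_a, seqs_b):
    """Compare two collections of identified proteoform sequences."""
    a, b = set(seqs_a), set(seqs_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared sequences / all distinct sequences from either run
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Fraction of the smaller result set that is also found in the other run
        "overlap_of_smaller": len(shared) / min(len(a), len(b)) if a and b else 0.0,
    }


if __name__ == "__main__":
    # Toy sequences, not real results
    print(overlap_stats(["PEPTIDE", "ACDEFG", "KLMN"],
                        ["PEPTIDE", "ACDEFG", "QRST", "VWY"]))
```

For two runs of similar size the two definitions can differ noticeably, so it is worth stating which one a reported overlap percentage uses.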

dtabb73 commented 2 years ago

Hi, Konrad.

I saw your interesting result from MSPathFinderT on PBFs generated via the RAW and mzML routes. I was surprised by the degree of difference you saw, since presumably the only difference is whether the software had access to peak profiles (RAW) or peak centroids (mzML). I don't think you specified whether or not you performed peak picking in msConvert, though.

I should think that the scan numbers will be the same whether you start from RAW or from mzML, so it should be possible to ask what each search concluded for individual scans.
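A per-scan comparison along these lines can be sketched in Python. This assumes tab-separated result files with columns named `Scan` and `Sequence` (the column names are an assumption; check the header of your MSPathFinder output). It also keeps only the first row per scan, so sort beforehand if a file lists several matches for one scan. `best_by_scan` and `compare_scans` are hypothetical helpers, not part of Informed-Proteomics.

```python
import csv
import io


def best_by_scan(lines, scan_col="Scan", seq_col="Sequence"):
    """Map scan number -> sequence from a tab-separated result table.

    scan_col/seq_col are assumed column names; adjust to the real header.
    Only the first row seen for each scan is kept.
    """
    result = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        result.setdefault(int(row[scan_col]), row[seq_col])
    return result


def compare_scans(ids_a, ids_b):
    """Classify every scan identified in either run."""
    summary = {"same": [], "different": [], "only_a": [], "only_b": []}
    for scan in sorted(set(ids_a) | set(ids_b)):
        if scan not in ids_b:
            summary["only_a"].append(scan)
        elif scan not in ids_a:
            summary["only_b"].append(scan)
        elif ids_a[scan] == ids_b[scan]:
            summary["same"].append(scan)
        else:
            summary["different"].append(scan)
    return summary


if __name__ == "__main__":
    # Toy tables standing in for the RAW-route and mzML-route result files
    raw_route = io.StringIO("Scan\tSequence\n100\tPEPTIDE\n101\tACDEFG\n102\tKLMN\n")
    mzml_route = io.StringIO("Scan\tSequence\n100\tPEPTIDE\n101\tACDEFH\n103\tQRST\n")
    print(compare_scans(best_by_scan(raw_route), best_by_scan(mzml_route)))
```

With real files, pass `open(path, newline="")` instead of the `io.StringIO` objects. The `different` list is the most informative one, since those are scans where both searches made a call but disagreed.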

I have written some tools for reading search results from multiple search engines (TopPIC, ProSight PD, pTop, and MSPT) here: https://github.com/dtabb73/ProForma-Exporters. I'm accustomed to differences between search engines, of course.

Thanks, Dave

WinkelsK commented 2 years ago

Short feedback on what I have found when working with profile-mode Thermo RAW files. The results are the same when:

Using RAW files as input for PbfGen and ProMex
Using mzML files generated via MSConvert with peak picking (centroid, vendor algorithm; see picture)

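The results differ (only 50% overlap in identified sequences) when I use an mzML file in profile mode (generated by MSConvert without any filter) as input for PbfGen and ProMex. Screenshot: https://user-images.githubusercontent.com/92794723/156379507-fc08403d-1369-4eb7-81d1-f1525212cbf1.PNG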
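For reference, the two mzML routes can be written as msconvert command lines. The sketch below only builds the argument lists; the helper name and output directories are hypothetical. Run each list with `subprocess.run(cmd, check=True)` on a machine where ProteoWizard's `msconvert` is on the PATH.

```python
def msconvert_commands(raw_path, centroid_dir, profile_dir):
    """Build msconvert argument lists for the two mzML routes discussed here."""
    # Vendor peak picking on all MS levels (1 and up): centroids come from the
    # Thermo library, like the peaks PbfGen/ProMex get when reading RAW directly
    centroided = ["msconvert", raw_path, "--mzML", "-o", centroid_dir,
                  "--filter", "peakPicking vendor msLevel=1-"]
    # No filter: spectra are written in profile mode
    profile = ["msconvert", raw_path, "--mzML", "-o", profile_dir]
    return {"centroided": centroided, "profile": profile}


if __name__ == "__main__":
    for name, cmd in msconvert_commands("sample.raw", "centroid", "profile").items():
        print(name, " ".join(cmd))
```

Separate output directories keep the two conversions from writing the same `.mzML` file name.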

FarmGeek4Life commented 2 years ago

Differences are expected: ProMex and MSPathFinder need centroided data, and the built-in RAW file reader uses the centroiding provided by the Thermo library/RAW file, which is the same as MSConvert's vendor centroiding. When reading an mzML file that contains profile data, the software uses either CWT (continuous wavelet transform) centroiding, if it can find the ProteoWizard DLLs, or a very simplistic local-maxima algorithm. Neither will match the vendor centroiding results.
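To illustrate why a simple local-maxima approach cannot reproduce the vendor peak list exactly, here is a generic sketch of local-maximum centroiding. It is not the Informed-Proteomics implementation. The three-point intensity-weighted m/z and the `min_intensity` cutoff are assumptions chosen for illustration.

```python
def local_maxima_centroids(mzs, intensities, min_intensity=0.0):
    """Naive centroiding: one peak per local intensity maximum.

    The reported m/z is the intensity-weighted mean of the apex point and its
    two neighbours, so it depends on how the profile points are sampled.
    On a flat top, only the leftmost point of the plateau is reported.
    """
    peaks = []
    for i in range(1, len(mzs) - 1):
        y = intensities[i]
        if y > min_intensity and y > intensities[i - 1] and y >= intensities[i + 1]:
            window = range(i - 1, i + 2)
            total = sum(intensities[j] for j in window)
            mz = sum(mzs[j] * intensities[j] for j in window) / total
            peaks.append((mz, y))
    return peaks


if __name__ == "__main__":
    mzs = [100.00, 100.01, 100.02, 100.03, 100.04]
    print(local_maxima_centroids(mzs, [0, 10, 30, 10, 0]))  # symmetric profile
    print(local_maxima_centroids(mzs, [0, 10, 30, 20, 0]))  # asymmetric profile
```

With a symmetric profile the weighted m/z equals the apex m/z (100.02). With the asymmetric one it shifts to about 100.0217. A different algorithm, such as the vendor's, would place it somewhere else again. This is the kind of small mass shift that ends up affecting every peak.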

As for the overlap of only 50%: with these different centroiding algorithms, there will be differences in every single peak mass. You can examine the differences by opening the files in LCMSSpectator.
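LCMSSpectator is the tool suggested here for visual inspection. For a quick numeric check, a sketch like the following pairs centroid m/z values from the same scan in two files and reports the differences in ppm. How the peak lists are exported is not shown, and the 10 ppm default tolerance is an arbitrary assumption.

```python
import bisect


def ppm_differences(peaks_a, peaks_b, tol_ppm=10.0):
    """Pair each m/z in peaks_a with the nearest m/z in peaks_b within tol_ppm.

    Returns (mz_a, mz_b, ppm) tuples, where ppm = (mz_b - mz_a) / mz_a * 1e6.
    Peaks without a partner inside the tolerance are skipped.
    """
    sorted_b = sorted(peaks_b)
    pairs = []
    for mz in peaks_a:
        i = bisect.bisect_left(sorted_b, mz)
        # Nearest neighbour is either just below or at/just above the insertion point
        candidates = sorted_b[max(i - 1, 0): i + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda m: abs(m - mz))
        ppm = (nearest - mz) / mz * 1e6
        if abs(ppm) <= tol_ppm:
            pairs.append((mz, nearest, ppm))
    return pairs


if __name__ == "__main__":
    # Toy m/z values standing in for vendor vs. local-maxima centroids
    print(ppm_differences([500.0, 1000.0, 1500.0], [500.001, 1000.02, 2000.0]))
```

A systematic offset across many peaks points to the centroiding algorithm. Peaks that only appear in one list point to differences in peak detection.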

WinkelsK commented 2 years ago

Thanks, Bryson! I didn't put it all together initially, but I am now happily using MSPathFinder. Thanks :) Konrad

PNNL-Comp-Mass-Spec/Informed-Proteomics, issue #31: Analysis of profile data MSPathFinder