Open WinkelsK opened 2 years ago
Hi, Konrad.
I saw your interesting result from MSPathFinderT on PBFs generated from RAW and from mzML routes. I was surprised by the degree of difference you saw, though, since presumably the only difference is whether the software had access to peak profiles (RAW) or peak centroids (mzML). I don't think you specified whether or not you performed peak picking in MSConvert, though.
I should think that the scan numbers will be the same whether you start from RAW or from mzML, so it should be possible to ask what each search concluded for individual scans.
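A per-scan comparison along these lines could be sketched as follows. This is only an illustration: the function name `compare_by_scan` and the toy scan numbers and sequences are made up, and each search result is assumed to have already been reduced to a plain mapping from scan number to identified sequence.

```python
def compare_by_scan(raw_ids, mzml_ids):
    """Compare two {scan_number: sequence} mappings scan by scan."""
    raw_scans, mzml_scans = set(raw_ids), set(mzml_ids)
    shared = sorted(raw_scans & mzml_scans)
    return {
        # Scans identified in both searches with the same sequence
        "agree": [s for s in shared if raw_ids[s] == mzml_ids[s]],
        # Scans identified in both searches with different sequences
        "disagree": [s for s in shared if raw_ids[s] != mzml_ids[s]],
        # Scans identified in only one of the two searches
        "raw_only": sorted(raw_scans - mzml_scans),
        "mzml_only": sorted(mzml_scans - raw_scans),
    }


# Hypothetical example data, not taken from any real search
raw_ids = {101: "PEPTIDEA", 102: "PEPTIDEB", 105: "PEPTIDEC"}
mzml_ids = {101: "PEPTIDEA", 102: "PEPTIDEX", 107: "PEPTIDED"}

result = compare_by_scan(raw_ids, mzml_ids)
for category, scans in result.items():
    print(category, scans)
```

Splitting the scans into these four groups shows whether the two routes disagree on the same spectra or simply identify different spectra.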
I have written some tools for reading search results from multiple search engines (TopPIC, ProSight PD, pTop, and MSPT) here: https://github.com/dtabb73/ProForma-Exporters. I'm accustomed to differences between search engines, of course.
Thanks, Dave
From: WinkelsK @.> Sent: Monday, February 28, 2022 10:58 AM To: PNNL-Comp-Mass-Spec/Informed-Proteomics @.> Cc: Subscribed @.***> Subject: [PNNL-Comp-Mass-Spec/Informed-Proteomics] Analysis of profile data MSPathFinder (Issue #31)
Hi all, I have acquired profile data (MS1 and MS2) on a Thermo instrument. I have now tested the following two MSPathFinder pipelines:
Both workflows run successfully and give a similar number of identifications. BUT the overlap of the identified proteoforms is only 45% (comparing sequences). I am very unsure which results I can trust. Looking forward to your feedback! Cheers, Konrad
Short feedback: What I have found when working with profile Thermo raw files: Results are the same, when
Results are different (only 50% overlap on sequences identified) when I use an mzML file in profile mode (generated by MSConvert without any filter) as input for PbfGen and ProMex.
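Since an overlap percentage depends on what it is measured against, here is a minimal sketch of two common definitions. The function name `sequence_overlap` and the toy sequences are hypothetical, and nothing here states how the figures quoted in this thread were actually computed.

```python
def sequence_overlap(seqs_a, seqs_b):
    """Summarise the overlap between two collections of identified sequences."""
    a, b = set(seqs_a), set(seqs_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        # Shared sequences relative to everything seen in either run
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Shared sequences relative to each run on its own
        "fraction_of_a": len(shared) / len(a) if a else 0.0,
        "fraction_of_b": len(shared) / len(b) if b else 0.0,
    }


# Hypothetical example data
run_a = ["AAA", "BBB", "CCC", "DDD"]
run_b = ["BBB", "CCC", "DDD", "EEE"]

stats = sequence_overlap(run_a, run_b)
print(stats)
```

With these toy inputs the same pair of runs gives 75% overlap relative to either run but 60% relative to the union, so it helps to say which definition a reported figure uses.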
Differences are expected: ProMex and MSPathFinder need centroided data, and the built-in raw file reader uses the centroiding provided by the Thermo library/raw file, the same as the MSConvert vendor centroiding. When reading an mzML created with profile data, it uses either the CWT centroiding (if it can find ProteoWizard DLLs) or a very simplistic local maxima algorithm, neither of which will match the vendor centroiding results.
As for the only 50% overlap, there will be differences in every single peak mass with those different centroiding algorithms; you could look at the differences by opening the files with LCMSSpectator.
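To illustrate why two centroiding methods give slightly different masses for the same profile peak, here is a toy sketch. It is not the Informed-Proteomics, ProteoWizard CWT, or Thermo vendor code: both functions and the profile points are invented for illustration, and the intensity-weighted variant merely stands in for "some other centroiding algorithm".

```python
def local_maxima_centroids(mz, intensity):
    """Report the m/z of every point that is an apex between its two neighbours."""
    return [
        mz[i]
        for i in range(1, len(mz) - 1)
        if intensity[i - 1] < intensity[i] >= intensity[i + 1]
    ]


def weighted_centroids(mz, intensity):
    """Report the intensity-weighted mean m/z of each apex and its two neighbours."""
    centroids = []
    for i in range(1, len(mz) - 1):
        if intensity[i - 1] < intensity[i] >= intensity[i + 1]:
            window_mz = mz[i - 1:i + 2]
            window_int = intensity[i - 1:i + 2]
            total = sum(window_int)
            centroids.append(sum(m * x for m, x in zip(window_mz, window_int)) / total)
    return centroids


def ppm_difference(observed, reference):
    """Relative difference between two m/z values in parts per million."""
    return (observed - reference) / reference * 1e6


# Hypothetical, slightly asymmetric profile peak
mz = [500.000, 500.002, 500.004, 500.006, 500.008]
intensity = [10.0, 60.0, 100.0, 80.0, 15.0]

simple = local_maxima_centroids(mz, intensity)
weighted = weighted_centroids(mz, intensity)
shift = ppm_difference(weighted[0], simple[0])
print(simple, weighted, shift)
```

Because the toy peak is asymmetric, the weighted centroid lands a fraction of a ppm above the apex point. Small shifts like this on every peak are the kind of difference that can be inspected visually in LCMSSpectator.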
Thanks Bryson! I didn't put that all together initially, but am now happily using MSPathFinder! Thanks :) Konrad