MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.

Use already existing database help files #134

Open Stortebecker opened 2 years ago

Stortebecker commented 2 years ago

Describe the question or problem: Can MSGF+ be sped up by reusing existing database help files when re-analyzing data, or when analyzing several files against the same database?

Details: When running MSGF+, the first step is creating and sorting the suffix array for the database file. This creates four help files (.canno, .cnlcp, .csarr, .cseq) and a copy of the database concatenated with decoys. For larger databases, this step is time-consuming. I often analyze a batch of raw files sequentially, always using the same database. For every raw file, these database help files are created from scratch.

Isn't it possible to reuse the files created earlier? This may not always be the desired behavior, but could you add a parameter to MSGF+ that allows reusing existing database help files? This would speed up many of my analyses considerably.

FarmGeek4Life commented 2 years ago

There is no flag for it, but MSGF+ checks whether those helper files exist (and runs some integrity checks on them). If they pass, it uses them rather than recreating them.
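If you want to confirm before a batch run that the helper files are actually in place, a small pre-flight check along these lines may help. This is only a sketch, not part of MSGF+: the function name `missing_helper_files` and the `stem_suffix` parameter are hypothetical, the extension list is assumed from the files named in this thread, and it checks only that the files exist. It does not reproduce the integrity checks MSGF+ runs itself.

```python
from pathlib import Path

# Helper-file extensions as named in this thread (assumption: adjust the
# list to match the files MSGF+ actually wrote next to your database).
HELPER_EXTENSIONS = (".canno", ".cnlcp", ".csarr", ".cseq")


def missing_helper_files(fasta_path, stem_suffix="", extensions=HELPER_EXTENSIONS):
    """Return the expected helper files that are absent next to the FASTA file.

    stem_suffix is a hypothetical knob for cases where the helper files carry
    an extra tag between the database name and the extension.
    """
    fasta = Path(fasta_path)
    base = fasta.stem + stem_suffix
    expected = [fasta.with_name(base + ext) for ext in extensions]
    # Existence only; MSGF+ decides on its own whether the files are usable.
    return [path for path in expected if not path.is_file()]
```

An empty list means all expected files are present in the database's directory. If the helper files for the decoy-concatenated database carry an extra tag in their names, pass it as `stem_suffix`; check the directory listing for the exact naming rather than relying on this sketch.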

