Open Luxxii opened 2 years ago
This issue is caused by a limitation of the current implementation of MS-GF+ in Java, and fixing it is neither a simple nor a quick change. The error comes from array sizes overflowing the int range, and fixing it would involve changing arrays in many places to use an array type that supports indexing with long instead of int.
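To illustrate the kind of overflow described above, here is a minimal sketch. It is not taken from the MS-GF+ source: the class names, the 50,000,000,000 count (a rough stand-in for a ~50 GB fasta file), and the chunk size are all assumptions made for the example. It shows how a count that needs a long turns negative when narrowed to int, which is what triggers a NegativeArraySizeException on allocation, and one common way to get long indexing by spreading the data over several ordinary arrays.

```java
// Illustrative only: hypothetical classes, not part of MS-GF+.
class ArraySizeOverflow {
    // Java array sizes are int, so a long count has to be narrowed somewhere.
    // The cast keeps only the low 32 bits, which can yield a negative value.
    static int narrowedSize(long count) {
        return (int) count;
    }

    public static void main(String[] args) {
        long count = 50_000_000_000L; // assumed: roughly one char per byte of a ~50 GB file
        int size = narrowedSize(count);
        System.out.println("narrowed size: " + size);
        try {
            byte[] data = new byte[size];
            System.out.println("allocated " + data.length + " bytes");
        } catch (NegativeArraySizeException e) {
            System.out.println("allocation failed: " + e);
        }
    }
}

// One possible long-indexed replacement: several int[] chunks behind a long index.
class LongIndexedIntArray {
    private final int[][] chunks;
    private final int chunkSize;
    private final long length;

    LongIndexedIntArray(long length, int chunkSize) {
        if (length < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and chunkSize > 0");
        }
        this.length = length;
        this.chunkSize = chunkSize;
        // Round up so a final, partially filled chunk is included.
        int chunkCount = Math.toIntExact((length + chunkSize - 1) / chunkSize);
        chunks = new int[chunkCount][];
        long remaining = length;
        for (int i = 0; i < chunkCount; i++) {
            chunks[i] = new int[(int) Math.min(chunkSize, remaining)];
            remaining -= chunks[i].length;
        }
    }

    long length() {
        return length;
    }

    int get(long index) {
        checkIndex(index);
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    void set(long index, int value) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)][(int) (index % chunkSize)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

A wrapper like this only helps if every place that stores or passes an index also switches from int to long, which is why the change touches many parts of the code rather than a single declaration.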
We do have other tools that we use for splitting fasta files into small enough pieces, searching the data files against each piece, and then merging all results for a single data file back into one mzid file.
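The splitting step of that workflow can be sketched as follows. This is not the tool mentioned above; the class name, the part file naming, and the size limit parameter are hypothetical. The sketch only cuts at header lines (those starting with ">"), so no entry is ever divided between two parts. Searching each part and merging the results into one mzid file are outside its scope.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical splitter, not the tool referred to in the thread.
class FastaSplitter {
    // Writes the entries of `fasta` into part0.fasta, part1.fasta, ... inside outDir.
    // A new part is started at the first header line seen after the current part
    // has reached maxCharsPerPart characters, so parts may exceed the limit slightly.
    static List<Path> split(Path fasta, Path outDir, long maxCharsPerPart) throws IOException {
        List<Path> parts = new ArrayList<>();
        BufferedWriter out = null;
        long written = 0;
        // ISO-8859-1 maps every byte to a char, so unexpected bytes do not abort the read.
        try (BufferedReader in = Files.newBufferedReader(fasta, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = in.readLine()) != null) {
                boolean isHeader = line.startsWith(">");
                if (out == null || (isHeader && written >= maxCharsPerPart)) {
                    if (out != null) {
                        out.close();
                    }
                    Path part = outDir.resolve("part" + parts.size() + ".fasta");
                    parts.add(part);
                    out = Files.newBufferedWriter(part, StandardCharsets.ISO_8859_1);
                    written = 0;
                }
                out.write(line);
                out.newLine();
                written += line.length() + 1; // +1 approximates the line terminator
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: FastaSplitter <input.fasta> <outDir> <maxCharsPerPart>");
            return;
        }
        Path outDir = Paths.get(args[1]);
        Files.createDirectories(outDir);
        List<Path> parts = split(Paths.get(args[0]), outDir, Long.parseLong(args[2]));
        System.out.println("wrote " + parts.size() + " part(s)");
    }
}
```

Choosing a limit well below 2^31 characters per part leaves room for anything an indexing step adds on top of the input, such as appended decoy sequences.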
From: Dominik Lux
Sent: Friday, July 1, 2022 12:56:51 AM
To: MSGFPlus/msgfplus
Subject: [MSGFPlus/msgfplus] Java.lang.NegativeArraySizeException (Issue #140)
Describe the bug
Hello everyone, I am currently trying to index large peptide fasta files (~50 GB) for peptide searches. This fasta contains 85748938 entries of short peptides (all of them unique). I am using the SABuild function and calling it as follows:
Xmx256000M -d mouse_mzml_specific_peptides.fasta -tda 1 -decoy XXX
java -Xmx256000M -cp
and getting the following error from MSGF+:
Creating peptides.revCat.fasta.
Building suffix array: /mntc/
I was wondering if this error could be fixed quickly, since I would like to use MSGF+ for identification, even with fasta files as large as the ones I am using here. Maybe it is only a simple matter of using long instead of int, because of a possible overflow happening here. But I cannot judge whether other places need to be adjusted.
Thanks for the quick answer and the clarification! Yes, splitting fasta files is always an option. However, I would look forward to running a search with a single large fasta file.
If this is not a priority or not planned, then you can close this issue.