Closed: ypriverol closed this issue 1 week ago.
@ypriverol These are the files that produce an OutOfMemory error. Did I understand this right?
These are the two files in PRIDE:
https://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_125amol_R1.raw https://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_125amol_R2.raw
Yes @caetera . Thanks a lot for looking into this.
Thanks! Just to clarify: it affects only the second file. The problem is that, if you look at the output that yasset posted, the OpenMS mzML parser parses at least one spectrum with an incredibly high m/z value, so high that it is probably the result of an uninitialized random memory location. It could be that the annotated length of the m/z array no longer matches the actual length of the spectrum to be decoded, or something even stranger.
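The length mismatch hypothesized above can be checked directly while decoding the binary data arrays. A minimal sketch (not OpenMS or TRFP code; the function name and the plausibility bound are my assumptions) that decodes an mzML-style zlib-compressed 64-bit float array and flags both symptoms, the length disagreement and an implausible m/z range:

```python
import base64
import struct
import zlib

def decode_mz_array(b64_data: str, expected_length: int, compressed: bool = True):
    """Decode an mzML-style 64-bit float binary array and sanity-check it.

    Raises ValueError when the decoded length disagrees with the annotated
    defaultArrayLength, or when the values fall outside a plausible m/z
    range (the 1e5 upper bound is an assumption, not an mzML rule).
    """
    raw = base64.b64decode(b64_data)
    if compressed:
        raw = zlib.decompress(raw)
    n = len(raw) // 8
    if len(raw) % 8 != 0 or n != expected_length:
        raise ValueError(f"annotated length {expected_length} != decoded length {n}")
    values = struct.unpack(f"<{n}d", raw)
    if values and not (0.0 < min(values) and max(values) < 1e5):
        raise ValueError(f"implausible m/z range [{min(values)} -- {max(values)}]")
    return values

# Round-trip a small synthetic spectrum to show the checks passing.
mz = (352.092743, 500.25, 1799.879639)
encoded = base64.b64encode(zlib.compress(struct.pack("<3d", *mz))).decode()
decoded = decode_mz_array(encoded, expected_length=3)
```

A validator like this, run over a freshly converted mzML, would distinguish a corrupted input file from a decoding bug.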
Note also: the first output in the first post of this issue is from TRFP 1.3.6, while the second (faulty) output is from the latest version.
The out-of-memory error is a downstream result of this parsing, for algorithms that allocate memory proportional to the m/z range of an mzML file.
Maybe also important: note the different number of peaks that are parsed (while the number of spectra stayed the same).
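To make the allocation argument concrete, here is a small sketch (not quantms or OpenMS code; the bin width is an arbitrary assumption) of how an algorithm that allocates one fixed-width bin per m/z interval explodes when a corrupt spectrum reports an m/z near 7.6e+31:

```python
def bins_needed(mz_min: float, mz_max: float, bin_width: float = 0.01) -> int:
    """Number of fixed-width bins required to cover the observed m/z range."""
    return int((mz_max - mz_min) / bin_width) + 1

# The healthy scan range from this thread: a modest allocation.
normal = bins_needed(352.092743, 1799.879639)

# The corrupt range from scan 34628: an allocation no machine can satisfy,
# hence the downstream OutOfMemory error.
corrupt = bins_needed(1.968968e-19, 7.555080e+31)
```

Any consumer that sizes a buffer from the file-level m/z limits inherits the corruption this way, even though only a single spectrum is broken.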
I had a chance to look into it.
The mzML file provided in https://github.com/bigbio/quantms/issues/432 indeed has a scan 34628 with very large m/z values, [1.968968e-19 -- 7.555080e+31], and this is the likely cause of the error.
However, when I downloaded the RAW file from PRIDE and tried to reproduce the error in TRFP, I could not. The corresponding scan (as well as all the others) is fine: [352.092743 -- 1799.879639].
I can see in the mzML file that the SHA-1 sum for your RAW file was
<cvParam cvRef="MS" accession="MS:1000569" value="7653a7116752cc168f9b7890c80fa4ab3edfea31" name="SHA-1" />
however, the file I got from PRIDE returns
<cvParam cvRef="MS" accession="MS:1000569" value="8e16697ebbc3962b09a90385579fe79552a7d98c" name="SHA-1" />
Could it be that the RAW file used for the conversion in TRFP got corrupted? Could you, please, try to download the file again and reprocess it?
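Comparing the checksum of a downloaded RAW file against the one the converter recorded can be automated. A stdlib-only sketch (the throwaway file contents and function names are hypothetical) that streams a file through SHA-1 and extracts the MS:1000569 cvParam value from mzML text:

```python
import hashlib
import re
import tempfile

def file_sha1(path: str) -> str:
    """Stream a (potentially large) file through SHA-1 in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def mzml_sha1(mzml_text: str) -> str:
    """Pull the MS:1000569 (SHA-1) cvParam value out of an mzML document."""
    m = re.search(r'accession="MS:1000569"[^>]*value="([0-9a-fA-F]{40})"', mzml_text)
    if m is None:
        raise ValueError("no SHA-1 cvParam found")
    return m.group(1)

# Demonstration on a throwaway file standing in for the RAW file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"fake raw file contents")
    raw_path = tmp.name

raw_digest = file_sha1(raw_path)
snippet = f'<cvParam cvRef="MS" accession="MS:1000569" value="{raw_digest}" name="SHA-1" />'
```

In practice one would run `file_sha1` on the freshly downloaded RAW file and compare it against `mzml_sha1` of the converted output; a mismatch, as seen in this thread, points at a corrupted or different input file rather than a converter bug.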
Interesting; this happens on two different machines, in two different places.
@ypriverol did we really try two different downloads of the file though? I only know of the one file that Dai shared.
I tested two different Windows systems (10 and 11) and two Linux systems (Ubuntu 20.04 LTS and 24.04 LTS, both running the latest version of Mono), i.e. four downloads in total. The SHA-1 checksums (calculated by TRFP and by sha1sum) are consistent between the downloads and differ from the one in the shared mzML. Conversion is successful in all cases.
Honestly, I tend to believe this is an issue outside TRFP: first, because the checksum is different; second, because the code used for spectral data array creation has not changed since version 1.2.x. If it was working before, it should not be broken now.
Could you provide more information on the platform you are running on? Is it containerized?
I have been trying to reproduce the error without success. We can leave it for now. I'm adding checksums to the SDRF, and also to our pipeline, so we can trace this better in the future.
While testing quantms, we found an error in some of the raw files (https://github.com/bigbio/quantms/issues/432):