CCMS-UCSD / GNPS_Workflows

Public Workflows at GNPS
https://gnps.ucsd.edu/

Trouble processing Thermo ion trap GC-MS data with MSHub #874

Open ISU-rfitch opened 3 months ago

ISU-rfitch commented 3 months ago

Describe the bug: An MSHub processing job fails on a set of 166 files. The data are Thermo ITQ ion trap EI spectra, converted from raw to mzML with ProteoWizard MSConvert using peak picking (vendor algorithm). All samples were run under the same conditions (column program, etc.).

The persistent error is "all input arrays must have the same shape". I wonder whether it has something to do with an inconsistent number of scans due to AGC (automatic gain control)?

Excerpt:

    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape
Traceback (most recent call last):
  File "/data/ccms-gnps/tools/mshub-gc/release_30/proc/io/importmsdata.py", line 446, in mzml_reader
    X = np.array(sp.centroidedPeaks).astype(float)
  File "/data/ccms-gnps/tools/miniconda3_gamma/envs/mshub-gc/lib/python3.7/site-packages/pymzml/spec.py", line 1636, in centroidedPeaks
    return self.peaks("centroided")
  File "/data/ccms-gnps/tools/miniconda3_gamma/envs/mshub-gc/lib/python3.7/site-packages/pymzml/spec.py", line 1031, in peaks
    arr = np.stack((mz, i), axis=-1)
  File "<__array_function__ internals>", line 6, in stack
  File "/data/ccms-gnps/tools/miniconda3_gamma/envs/mshub-gc/lib/python3.7/site-packages/numpy/core/shape_base.py", line 425, in stack

The error message repeats for each subsequent file until the system gives up at 87 files and reads no further.

all input arrays must have the same shape

  1. spec-00039.mzML: Failed to read in data all input arrays must have the same shape
  2. spec-00022.mzML: Failed to read in data all input arrays must have the same shape
  3. spec-00015.mzML: Failed to read in data all input arrays must have the same shape
  4. spec-00091.mzML: Failed to read in data all input arrays must have the same shape
  5. spec-00006.mzML: Failed to read in data all input arrays must have the same shape

and so on...

May be a newbie issue, first time using GNPS. All help appreciated. Many thanks, Rick Fitch

ISU-rfitch commented 3 months ago

I think I found the problem. The blank and the samples were run under different programs. Same time parameters, but the blank uses 3 microscans and the samples had 1 microscan, which triples the number of total scans in the chromatogram. Rerunning without the blank. Will advise on success/failure.

ISU-rfitch commented 3 months ago

Unfortunately, this did not fix the problem completely. This time it ran for much less time but still gave a similar error.

Traceback (most recent call last):
  File "/data/ccms-gnps/tools/mshub-gc/release_30/proc/io/importmsdata.py", line 446, in mzml_reader
    X = np.array(sp.centroidedPeaks).astype(float)
  File "/data/ccms-gnps/tools/miniconda3_gamma/envs/mshub-gc/lib/python3.7/site-packages/pymzml/spec.py", line 1636, in centroidedPeaks
    return self.peaks("centroided")
  File "/data/ccms-gnps/tools/miniconda3_gamma/envs/mshub-gc/lib/python3.7/site-packages/pymzml/spec.py", line 1031, in peaks
    arr = np.stack((mz, i), axis=-1)
  File "<__array_function__ internals>", line 6, in stack
  File "/data/ccms-gnps/tools/miniconda3_gamma/envs/mshub-gc/lib/python3.7/site-packages/numpy/core/shape_base.py", line 425, in stack
    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape

Not sure which may be the offending file. I will run through the batch to see if I can spot another file with troubles.
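One way to narrow this down, offered as a rough sketch rather than a confirmed diagnosis: the traceback fails inside pymzml at `np.stack((mz, i), axis=-1)`, which suggests that in at least one spectrum the decoded m/z and intensity arrays have different lengths. That would be a per-spectrum problem, not a difference in scan counts between files. The script below uses only the Python standard library to scan an mzML file and list spectra whose two arrays decode to different lengths. The function names are hypothetical, and the code assumes the arrays are 32-bit or 64-bit floats, uncompressed or zlib-compressed, with their cvParam entries written directly inside each `binaryDataArray` (no numpress, no referenceable parameter groups).

```python
import base64
import sys
import xml.etree.ElementTree as ET
import zlib

# PSI-MS controlled-vocabulary accessions used in mzML binary arrays.
MZ_ARRAY = "MS:1000514"
INTENSITY_ARRAY = "MS:1000515"
FLOAT32 = "MS:1000521"
ZLIB = "MS:1000574"


def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def array_info(bda):
    """Return (kind, n_values) for one <binaryDataArray> element."""
    accessions = {c.get("accession") for c in bda if local(c.tag) == "cvParam"}
    text = next((c.text or "" for c in bda if local(c.tag) == "binary"), "")
    raw = base64.b64decode(text)
    if raw and ZLIB in accessions:
        raw = zlib.decompress(raw)
    # Assumption: anything not flagged as 32-bit float is a 64-bit float.
    width = 4 if FLOAT32 in accessions else 8
    if MZ_ARRAY in accessions:
        kind = "mz"
    elif INTENSITY_ARRAY in accessions:
        kind = "intensity"
    else:
        kind = None
    return kind, len(raw) // width


def find_mismatched_spectra(source):
    """Return [(spectrum_id, n_mz, n_intensity)] where the lengths differ."""
    bad = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "spectrum":
            continue
        lengths = {}
        for child in elem.iter():
            if local(child.tag) == "binaryDataArray":
                kind, n = array_info(child)
                if kind is not None:
                    lengths[kind] = n
        if lengths.get("mz") != lengths.get("intensity"):
            bad.append((elem.get("id"), lengths.get("mz"), lengths.get("intensity")))
        elem.clear()  # free memory; spectra can be numerous
    return bad


if __name__ == "__main__":
    # Usage: python check_mzml.py file1.mzML file2.mzML ...
    for path in sys.argv[1:]:
        for spec_id, n_mz, n_int in find_mismatched_spectra(path):
            print(f"{path}: {spec_id}: m/z={n_mz} intensity={n_int}")
```

If this reports nothing for any file, the mismatch is probably introduced elsewhere (for example by how the reader selects or decodes arrays), and the sketch's assumptions about encoding should be rechecked against the actual MSConvert settings.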

ISU-rfitch commented 3 months ago

On rechecking the blank file, I found that the one I included with the set was run under the same parameters as the samples, so removing it should have had no effect. I'm not sure why the rerun finished so quickly and did not produce the long list of array errors. Because of AGC, all of the files have a slightly different number of total scans, but all are around 3800. However, other Thermo MS instruments such as Orbitraps also use AGC, so this should not be the problem. Again, any suggestions would be welcome.
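To check the scan-count observation across the whole batch, a short sketch along these lines could tabulate the number of spectra in each mzML file. It uses only the standard library. The function name `count_spectra` is hypothetical; the function returns both the `count` attribute declared on `spectrumList` and the number of `spectrum` elements actually found, so a truncated or inconsistent file would also stand out.

```python
import sys
import xml.etree.ElementTree as ET


def count_spectra(source):
    """Return (declared_count, found_count) for one mzML file or file object."""
    declared = None
    found = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace
        if event == "start" and name == "spectrumList":
            count = elem.get("count")
            declared = int(count) if count is not None else None
        elif event == "end" and name == "spectrum":
            found += 1
            elem.clear()  # keep memory use flat on large files
    return declared, found


if __name__ == "__main__":
    # Usage: python count_scans.py file1.mzML file2.mzML ...
    for path in sys.argv[1:]:
        declared, found = count_spectra(path)
        print(f"{path}: declared={declared} found={found}")
```

A file whose count is far from the roughly 3800 seen elsewhere (for example about three times larger, as a 3-microscan method would not produce but a different scan program might) would be a candidate for closer inspection.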