czbiohub-sf / Rapid-QC-MS

Realtime quality control for mass spectrometry data acquisition
https://czbiohub-sf.github.io/Rapid-QC-MS

MSConvert throws "corrupt file" error even after double checksum match #71

Closed by wasimsandhu 1 year ago

wasimsandhu commented 1 year ago
Data acquisition completed for MIFI004_BK_3_Pos_QE2_C18_044
format: mzML
    m/z: Compression-None, 64-bit
    intensity: Compression-None, 32-bit
    rt: Compression-None, 64-bit
ByteOrder_LittleEndian
 indexed="true"
outputPath: C:\MS-AutoQC\data\Thermo_QE_2_MIFI004\data/
extension: .mzML
contactFilename:
runIndexSet:

spectrum list filters:

chromatogram list filters:

filenames:
  C:\MS-AutoQC\data\Thermo_QE_2_MIFI004\data\MIFI004_BK_3_Pos_QE2_C18_044.raw

processing file: C:\MS-AutoQC\data\Thermo_QE_2_MIFI004\data\MIFI004_BK_3_Pos_QE2_C18_044.raw
[RawFileImpl::ctor()] Corrupt RAW file C:\MS-AutoQC\data\Thermo_QE_2_MIFI004\data\MIFI004_BK_3_Pos_QE2_C18_044.raw
Error processing file C:\MS-AutoQC\data\Thermo_QE_2_MIFI004\data\MIFI004_BK_3_Pos_QE2_C18_044.raw
wasimsandhu commented 1 year ago

For our LC-MS instrument runs, MS data is collected and written to the file for only part of the run. After that, no MS data is collected, but the chromatography is still running. During that time the file is not complete, because the instrument still needs to append the chromatography data to the end of it.

With most HILIC and C18 short runs, data is collected for 12 minutes, and then there are 3 minutes left for that sample during which no data is collected. So the checksum passes and the file is ready to be processed.

With this run, data is collected for 20 minutes, and then there are 10 minutes of the gradient left. So the checksum will pass during those 10 minutes, but the data file is not complete because the chromatography is still running.
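To illustrate why a double checksum match is not enough here, below is a minimal sketch of such a check. It is not the project's actual implementation: the function names, the choice of MD5, and the interval parameter are all assumptions made for illustration. The point is only that a file which is idle between the end of MS acquisition and the final chromatography write hashes identically twice, even though more bytes are still to come.

```python
import hashlib
import time


def file_checksum(path):
    """Return the MD5 hex digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large raw files are not loaded into memory at once
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_stable(path, interval_seconds):
    """Hash the file twice, `interval_seconds` apart, and report whether both match.

    A match only means the file did not change during the interval.
    It does not prove the instrument has finished writing to it.
    """
    first = file_checksum(path)
    time.sleep(interval_seconds)
    second = file_checksum(path)
    return first == second
```

If the interval is shorter than the idle tail of the gradient (10 minutes in this run), `looks_stable` returns True while the raw file is still missing its trailing chromatography data, which would explain the "Corrupt RAW file" error from MSConvert.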

The first solution that comes to mind is to require the user to provide these pieces of information (how long MS data is collected and how long the gradient continues afterward) when setting up the chromatography method. The solution for now is to retry MSConvert up to 5 times over the course of 15 minutes.

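import os
import time
import traceback

# Run MSConvert (and give 5 more attempts if it fails).
# run_msconvert() and its arguments are defined elsewhere.
try:
    mzml_file = run_msconvert(path, filename, extension, mzml_file_directory)

    for attempt in range(5):
        # Stop retrying as soon as the mzML file exists on disk
        if os.path.exists(mzml_file):
            break
        # Wait 3 minutes so the instrument can finish writing the raw file
        time.sleep(180)
        mzml_file = run_msconvert(path, filename, extension, mzml_file_directory)
except Exception:
    mzml_file = None
    print("Failed to run MSConvert.")
    traceback.print_exc()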
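
The first idea mentioned above, having the user supply the method timings, could look something like the sketch below. This is only an illustration under stated assumptions: the function name and both parameters are hypothetical and do not exist in the project. Given the MS acquisition time and the total run time, it computes how long to wait after acquisition stops before the raw file should be complete.

```python
def post_acquisition_wait_seconds(acquisition_minutes, total_run_minutes):
    """Return the seconds between the end of MS acquisition and the end of the run.

    Both arguments are hypothetical user-provided method settings, in minutes.
    """
    if acquisition_minutes < 0 or total_run_minutes < acquisition_minutes:
        raise ValueError("expected 0 <= acquisition_minutes <= total_run_minutes")
    return (total_run_minutes - acquisition_minutes) * 60.0


# Short HILIC/C18 runs: 12 minutes of acquisition in a 15 minute run
short_run_wait = post_acquisition_wait_seconds(12, 15)

# The run in this issue: 20 minutes of acquisition in a 30 minute run
long_run_wait = post_acquisition_wait_seconds(20, 30)
```

With the timings from this issue, the short runs need a 180 second wait and this run needs 600 seconds, so a fixed retry schedule could be replaced by a single wait sized to the method.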