GrotjahnLab / surface_morphometrics

Morphometrics for Membrane Surfaces Segmented from Cryo-ET or other volumetric imaging.
GNU General Public License v3.0

TF1.mrc and TE1.mrc not valid? #9

Closed truatpasteurdotfr closed 2 years ago

truatpasteurdotfr commented 2 years ago
Singularity> md5sum TE1.mrc TF1.mrc
79ce72e3fbee1bd6fe0c155d470ea6bb  TE1.mrc
1c74274c633098b9165b9bc7363e225a  TF1.mrc
Singularity> mrcfile-validate TE1.mrc
/esmmc/morphometrics/miniconda3/envs/morphometrics/lib/python3.9/site-packages/mrcfile/mrcinterpreter.py:209: RuntimeWarning: Map ID string not found - not an MRC file, or file is corrupt
  warnings.warn(msg, RuntimeWarning)
/esmmc/morphometrics/miniconda3/envs/morphometrics/lib/python3.9/site-packages/mrcfile/mrcinterpreter.py:219: RuntimeWarning: Unrecognised machine stamp: 0x90 0x42 0x4d 0xc3
  warnings.warn(str(err), RuntimeWarning)
Map ID string is incorrect: found b'|\x94\xf2\xc4', should be b'MAP '
Invalid machine stamp: 0x90 0x42 0x4d 0xc3
File does not declare MRC format version 20140: nversion = 0
Error in data statistics: RMS deviation is 0.11326066717922652 but the value in the header is 195.9300079345703
Error in data statistics: mean is 0.005780559959153036 but the value in the header is 0.004335420206189156
Singularity> mrcfile-validate TF1.mrc 
/esmmc/morphometrics/miniconda3/envs/morphometrics/lib/python3.9/site-packages/mrcfile/mrcinterpreter.py:209: RuntimeWarning: Map ID string not found - not an MRC file, or file is corrupt
  warnings.warn(msg, RuntimeWarning)
/esmmc/morphometrics/miniconda3/envs/morphometrics/lib/python3.9/site-packages/mrcfile/mrcinterpreter.py:219: RuntimeWarning: Unrecognised machine stamp: 0x00 0x00 0x00 0x00
  warnings.warn(str(err), RuntimeWarning)
Map ID string is incorrect: found b'', should be b'MAP '
Invalid machine stamp: 0x00 0x00 0x00 0x00
File does not declare MRC format version 20140: nversion = 0
Error in data statistics: RMS deviation is 0.11780199578716964 but the value in the header is 0.0
Error in data statistics: mean is 0.007579638752052545 but the value in the header is 0.00568472919985652

Is that expected?

bbarad commented 2 years ago

The header that TFS Amira writes when you export a segmentation is... not great. Since it's just an instance segmentation, I've never bothered to write a script to fix the header, so I have to use mrcfile in permissive mode. It should work fine for processing, though. If you run into problems, please let me know!
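
For anyone who wants to see what the validator is objecting to, here is a minimal sketch that reads the three identification fields straight from the raw header using only the standard library. The byte offsets follow the MRC2014 header layout. The helper name `read_header_ids` is hypothetical, and the code assumes a little-endian header, so treat it as an illustration rather than a replacement for `mrcfile-validate`.

```python
import struct

# Byte offsets of three fields in the 1024-byte MRC2014 header.
NVERSION_OFFSET = 108  # word 28: format version, e.g. 20140
MAP_OFFSET = 208       # word 53: map ID, should be b'MAP '
MACHST_OFFSET = 212    # word 54: machine stamp (byte order marker)


def read_header_ids(path):
    """Return the raw identification fields of an MRC header (hypothetical helper)."""
    with open(path, "rb") as f:
        header = f.read(1024)
    if len(header) < 1024:
        raise ValueError("file is shorter than a 1024-byte MRC header")

    map_id = header[MAP_OFFSET:MAP_OFFSET + 4]
    machst = header[MACHST_OFFSET:MACHST_OFFSET + 4]
    # Assumption: the header is little-endian.
    (nversion,) = struct.unpack("<i", header[NVERSION_OFFSET:NVERSION_OFFSET + 4])

    # Stamps conventionally accepted for little- and big-endian files.
    little = machst[0] == 0x44 and machst[1] in (0x44, 0x41)
    big = machst[:2] == b"\x11\x11"
    return {
        "map_id": map_id,
        "map_ok": map_id == b"MAP ",
        "machst": " ".join(f"0x{b:02x}" for b in machst),
        "machst_ok": little or big,
        "nversion": nversion,
    }
```

Calling `read_header_ids("TE1.mrc")` on the example file should surface the same map ID and machine stamp bytes that the validator reported above.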

GenevieveBuckley commented 2 years ago

This confused me too. I ended up wasting a bit of time for the same reason, double checking the example data was actually good, before continuing.

@truatpasteurdotfr there are two options, according to this mrcfile webpage

Option 1: ignore the mrc file header problem

By adding the keyword argument permissive=True to the mrcfile open command, you can reduce the error to a warning (and ignore that if you like).

import mrcfile

with mrcfile.open(filename, permissive=True) as mrc:
    print(mrc.data)

Option 2: fix the mrc file header problem

Alternatively, you can modify the file to fix the header problem. Following the example from the mrcfile documentation here, we can do this:

import mrcfile

with mrcfile.open(filename, mode='r+', permissive=True) as mrc:
    mrc.header.map = mrcfile.constants.MAP_ID

# and afterwards it says we should be able to open the file as normal, eg:
with mrcfile.open(filename) as mrc:  # no permissive=True kwarg needed now
    print(mrc.data)

GenevieveBuckley commented 2 years ago

Although I have to say, when I tried fixing the mrcfile header like the docs suggest, I got an error on reading the file back in a second time: ValueError: Unrecognised machine stamp: 0x90 0x42 0x4d 0xc3

I tried this on both mac and linux, so I don't think it was just a fluke.

GenevieveBuckley commented 2 years ago

I also tried the suggestion here, which also did not work:

import mrcfile
with mrcfile.open('<name>', mode='r+', permissive=True) as mrc:
    mrc.update_header_from_data()

I might make an issue over at the mrcfile repository

GenevieveBuckley commented 2 years ago

Update: the solution is to fix both header problems (the map ID and the machine stamp) at once:

import mrcfile

filename = "TE1.mrc"

with mrcfile.open(filename, mode='r+', permissive=True) as mrc:
    print("Original mrc file header:", mrc.header)
    mrc.header.map = mrcfile.constants.MAP_ID
    mrc.update_header_from_data()
    print("Fixed mrc file header:", mrc.header)

This works, and after that you can load the mrc files without using the permissive=True keyword argument, eg:

with mrcfile.open(filename) as mrc:
    print(mrc.data.shape)

GenevieveBuckley commented 2 years ago

Here is a copy of the example files, which I've altered to fix the header problem: examples.tar.gz

bbarad commented 2 years ago

Thank you for tracking this down! I think it is probably best for the example data to be as correct as possible, since there aren't actually that many segmentations of this kind available online (the full set of segmentations is coming to EMPIAR soon as part of the final review response push).

With that said, I think these issues will crop up for almost every user who works with segmentations from Amira, so it is perhaps worth also adding this fix to the workflow. I am a little leery about editing the "base data" MRC files automatically, though. My question is: should the pipelined workflow fix headers on first load before making the mesh, or should this be left as a separate tool that people can use? Or should it just be left as a snippet in the documentation?

@grotjahn You may have thoughts about this from a design perspective as well!