em-MRCZ / python-mrcz

Python module for compressed MRCZ-file format
Other
13 stars 4 forks source link

Pixel size not understood by other programs? #13

Open rdrighetto opened 4 years ago

rdrighetto commented 4 years ago

I've noticed that since the introduction of slices in v0.5.0, the 'pixelsize' fields in the header are written in a way that is consistent within MRCZ, but not for other programs.

For example (in a Jupyter notebook):

mrc,hed =  mrcz.ioMRC.readMRC('emd_4738.map')
print('The original pixel size is: ', hed['pixelsize'])
mrcz.writeMRC(mrc,'emd_4738_dummy.mrc',pixelsize=2.0) # write out a different pixel size just because
mrc,hed = mrcz.ioMRC.readMRC('emd_4738_dummy.mrc')
print('The new pixel size is: ', hed['pixelsize']) # what we wrote is indeed here
# but is it?
!header emd_4738_dummy.mrc

Output:

The original pixel size is:  [1.01500003 1.01500003 1.01500003]
The new pixel size is:  [2. 2. 2.]

 RO image file on unit   1 : emd_4738_dummy.mrc     Size=     221185 K

 Number of columns, rows, sections .....     384     384     384
 Map mode ..............................    2   (32-bit real)              
 Start cols, rows, sects, grid x,y,z ...    0     0     0       1      1      1
 Pixel spacing (Angstroms)..............   1.000      1.000      1.000    
 Cell angles ...........................   90.000   90.000   90.000
 Fast, medium, slow axes ...............    X    Y    Z
 Origin on x,y,z .......................    0.000       0.000       0.000    
 Minimum density ....................... -0.10898    
 Maximum density .......................  0.18136    
 Mean density ..........................  0.36188E-01
 tilt angles (original,current) ........   0.0   0.0   0.0   0.0   0.0   0.0
 Space group,# extra bytes,idtype,lens .        0        0        0        0

     1 Titles :
MRCZ0.5.2                                                                                                        

I am already looking for a solution but posting here just in case @robbmcleod already has a fix and others become aware.

robbmcleod commented 4 years ago

The program header is in what toolset? IMOD?

Probably we should compare versus mrcfile. In fact, we could add tests that deliberately do that, and add mrcfile to tests-require? https://github.com/ccpem/mrcfile

rdrighetto commented 4 years ago

Sorry, forgot to add that header is from IMOD.

Opening the file saved by mrcz with mrcfile gives the following:

with mrcfile.open('emd_4738_dummy.mrc') as mrc:
    print(mrc.voxel_size)

Output: (inf, inf, 2.)

Interestingly, I found the following in the documentation:

The sizes are not stored directly in the MRC header, but are calculated when required from the header’s cell and grid size fields. The voxel size can therefore be changed by altering the cell size:

rdrighetto commented 4 years ago

I see that changing the voxel size with mrcfile also has no effect, at least not from IMOD's header point of view. However, both mrcfile and mrcz work the same way in the sense that both define the cellsize as pixelsize * dimensions, which is correctly grasped by e2iminfo.py -H:

(in this case, I used mrcz to change the pixel size to 3.0)

emd_4738_dummy.mrc   1 images in MRC format     384 x 384 x 384
0.  
    HostEndian: little
    ImageEndian: little
    MRC.alpha: 90.0
    MRC.beta: 90.0
    MRC.gamma: 90.0
    MRC.ispg: 0
    MRC.label0: MRCZ0.5.2
    MRC.machinestamp: 16708
    MRC.mapc: 1
    MRC.mapr: 2
    MRC.maps: 3
    MRC.maximum: 0.181357830763
    MRC.mean: 0.0361878871918
    MRC.minimum: -0.108982056379
    MRC.mx: 0
    MRC.my: 0
    MRC.mz: 384
    MRC.nlabels: 1
    MRC.nsymbt: 0
    MRC.nx: 384
    MRC.nxstart: 0
    MRC.ny: 384
    MRC.nystart: 0
    MRC.nz: 384
    MRC.nzstart: 0
    MRC.rms: 0.0
    MRC.xlen: 1152.0
    MRC.ylen: 1152.0
    MRC.zlen: 1152.0
    apix_x: 1.0
    apix_y: 1.0
    apix_z: 3.0
    changecount: 0
    datatype: 7
    is_complex: 0
    is_complex_ri: 1
    is_complex_x: 0
    nx: 384
    ny: 384
    nz: 384
    origin_x: 0.0
    origin_y: 0.0
    origin_z: 0.0
    sigma: 0.0
    source_n: 0
    source_path: emd_4738_dummy.mrc
======================
representing 0 particles

As usual, the lack of consistency between MRC specifications. Not sure if this is an issue after all. But would be nice if we could have it in a way that both IMOD and EMAN2 readily understand the correct pixel size.

robbmcleod commented 4 years ago

Yeah if I look at the IMOD specification, Mastronarde claims cell size is the whole thing, and pixel size is cell size divided by the number of elements.

40 050  4   float   xlen;       Cell size; pixel spacing = xlen/mx, ylen/my, zlen/mz
44 054  4   float   ylen;
48 060  4   float   zlen;

Since mrcfile is the official implementation now, I'd prefer to match them. The MRC2014 spec appears to contradict the IMOD specification (http://www.ccpem.ac.uk/mrc_format/mrc2014.php):

Note 3
In crystallographic usage, MZ represents the number of intervals, or sampling grid, along Z in a 
crystallographic unit cell. This need not be the same as NZ, if the map doesn't cover exactly a 
single unit cell. For microscopy, where there is no unit cell, MZ represents the number of sections 
in a single volume. For a volume stack, NZ/MZ will be the number of volumes in the stack. For 
images, MZ = 1.

So I'll do a release now for your previous append fix and later if I find time, write some tests to verify that mrcfile and mrcz produce equivalent results.

robbmcleod commented 4 years ago

Do you just want to load the pixelsize from serialEM properly? And not save in a way that SerialEM would have the correct pixelsize? Because we could probably accomplish that by looking for the 'SERI' extended header and assuming that the pixelsize is wrong in that case? Another useful thing might be to load the SerialEM meta-data?

rdrighetto commented 4 years ago

Thanks for releasing 0.5.3! Loading the pixelsize from SerialEM works correctly, writing is the problem. My main concern is actually UCSF Chimera: maps written by current MRCZ are always read with a voxelsize of 1.0 no matter what was specified in the pixelsize argument of mrcz.ioMRC.writeMRC(). If the map is written by mrcfile then Chimera understands it properly.

rdrighetto commented 4 years ago

I think I have found the cause of the problem, I'm just not sure how to fix it yet: we are not writing anything to the MX and MY positions of the header (indices 28 and 32 in Python), so the programs that look in there just try to figure it out on their own.

robbmcleod commented 4 years ago

Like [1, 1, 1]? Save a volume with mrcfile and then use a hex editor to check what it has there. I use HxD on Windows, but I do recall decent hex editors being available for Ubuntu in their store.

rdrighetto commented 4 years ago

Hi Robb,

I used bless to verify that mrcfile writes MX=NX,MY=NY,MZ=NZ by default, unless a different sampling for each axis is specified (usually not the case except MZ=1 for stacks of 2D images). Figure 1 of the MRC2014 format specification paper clarifies it:

Fig. 1. An example of the volume stack specified by NX/NY/NZ and MX/MY/MZ. In this simple example, there are 3 volumes, each consisting of 2 sections.

Meanwhile, mrcz has been writing MX=MY=0 and MZ=slices since v0.5.0, which are wrong. More precisely, things got out of hand at ff34595e359436840ad72c93d89da4ff3c2a9111.

I wrote a bugfix that restores this part of the header in a manner consistent with mrcfile and allowing for indexing along Z, including a test. I'm doing a pull request now.

EDIT: there are still issues with slicing, writing is fine, the problem is reading, understanding when it should return a list and when it should return a np.array... feel free to look at the closed pull request.