ercius / openNCEM

A collection of packages and tools for electron microscopy data analysis supported by the National Center for Electron Microscopy facility of the Molecular Foundry
GNU General Public License v3.0

need to handle bad TagOffsetArray #4

Closed: ercius closed this issue 7 years ago

ercius commented 7 years ago

In ser.writeEMD() the time tags are read from the SER file and written into the EMD file. However, the time tags are not very reliable. For example, I have a perfectly good SER file with five 1024x1024 images in it. The DataTypeID and TagTypeID indicate that this is a 'simple series' in writeEMD() (see line 678 in io/ser.py).

The SER file size is exactly 10,486,918 bytes. The TagOffsetArray of this file is:

```
>>> fser.head['TagOffsetArray']
[10486918, 0, 0, 0, 0]
```

So in writeEMD(), getTag(0) fails because there is nothing to read at byte offset 10486918. It's the end of the file!
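To illustrate, a defensive check like the following would catch this case up front (a minimal sketch only; `tag_offset_is_readable` and `min_tag_bytes` are made-up names, not part of the ser.py API):

```python
# Illustrative sketch, not actual ser.py code: reject tag offsets that
# point at or past the end of the file, like the 10486918 above.
import os

def tag_offset_is_readable(filename, offset, min_tag_bytes=8):
    """Return True only if a tag record could plausibly start at offset."""
    size = os.path.getsize(filename)  # 10486918 for the file above
    # offset == size means the "tag" starts exactly at EOF: nothing to read
    return 0 < offset <= size - min_tag_bytes
```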

I suggest removing the time tags for a simple series; they are not really necessary. Alternatively, we can use a try block to skip reading/writing them in case something weird happens, as in this case.

fniekiel commented 7 years ago

Right now I opt for putting it in a try block. The time tag for a simple series has been pretty much my main focus in the past: in situ heating experiments with time-dependent processes.

According to the SER spec the tags seem mandatory and need to contain at least the time information. Could you provide me with that test file? I would like to make sure that I did not screw up the version-dependent length of things to read in. Apart from that, I guess it's best to put it in a try block and assign NaN if the tag information is missing.

fniekiel commented 7 years ago

Inserted a try block in fileSER.getTag() to return a tag full of NaNs if something goes wrong (branch iss4).
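In rough outline the approach looks like this (a sketch only, not the exact iss4 diff; the tag field names and the low-level `_read_tag_at` helper are assumptions for illustration):

```python
# Sketch of the getTag() fallback described above.
import numpy as np

def getTag(self, index):
    # NaN defaults; the real set of tag fields depends on the TagTypeID
    tag = {'time': np.nan, 'positionX': np.nan, 'positionY': np.nan}
    try:
        offset = self.head['TagOffsetArray'][index]
        tag = self._read_tag_at(offset)  # hypothetical low-level reader
    except Exception:
        pass  # bad offset or truncated data: keep the NaN-filled tag
    return tag
```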

Could you test this fix or send me the file?

ercius commented 7 years ago

Sorry for the late reply on this. I answered your other questions and forgot about this one. I think the try block is a good idea. It's nice to have the time tags available if they exist.

I'll test this when I get a chance and send you the file I had trouble with by email for future testing. We probably need to add a few test files of different types (1 image, multi-image, spectra, etc.).

ercius commented 7 years ago

I uploaded a test data set that triggers the try block you added. You will receive an email from Google Drive. Let me know if there is a better way to share this data.

I still get an error when trying to read the non-existent tags (im1.getTag(0)) or run writeEMD():

```
TypeError: list indices must be integers or slices, not str
```

The problem is that the tag should be created as a dict (tag = {}), not a list (tag = []). I pushed a fix and it now tests properly on my series file. We can merge this once you try it, @fniekiel.
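For reference, the failure mode and the fix in miniature (a standalone snippet, not the actual ser.py code):

```python
# A list rejects string keys with exactly the TypeError quoted above;
# a dict accepts them.
tag = []
# tag['time'] = 0.0  # TypeError: list indices must be integers or slices, not str

tag = {}             # the fix: create the tag as a dict
tag['time'] = float('nan')
```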

ercius commented 7 years ago

I also had to add try/except blocks to all other instances of getTag in the file; otherwise, writeEMD() did not work properly. I made the except set time[x,y] = 0 in each case. I'm not sure if this is how it should be done. Maybe we should make a "fake" time tag of increasing integers instead.
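Roughly what the caller-side fallback looks like (a sketch with an invented helper name, `collect_times`; the array shape and the 'time' field are assumptions, not the writeEMD() internals):

```python
# Sketch of zero-filling the time array when getTag() fails, as described above.
import numpy as np

def collect_times(fser, n):
    """Gather per-frame time tags, zero-filling when getTag fails."""
    time = np.zeros(n)
    for i in range(n):
        try:
            time[i] = fser.getTag(i)['time']
        except Exception:
            time[i] = 0  # or time[i] = i for a "fake" increasing time tag
    return time
```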

fniekiel commented 7 years ago

Thanks for the help. I have merged the fixes from #3 and #4 to keep working on a single branch. For me it is working with the test file you provided.

I decided to remove the try-except blocks around the calls to getTag, as I prefer to handle everything inside getTag. Now getTag returns a tag in either case, but it's empty if the TagOffsetArray is bad.

In my initial fix (apart from the tag{} problem) I did not consider that some of the entries in TagOffsetArray are zero, so they do not raise an exception during file reading. getTag now checks the TagTypeID and compares it to the one in the file header.
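Conceptually the check works like this (an illustrative sketch, assuming a 16-bit little-endian TagTypeID at the start of each tag record; `tag_looks_valid` is a made-up name):

```python
# Sketch of the validation described above: a zero offset would read
# "successfully" but yield garbage, so the TagTypeID at the offset is
# compared against the one declared in the file header.
import struct

def tag_looks_valid(fh, offset, expected_tag_type_id):
    if offset == 0:
        return False
    fh.seek(offset)
    raw = fh.read(2)
    if len(raw) < 2:
        return False  # offset at or past the end of the file
    (tag_type_id,) = struct.unpack('<H', raw)
    return tag_type_id == expected_tag_type_id
```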

Could you try once more with your test? If it runs ok, I would like to merge iss3+4 to development and release them as version 1.1.2.

ercius commented 7 years ago

I just tested iss3+4. It works for me on the 01_Si110_5images_1.ser file. I think we can merge this.

fniekiel commented 7 years ago

Merged to development.