Additional .h5 extension in output filenames

DANAJK commented 3 years ago

Image writing adds a '.h5' extension. However the python fully_sampled.py example uses output.h5 as the default name, resulting in a file called output.h5.h5. [This is a corrected issue, a previous version referred incorrectly to grappa_basic.py.]

evgueni-ovtchinnikov commented 3 years ago

This was apparently corrected some time ago.

DANAJK commented 3 years ago

I was using the exercises that were installed when I set up SIRF using Docker sometime in the past ~2 weeks. My command prompt implies I am using SIRF v2.2.0. Locally, my installed fully_sampled.py uses a default output file of output.h5, but I see that on the Git Hub website, on the master branch, also labelled v2.2.0, the default filename is output. Not sure how it is there seem to be different versions.

KrisThielemans commented 3 years ago

This took me a while as https://github.com/SyneRBI/SIRF/blob/master/examples/Python/MR/interactive/fully_sampled.py doesn't contain this code.. I finally found it in https://github.com/SyneRBI/SIRF/blob/2e1d74aa4d40f5edf6a441d54e60a26a55f79b71/examples/Python/MR/fully_sampled_recon.py#L14

This was corrected in https://github.com/SyneRBI/SIRF/commit/c4e543b6fbb661fa30d4cbaaae9e21ad1651bc82 (always best to provide links in Q&A 😉 )

on the master branch, also labelled v2.2.0

where do you see it's labelled that way? master is well ahead of v2.2.0.

KrisThielemans commented 3 years ago

However, should our writing routine check if this is going to happen? There's a few options:

don't provide a default extension (easy/safe/most flexible)
if there is a .h5 extension already, don't add another one (easy enough and safe)
if there is any extension already, don't add another one (this strategy followed in STIR Interfile but it causes trouble as it means you cannot use filenames with dots, e.g. mytest1.0.h5)
just keep the status-quo

DANAJK commented 3 years ago

When I first encountered this issue, I spent some time trying to figure out where the .h5 extension was added - but without success.

My preference would be to not change the filename 'behind the scenes'. In other words, the filename is whatever the user provides, or the stated default, and these would include the .h5 extension.

DANAJK commented 3 years ago

In my Ubuntu VM, the command prompt is: (base) sirf:/opt/SIRF-SuperBuild/sources/SIRF/examples/Python/MR ((v2.2.0))$

(There is a separate issue that this command prompt doesn't work in a Jupyter terminal resulting in a command prompt that is an error message ....)

If you go to the page below, on the right it says Releases SIRF v2.2.0 (latest): https://github.com/SyneRBI/SIRF

KrisThielemans commented 3 years ago

ok. yes, v2.2.0 is the latest release, and I guess you're using the release docker image. (somewhat confusingly, the latest docker image corresponds to SIRF master, at the time of the last SIRF-SuperBuild update).

If you don't mind creating a SIRF-SuperBuild issue on the problem with the prompt. I have no idea where that comes from.

KrisThielemans commented 3 years ago

My preference would be to not change the filename 'behind the scenes'. In other words, the filename is whatever the user provides, or the stated default, and these would include the .h5 extension.

I agree with that. @evgueni-ovtchinnikov @ckolbPTB what do you think?

ckolbPTB commented 3 years ago

yes, I agree. I would not change the filename either.

KrisThielemans commented 3 years ago

reopening therefore

KrisThielemans commented 3 years ago

@evgueni-ovtchinnikov is there a reason for appending .h5? If not, please assign this to v3.0 milestone, create a PR for it removing that feature. It will need to have a link in CHANGES.md warning about backwards incompatibility (potential changes in the UserGuide as well? maybe not)

DANAJK commented 3 years ago

Also, the different examples do not all have the same default behaviour. I think grappa_basic.py produces no output file if the user does not explicitly provide an output filename.

evgueni-ovtchinnikov commented 3 years ago

how about PET side?

KrisThielemans commented 3 years ago

how about PET side?

good point. Unfortunately, this is currently handled by STIR, causing the above problem. It's behaviour is to append extensions of none given ('.hv for header, '.v' for binary data, 'ahv` for an (mostly obsolete) older format of the header)

KrisThielemans commented 3 years ago

The complication for the default output format (Interfile) of STIR is that it has separate headers and binary data. I checked the STIR logic a bit more. here's the code. It results in

write("noextension") -> noextension.hv, noextension.ahv, noextension.v
- write("pre.hv") -> pre.hv, pre.ahv, pre.v
- write("pre.ext") -> ext.hv, pre.ahv, pre.ext

Especially the last case is surprising to many. I can change STIR behaviour to something else for next major release, but that will take a while to get out.

I guess we could make the SIRF PET writer (when using the default format) check if there's no .hv extension, and if not, throw an error.

evgueni-ovtchinnikov commented 3 years ago

@KrisThielemans: I would rather go for your third option - safe and backward compatible (and frankly I did not get what was wrong with mytest1.0.h5).

KrisThielemans commented 3 years ago

The 3rd option "if there is any extension already, don't add another one2 sounds great but is impossible to implement. How do you know that the user wants mytest1.0.h5 or if she meant to use .0 as an extension. (we know, but a computer? whatever logic we put it will always trip up someone). So, a trivial implementation of it would just write mytest1.0, not mytest1.0.h5 as the user probably expects)

evgueni-ovtchinnikov commented 3 years ago

SIRF would have to throw "unknown extension 0" then.

And for mytest1.hv as the destination for MR image there would be error message from STIR, I presume.

KrisThielemans commented 3 years ago

SIRF would have to throw "unknown extension 0" then.

ah, you suggest to do the same for PET and MR, but with different extensions (.hv for PET Interfile, .h5 for MR HDF5), i.e. add the extension if there's no dot in the filename (after the directory). That would be fine by me. (not so trivial to implement sadly, unless usingboost::filesystem` which in all honesty I'd rather avoid).

And for mytest1.hv as the destination for MR image there would be error message from STIR, I presume.

that confuses me. If we do MRImageData.write(bla), STIR will never be called.

evgueni-ovtchinnikov commented 3 years ago

i.e. add the extension if there's no dot in the filename (after the directory)

Or maybe better just to add .h5 if the text after the last dot is not a known extension (currently .h5, .dcm and I presume .nii).

(not so trivial to implement sadly, unless using boost::filesystem` which in all honesty I'd rather avoid).

Sorry, I do not get this - we just need to analyse the text string argument of .write, do we not?

that confuses me. If we do MRImageData.write(bla), STIR will never be called.

True, I mislooked the fact that .hv would be just another unknown extension for MRImageData, sorry.

KrisThielemans commented 3 years ago

i.e. add the extension if there's no dot in the filename (after the directory)

Or maybe better just to add .h5 if the text after the last dot is not a known extension (currently .h5, .dcm and I presume .nii).

as in the 3rd bullet, the MRImageData writer only knows about writing ISMRMRD images. But I think this just gets too confusion for people. (There is no reason to prefer .h5` really). Best to keep it simple: either never add anything (preferred by @DANAJK and @ckolbPTB), or ONLY add it if no extension is given. The latter has the advantage that it's backwards compatible. However, it has the (strong) disadvantage that the following (symbolic code) can fail

mr_image.write(filename)
mr_image.read(filename)

(not so trivial to implement sadly, unless using boost::filesystem` which in all honesty I'd rather avoid).

Sorry, I do not get this - we just need to analyse the text string argument of .write, do we not?

yes, but you need to know about OS conventions (directory separators etc). soon gets messy.

All in all, I'm leaning again towards "never add anything" being the best (certainly for a single-file format as ISMRMRD).

evgueni-ovtchinnikov commented 3 years ago

So, I understand you are eventually for the 3rd option (for PET too)?

Minor remark:

MRImageData writer only knows about writing ISMRMRD images.

No, it will write dicom if the extension is .dcm:

    def write(self, file, ext='h5'):
        """For extension, use 'dcm' to write as DICOM. Else, image will be written as h5."""
        try_calling(pygadgetron.cGT_writeImages(self.handle, file, ext))

(the third argument will have to go if we require the extension to be given in file).

evgueni-ovtchinnikov commented 3 years ago

How about this simple algorithm to be used by all read and write methods: if the filename does not end with an extension that SIRF can handle (MR: .h5 or .dcm, PET: .hv / .hs), then add the default one (MR: .h5, PET: .hv / .hs)? Seems to solve all issues raised so far.

ckolbPTB commented 3 years ago

would this mean that MR write(data.dcm) would lead to data.dcm.h5 because of ext='h5' or would the 'dcm' extension overwrite ext='h5'?

I am still very much in favour of not doing anything to the filename. File endings indicate what type of file it is but they are not essential. In addition, they are not unique. We might use .h5 but others might use .hdf5.

KrisThielemans commented 3 years ago

File endings indicate what type of file it is but they are not essential. In addition, they are not unique. We might use .h5 but others might use .hdf5.

This is the reason that in STIR we use a complicated way of specifying file-formats via keywords etc. (but as it's complicated, few people like it!)

If we have one writer that can write different file formats (here ISMRMRD and DICOM), then we either have to rely on the filename/extension (this is what ITK does for instance), or pass another parameter to say which one we want. We could say

def write(self, filename, format='h5')

(although format=ISMRMRD` would be more accurate).

For sirf::STIRImageData, we have https://github.com/SyneRBI/SIRF/blob/b789a80994e75c5011b5e76cab800be0ad7d68ba/src/xSTIR/cSTIR/include/sirf/STIR/stir_data_containers.h#L817-L838 but we could extend that to allow a few known strings as second argument (e.g. format=nii), and then handle the complication in the SIRF C++ code.

I prefer this over filename/extension assumptions. It allows people to use their own conventions and doesn't seem to be too onerous (although of course more work for the user than simply saying myfile.dcm, and probably confusing when they do write('myfile.dcm') where it would write an ISMRMRD file.

Let's decide on this tomorrow and just go ahead.

(Side note: one of the STIR writers defers to ITK, and we don't really know which extensions/fileformats that a particular ITK version supports).

DANAJK commented 3 years ago

How about having a separate function to handle the changing extensions etc. The user could choose to use it or not and it would be clear there was something happening and where to look. The write method would also have the output type passed in so that it was explicit.

fileBase = 'new_results'
outType = 'ISMRMRD'

outFilename = adjustName(fileBase, outType) 

write(outFilename, outType)

SyneRBI / SIRF

Additional .h5 extension in output filenames #845