JuliaIO / MAT.jl

Julia module for reading MATLAB files
MIT License

Missing HDF5 features for MAT files #19

Closed: timholy closed this issue 10 years ago

timholy commented 10 years ago

Possibly since 27a022dd8789d9ba19a74ba4aa7a03e71be12133, some HDF5 features no longer work on MAT files:

julia> using HDF5, MAT

julia> fid = matopen("array.mat")
MatlabHDF5File(HDF5 data file: array.mat,true,false,0)

julia> exists(fid, "a2x2")
ERROR: no method exists(MatlabHDF5File,ASCIIString)

julia> close(fid)

julia> fid = h5open("array.mat")
HDF5 data file: array.mat

julia> exists(fid, "a2x2")
true

julia> exists(fid, "a2x2blah")
false

julia> close(fid)

On another machine I reverted both HDF5 and MAT to a prior commit, and such features worked again. Not sure whether this is trivial or deep, so I thought I'd bring it up for discussion.

simonster commented 10 years ago

I think this is just a matter of adding an exists method for MatlabHDF5File that calls exists on the underlying HDF5File, but maybe it makes more sense to refactor these changes to use invoke.
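A minimal sketch of the explicit-forwarding fix, under stated assumptions: `UnderlyingFile` (standing in for HDF5File) and `WrappedMatFile` (standing in for MatlabHDF5File) are hypothetical types backed by a `Dict` so the example runs without HDF5.jl. The real MAT.jl types, fields, and method names may differ.

```julia
# Hypothetical stand-in for HDF5File: a Dict of named datasets.
struct UnderlyingFile
    datasets::Dict{String,Any}
end

# The query as defined on the underlying file type.
exists(f::UnderlyingFile, name::AbstractString) = haskey(f.datasets, name)

# Hypothetical stand-in for MatlabHDF5File: it wraps the underlying
# file rather than subtyping it, so it inherits no methods.
struct WrappedMatFile
    plain::UnderlyingFile
end

# Explicit forwarding method. Without it, calling exists on a
# WrappedMatFile throws a MethodError, analogous to the error above.
exists(f::WrappedMatFile, name::AbstractString) = exists(f.plain, name)

fid = WrappedMatFile(UnderlyingFile(Dict{String,Any}("a2x2" => [1 2; 3 4])))
println(exists(fid, "a2x2"))      # true
println(exists(fid, "a2x2blah"))  # false
```

Each function the wrapper should support needs its own one-line forwarding method like this; that is the cost weighed against `invoke` later in the thread.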

timholy commented 10 years ago

I'm not on the machine where I saw the main problems (and that's a multiuser production server, so I'm a bit loath to run tests), but I seem to remember that more than just exists changed. I think var1, var2 = read(fid, "var1", "var2") no longer works (I don't know whether that's intentional). I know names does work on MAT files. It's possible the breakage is limited to a few functions, but perhaps not. I guess we should try iteration, etc., on a MAT file?

timholy commented 10 years ago

In other words, if it's a question of just applying a few fixes and then everything works, that's fine with me (I don't want to undo your good work earlier). But if most functionality isn't available then perhaps we'd be better off reworking that change.

simonster commented 10 years ago

After my changes, MatlabHDF5File and JLDFile no longer inherit from HDF5File, so for every function we want to support, we need to explicitly define a method that calls the corresponding method on the underlying HDF5File. There's an argument to be made that this is cleaner. HDF5File exposes a lot of functionality that Matlab doesn't actually support and that probably shouldn't be easily accessible. Additionally, if MatlabHDF5File <: HDF5File, then ensuring a consistent interface for MatlabHDF5File and Matlabv5File is basically impossible (indeed, neither exists nor the vararg read method is presently defined for Matlabv5File).

OTOH, this design decision means we have to choose what parts of the HDF5 API to expose in MatlabHDF5File and write code to expose them. It also means that functions like the vararg form of read that just delegate to other methods need to be copy/pasted between files (although we might be able to put these methods on an abstract type from which HDF5File, JLDFile, MatlabHDF5File, and maybe even Matlabv5File would inherit). If we used invoke, we wouldn't have to worry about any of these things.
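The shared-abstract-type idea might look roughly like the sketch below. `AbstractDataFile`, `MockHDF5Mat`, `MockV5Mat`, and `readvar` are hypothetical names, not part of MAT.jl or HDF5.jl. Each concrete type defines only a single-name read, and the vararg form is written once on the abstract type.

```julia
# Hypothetical common supertype; not an actual MAT.jl/HDF5.jl type.
abstract type AbstractDataFile end

# Two hypothetical concrete file types, each backed by a Dict here.
struct MockHDF5Mat <: AbstractDataFile
    vars::Dict{String,Any}
end
struct MockV5Mat <: AbstractDataFile
    vars::Dict{String,Any}
end

# Each concrete type supplies only the single-name read.
readvar(f::MockHDF5Mat, name::AbstractString) = f.vars[name]
readvar(f::MockV5Mat, name::AbstractString) = f.vars[name]

# The vararg form is written once on the abstract type and delegates to
# the single-name method of whatever concrete type it receives.
readvar(f::AbstractDataFile, names::AbstractString...) = map(n -> readvar(f, n), names)

fid = MockV5Mat(Dict{String,Any}("var1" => 1, "var2" => [2, 3]))
var1, var2 = readvar(fid, "var1", "var2")
```

Because each single-name method is more specific than the vararg method, a one-name call dispatches to the concrete type's method, while a multi-name call returns a tuple that destructures like `var1, var2 = read(fid, "var1", "var2")`. One caveat of this sketch: a subtype that forgets to define its single-name method would recurse into the vararg method indefinitely.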

I guess I lean toward staying with what we have now and figuring out which functions we want on MatlabHDF5File that aren't currently there. What do you think?

timholy commented 10 years ago

That seems eminently reasonable.