Quasars / orange-spectroscopy


Load in HDF5 file (Pre-PR demonstration) #183

Open CCampJr opened 6 years ago

CCampJr commented 6 years ago

@markotoplak @borondics

This is a pre-PR, just to gauge the direction I'm taking.

On my fork's h5loader branch (https://github.com/CCampJr/orange-spectroscopy/tree/h5loader) I have added a general HDF5 loader that will load hyperspectral images. Specifically, see the class H5Reader (not to be confused with the pre-existing HDF5Reader_HERMES) in data.py (https://github.com/CCampJr/orange-spectroscopy/blob/h5loader/orangecontrib/spectroscopy/data.py).

Quandaries

Thanks!

markotoplak commented 6 years ago

Thank you for this prototype. I appreciate your effort to get acquainted with the current implementation and using existing functions, and even adding an example to this prototype code. Wonderful!

I do not really know anything about various HDF5 formats that are being used so I will only answer about technicalities.

While some file readers (such as this one) and writers certainly need user settings, there is, unfortunately, no good support for user settings in Orange's current file-reading implementation. In the prototype, a settings window is shown within the file loader. This is indeed the best option for a prototype, but I see two potential future problems:

  • setting saving problems (a saved workflow should restore a properly opened file) and,
  • a baked-in interface does not work when scripting from a terminal without a GUI.

For the further implementation, I would keep the graphical interface separate from the reading functionality. The H5Reader class will need some way of setting all the properties from outside. Perhaps it will also need a way to "sniff" the file contents and pass sensible defaults to the external interface.
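A minimal sketch of that separation, assuming h5py for file access: the reading settings live in a plain dataclass that a widget could store as a saved setting and a script could fill in directly, while a separate "sniff" step suggests defaults. `H5ReadSettings`, `guess_settings` and `list_dataset_shapes` are hypothetical names, not part of the H5Reader prototype, and the "highest-dimensional dataset is the cube" rule is only an assumed heuristic.

```python
from dataclasses import asdict, dataclass
from math import prod
from typing import Dict, Optional, Tuple


@dataclass
class H5ReadSettings:
    """Hypothetical reader settings, kept free of any GUI code."""
    dataset_path: Optional[str] = None  # HDF5 path of the data cube
    spectral_axis: int = -1             # array axis that holds the spectra


def guess_settings(shapes: Dict[str, Tuple[int, ...]]) -> H5ReadSettings:
    """Suggest defaults from a {dataset path: shape} listing.

    Assumed heuristic: the dataset with the most dimensions (ties broken
    by number of elements) is taken to be the hyperspectral cube.
    """
    if not shapes:
        return H5ReadSettings()
    best = max(shapes, key=lambda p: (len(shapes[p]), prod(shapes[p])))
    return H5ReadSettings(dataset_path=best)


def list_dataset_shapes(filename: str) -> Dict[str, Tuple[int, ...]]:
    """Collect the shape of every dataset in an HDF5 file."""
    import h5py  # local import: the settings logic above does not need h5py

    shapes = {}

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            shapes[name] = obj.shape

    with h5py.File(filename, "r") as f:
        f.visititems(visit)
    return shapes


# Usage without a GUI: sniff, then keep the settings as a plain dict
# (e.g. a saved workflow setting) and rebuild them from it later.
settings = guess_settings({"raw/cube": (100, 120, 1600), "raw/wn": (1600,)})
saved = asdict(settings)
restored = H5ReadSettings(**saved)
```

A widget would call `list_dataset_shapes` on the chosen file, show the suggested settings for editing, and pass the final settings to the reader; a script would do the same without any window.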

In the future, each file reader/writer could expose settings in a common way (setting properties, building a settings window), but finalizing that interface could take a while. Therefore, for now, I propose an HDF5-specific widget.

To implement such a widget, I would try subclassing OWFile, as we did for OWMultifile. Then I would try replacing the bottom panel with the reading properties and reimplementing some methods so that the settings are also passed to an H5Reader object before the file is read.

But this is just how I would approach it. Do you perhaps see a better solution?

I do not quite understand what you mean by axes definitions. Could you give me an example? Thank you!

borondics commented 6 years ago

Hi Guys,

Thanks for keeping the conversation going. For testing the functionality, we can ask for HDF5 files from different beamlines. They can be big, though, so it is probably not a good idea to include them in the actual test suite.

Feri


CCampJr commented 6 years ago

Hi guys,

Thanks for checking out the code. I have code for a fairly easy-to-configure metadata reader and interpreter that could easily be adapted for this purpose. It reads a config file of the following form (significantly shortened, just for demonstration purposes):

rosetta['XPixelSize'] = ['RasterScanParams.FastAxisStepSize', 'Raster.Fast.StepSize']
rosetta['XLabel'] = ['RasterScanParams.FastAxis','Raster.Fast.Axis','!','X']
rosetta['XUnits'] = ['RasterScanParams.FastAxisUnits','!','$\\mu$m']
rosetta['ColorCenterWL'] = ['Spectro.CenterWavelength', 'Spectro.CurrentWavelength', 'Calib.ctr_wl', '!', 729.994]

The entries are listed in order of precedence, and a '!' indicates that the entry after it is the default value.
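The precedence rule can be sketched as a small resolver over a flat metadata dictionary. The function names `resolve` and `interpret` are hypothetical, and the sketch assumes the metadata has already been read into a `{key: value}` dict (for example, from the dataset attributes).

```python
from typing import Any, Dict, List, Mapping

DEFAULT_MARKER = "!"


def resolve(candidates: List[Any], metadata: Mapping[str, Any]) -> Any:
    """Return the value of the first candidate key present in metadata.

    Keys are tried in order of precedence; when the '!' marker is reached,
    the entry right after it is returned as the default. Returns None if
    nothing matches and no default is given.
    """
    for i, key in enumerate(candidates):
        if key == DEFAULT_MARKER:
            return candidates[i + 1] if i + 1 < len(candidates) else None
        if key in metadata:
            return metadata[key]
    return None


def interpret(rosetta: Mapping[str, List[Any]],
              metadata: Mapping[str, Any]) -> Dict[str, Any]:
    """Translate file-specific metadata into the common names of rosetta."""
    return {name: resolve(cands, metadata) for name, cands in rosetta.items()}


rosetta = {
    "XPixelSize": ["RasterScanParams.FastAxisStepSize", "Raster.Fast.StepSize"],
    "XLabel": ["RasterScanParams.FastAxis", "Raster.Fast.Axis", "!", "X"],
    "XUnits": ["RasterScanParams.FastAxisUnits", "!", "$\\mu$m"],
    "ColorCenterWL": ["Spectro.CenterWavelength", "Spectro.CurrentWavelength",
                      "Calib.ctr_wl", "!", 729.994],
}
metadata = {"Raster.Fast.Axis": "X", "Raster.Fast.StepSize": 0.662207}
values = interpret(rosetta, metadata)
```

Here the pixel size comes from the second-choice key, the label from `Raster.Fast.Axis`, and the units and center wavelength fall back to their defaults.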

Currently, it's defined as a Python dictionary, but I could use an alternative format, such as YAML. Any thoughts?

Of course, I know that we'd like some dynamic configuration, but the first step is a configuration backend that works.

Thanks!

markotoplak commented 6 years ago

@CCampJr, settings in this form seem great.

For a Python program, I prefer Python dictionaries to YAML, as there is one less parsing step. If you'd like to formalize it (for me dictionaries are fine), you could perhaps use a Python class or classes.
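If the dictionaries were formalized as classes, one option is a small frozen dataclass per rule. This is only a sketch: `MetadataRule` and its methods are hypothetical, and unlike a plain lookup it raises an error when a value is missing and no default exists.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence, Tuple

_NO_DEFAULT = object()  # sentinel, so that None can be a legitimate default


@dataclass(frozen=True)
class MetadataRule:
    keys: Tuple[str, ...]       # metadata keys, highest precedence first
    default: Any = _NO_DEFAULT  # returned when none of the keys is present

    @classmethod
    def from_list(cls, entry: Sequence[Any]) -> "MetadataRule":
        """Build a rule from the ['key1', 'key2', '!', default] list form."""
        entry = list(entry)
        if "!" not in entry:
            return cls(tuple(entry))
        i = entry.index("!")
        if i + 1 >= len(entry):
            raise ValueError("'!' must be followed by a default value")
        return cls(tuple(entry[:i]), entry[i + 1])

    def resolve(self, metadata: Mapping[str, Any]) -> Any:
        """Value of the first key found, else the default, else KeyError."""
        for key in self.keys:
            if key in metadata:
                return metadata[key]
        if self.default is _NO_DEFAULT:
            raise KeyError(f"none of {self.keys} found and no default given")
        return self.default
```

Existing dictionary entries convert directly, e.g. `MetadataRule.from_list(rosetta['XUnits'])`, so the two forms can coexist while the format settles.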

borondics commented 6 years ago

Do the users have to manually edit this configuration?


CCampJr commented 6 years ago

@borondics

Currently, yes: it's a config.py type of file. I like the idea of a setup step so that the program could update it automatically.
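One way a setup step could update the configuration on the user's behalf, instead of having them edit config.py by hand, is to keep the rosetta mapping in a JSON file that the program rewrites. This is an assumption, not existing code: `prefer_key`, `save_rosetta` and `load_rosetta` are hypothetical, and the scenario of a widget letting the user pick the right metadata key is illustrative.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

Rosetta = Dict[str, List[Any]]


def prefer_key(rosetta: Rosetta, name: str, key: str) -> None:
    """Move `key` to the front of the precedence list of `name`, in place.

    Everything from the '!' marker onward (the default value) is kept as is.
    """
    candidates = rosetta.get(name, [])
    cut = candidates.index("!") if "!" in candidates else len(candidates)
    keys, tail = candidates[:cut], candidates[cut:]
    if key in keys:
        keys.remove(key)
    rosetta[name] = [key] + keys + tail


def save_rosetta(rosetta: Rosetta, path: str) -> None:
    """Write the configuration as human-readable JSON."""
    Path(path).write_text(json.dumps(rosetta, indent=2))


def load_rosetta(path: str) -> Rosetta:
    """Read a configuration previously written by save_rosetta."""
    return json.loads(Path(path).read_text())
```

For example, after a user points a widget at the `Raster.Fast.Units` field, the program would call `prefer_key(rosetta, "XUnits", "Raster.Fast.Units")` and save the result, so the next file of that type is interpreted correctly without any text editing.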

borondics commented 6 years ago

Yeah, I think that would be important, especially since Quasar/Orange is a very visual and user-friendly environment, so having non-expert users edit text files is not something we want.


CCampJr commented 6 years ago

@markotoplak

...what you mean by axes definitions?

For spectroscopy and hyperspectral imagery, we will need to define what the dimensional axes are. In spectroscopy, for example, we will have a spectral axis in units such as "wavenumber", "nanometers", or "eV". If there are multiple spectra, then the second axis will be "replicate number" or possibly a time value, e.g., "seconds". In hyperspectral imagery, we will need (e.g.) "x", "y", and the spectral axis, and possibly "z" and "time" for 5-D data.

So when I speak of the axes, I'm referring to vectors, one for each axis. Alternatively, for equally spaced systems, one could define start, stop, and step. For "x" this could be (0, 10, 0.1).
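Converting a start-stop-step definition into an axis vector takes one line of NumPy, with one subtlety: `np.arange` with a fractional step can gain or lose the last point through floating-point rounding. The sketch below (a hypothetical helper) assumes the stop value is meant to be included and computes the number of points explicitly.

```python
import numpy as np


def axis_from_range(start: float, stop: float, step: float) -> np.ndarray:
    """Equally spaced axis vector from (start, stop, step), stop included.

    The point count is rounded, so small floating-point errors in
    (stop - start) / step do not add or drop a point.
    """
    if step == 0:
        raise ValueError("step must be non-zero")
    n = int(round((stop - start) / step)) + 1
    if n < 1:
        raise ValueError("step points away from stop")
    return start + step * np.arange(n)


x = axis_from_range(0, 10, 0.1)  # the (0, 10, 0.1) example: 101 points
```

If the stop value were meant to be exclusive instead, the count would drop by one; the metadata would need to say which convention applies.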

With HDF5 files, we have a lot of possibilities. One can have a dataset vector that defines an axis. Similarly, one can use "scales" (HDF5 dimension scales), a special feature in which a dataset vector is linked to a particular axis of another dataset. Finally, we can get axis information from the attributes (metadata), which can hold either a vector or start-stop-step data.
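Reading axes from HDF5 dimension scales could look like the following sketch, assuming h5py, which exposes scales through `Dataset.dims`. The helper name `axes_from_scales` and the fallback to plain indices for dimensions without a scale are assumptions for illustration.

```python
import h5py
import numpy as np


def axes_from_scales(dset: h5py.Dataset):
    """Return (labels, axes): one label and one axis vector per dimension.

    The first dimension scale attached to a dimension is used as its axis;
    dimensions without a scale fall back to 0, 1, 2, ... indices.
    """
    labels, axes = [], []
    for i in range(dset.ndim):
        dim = dset.dims[i]
        labels.append(dim.label)
        if len(dim) > 0:
            axes.append(dim[0][()])  # read the whole first attached scale
        else:
            axes.append(np.arange(dset.shape[i]))
    return labels, axes


# Demonstration on an in-memory file: a 3x4x5 cube whose second
# dimension has an attached "x" scale.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    cube = f.create_dataset("cube", data=np.zeros((3, 4, 5)))
    xs = f.create_dataset("x", data=np.linspace(1.0, 2.0, 4))
    xs.make_scale("x")
    cube.dims[1].attach_scale(xs)
    cube.dims[1].label = "x"
    labels, axes = axes_from_scales(cube)
```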

My system uses metadata (dataset attributes) to calculate the vectors for x, y, and frequency, and it can alternatively handle metadata vectors. Scales are also quite handy, though; they are a feature I only learned about in the last year.

Example spatial metadata for my dataset (key : value):

Raster.Fast.Axis : X
Raster.Fast.Start : 1.0
Raster.Fast.Stop : 199.0
Raster.Fast.StepSize : 0.662207
Raster.Fast.Steps : 300
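With start, stop, and number of steps available, the x vector follows from `np.linspace`, and the stored step size can serve as a consistency check. The example values are consistent with an inclusive stop: (199.0 - 1.0) / 299 ≈ 0.662207. `raster_axis` is a hypothetical helper; the key names are taken from the example above.

```python
import numpy as np


def raster_axis(meta, prefix="Raster.Fast"):
    """Build (label, axis vector) for one raster axis from flat metadata.

    Assumes Stop is inclusive and Steps is the number of points; the
    optional StepSize entry is only used as a consistency check.
    """
    start = float(meta[prefix + ".Start"])
    stop = float(meta[prefix + ".Stop"])
    steps = int(meta[prefix + ".Steps"])
    axis = np.linspace(start, stop, steps)
    step_size = meta.get(prefix + ".StepSize")
    if step_size is not None and steps > 1:
        if not np.isclose(axis[1] - axis[0], float(step_size), rtol=1e-4):
            raise ValueError(f"{prefix}: StepSize does not match Start/Stop/Steps")
    return meta.get(prefix + ".Axis", prefix), axis


meta = {
    "Raster.Fast.Axis": "X",
    "Raster.Fast.Start": 1.0,
    "Raster.Fast.Stop": 199.0,
    "Raster.Fast.StepSize": 0.662207,
    "Raster.Fast.Steps": 300,
}
label, x = raster_axis(meta)
```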

Example frequency calibration:

Calib.a_vec : np.array([-0.1677, 863.4479])  # Polynomial fitting parameters; can be expanded to any order
Calib.ctr_wl : 729.994  # Spectrometer center wavelength
Calib.ctr_wl0 : 729.994  # Spectrometer center wavelength when calibrated
Calib.n_pix : 1600  # Spectrometer pixels
Calib.units : nm  # Units
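One possible reading of these calibration fields, stated as an assumption rather than the actual convention of this system: `a_vec` holds polynomial coefficients, highest order first as in `np.polyval`, evaluated at pixel indices 0 to n_pix - 1, and the whole axis shifts by `ctr_wl - ctr_wl0` when the spectrometer has been moved since calibration. With the values above, this gives about 863 nm at the first pixel and about 595 nm at the last, which puts the middle pixel close to the 729.994 nm center wavelength.

```python
import numpy as np


def wavelength_axis(calib):
    """Wavelength for each spectrometer pixel (assumed calibration model).

    Assumes Calib.a_vec are polynomial coefficients, highest order first,
    in terms of pixel index, and that moving the spectrometer from
    Calib.ctr_wl0 to Calib.ctr_wl shifts the whole axis rigidly.
    """
    pixels = np.arange(int(calib["Calib.n_pix"]))
    wl = np.polyval(calib["Calib.a_vec"], pixels)
    return wl + (calib["Calib.ctr_wl"] - calib["Calib.ctr_wl0"])


calib = {
    "Calib.a_vec": np.array([-0.1677, 863.4479]),
    "Calib.ctr_wl": 729.994,
    "Calib.ctr_wl0": 729.994,
    "Calib.n_pix": 1600,
    "Calib.units": "nm",
}
wl = wavelength_axis(calib)
```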

Clearly this can be complicated.

How does your beamline store this type of info?