Hyperspectral submodule

nfahlgren commented 6 years ago

Description

@maliagehan has been working on some hyperspectral data-related functions for PlantCV. We discussed the possibility of making these tools optional since there may end up being dependencies that users not working with hyperspectral data may not want/need to install. Based on a discussion in the TERRA-REF project, we are proposing to develop the hyperspectral submodule (and possibly other submodules) as a separate GitHub repository (https://github.com/danforthcenter/plantcv-hyperspectral). We would design it and the main repository in such a way that you could install PlantCV by itself as is done now and optionally install the hyperspectral module into the PlantCV package namespace so that it would function like it's part of the package if it's installed. This way it's optional but we can still do continuous integration testing and everything in the linked repository. We could use this framework for other submodules that could/should be optional.

Details

We will need to do some testing to figure out the best way to structure the submodule. We may need to refactor the main PlantCV repository somewhat to achieve the right structure.

Completion Criteria

This is a discussion/proposal. Specific issues should be opened for implementing the various components of this plan.

nfahlgren commented 5 years ago

I want to briefly summarize a team meeting with @maliagehan, @HaleySchuhl and @josectovar.

Based on some testing we did, it looks like the simplest approach will be to read hyperspectral images directly with NumPy. This has the advantage of keeping the images in a format that is more easily compatible with existing PlantCV methods and does not increase dependency burden.
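To illustrate what a NumPy-only reader could look like, here is a minimal sketch (not the actual PlantCV implementation). `read_envi_cube` is a hypothetical helper, and it assumes the image dimensions, data type code, interleave, and byte order have already been parsed from the ENVI header file. Only a handful of ENVI data type codes are mapped.

```python
import numpy as np

# Subset of ENVI "data type" header codes mapped to NumPy dtypes.
ENVI_DTYPES = {1: np.uint8, 2: np.int16, 4: np.float32, 5: np.float64, 12: np.uint16}


def read_envi_cube(path, lines, samples, bands, data_type=4,
                   interleave="bil", byte_order=0):
    """Hypothetical helper: read a raw ENVI data file into an array
    shaped (lines, samples, bands) using only NumPy."""
    # ENVI byte order 0 is little-endian, 1 is big-endian.
    dtype = np.dtype(ENVI_DTYPES[data_type]).newbyteorder(
        "<" if byte_order == 0 else ">")
    raw = np.fromfile(path, dtype=dtype)
    interleave = interleave.lower()
    if interleave == "bil":    # stored as line, band, sample
        cube = raw.reshape(lines, bands, samples).transpose(0, 2, 1)
    elif interleave == "bip":  # stored as line, sample, band
        cube = raw.reshape(lines, samples, bands)
    elif interleave == "bsq":  # stored as band, line, sample
        cube = raw.reshape(bands, lines, samples).transpose(1, 2, 0)
    else:
        raise ValueError(f"Unknown interleave: {interleave}")
    return cube
```

The result is an ordinary `numpy.ndarray`, so a single band such as `cube[:, :, 10]` can be passed to functions that expect a grayscale image.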

Given that, I propose that we put the hyperspectral submodule directly in the main PlantCV repository instead of the planned add-on package. It's going to be easier to work with the two packages if they are together, and we can avoid duplicating CI and documentation infrastructure. We can always split them later if we decide we need to.

Thoughts?

HaleySchuhl commented 5 years ago

Summary of today's team meeting:

We all agree that it makes sense to add a hyperspectral sub-package to the PlantCV repo rather than maintaining separate repositories and requiring users to download two packages. We have also been able to avoid additional dependencies so far.
For now, don't worry about data types aside from ENVI. Until someone requests support for another type of image don't worry about trying to handle it.
Ideally, all functions in the hyperspectral sub-package should be able to handle different numbers of bands and different wavelength values.
Eventually we would like to use classes to simplify programming for end users. For now, output the hypercube as a plain array and add classes as we go.
Within the function that reads hyperspectral data, we want to output a pseudo-RGB image. (Figure out whether the data already has bands with appropriate wavelengths for red, green, and blue; otherwise we will need to randomly choose bands to create a pseudo-RGB image.)
We will likely need two separate scaling functions: one that converts the data type (e.g., scales a 32-bit float image to uint8) and another that stretches the range of values. Generally, we don't expect to rescale the entire datacube in this way.
An indices function that will output bands of interest. Maybe we will give users the option to provide a list of indices, or "all" if they want every single one available. We will have to store information about the channel numbers and corresponding wavelengths used to generate each index for reproducibility, since different cameras will have different data and we want to allow flexibility in case wavelengths don't match what we expect for our cameras. Maybe allow users to define a tolerance range for how much their wavelength can be off and still return an index?
Maybe we will need a step that automatically thins the data by consolidating highly correlated neighbor bands before using PLSR or other model creation steps of the workflow in order to make this step quicker. We might allow users to choose options such as minimum number of kept bands, max number of kept bands, minimum value for a goodness of fit measure? This could also be slow using pairwise comparison so this step will probably take place on a masked cube (only use plant data to decide).
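The band-lookup ideas above (pseudo-RGB bands, index bands with a user-defined tolerance, and recording which bands were used) could be sketched roughly as follows. All function names, the target wavelengths (650/550/450 nm for pseudo-RGB, 670/800 nm for NDVI), and the default tolerance are illustrative assumptions, not decisions made in the meeting.

```python
import numpy as np


def find_band(wavelengths, target_nm, tolerance_nm=None):
    """Return the index of the band closest to target_nm, or None if
    the closest band is farther away than tolerance_nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    idx = int(np.argmin(np.abs(wavelengths - target_nm)))
    if tolerance_nm is not None and abs(wavelengths[idx] - target_nm) > tolerance_nm:
        return None
    return idx


def pseudo_rgb(cube, wavelengths, rgb_nm=(650.0, 550.0, 450.0)):
    """Stack the bands closest to the assumed R, G, B wavelengths."""
    bands = [find_band(wavelengths, nm) for nm in rgb_nm]
    return cube[:, :, bands], bands


def ndvi(cube, wavelengths, red_nm=670.0, nir_nm=800.0, tolerance_nm=10.0):
    """Compute NDVI and report which bands were used, or return
    (None, None) if no band falls within the tolerance."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    r = find_band(wavelengths, red_nm, tolerance_nm)
    n = find_band(wavelengths, nir_nm, tolerance_nm)
    if r is None or n is None:
        return None, None
    red = cube[:, :, r].astype(float)
    nir = cube[:, :, n].astype(float)
    denom = nir + red
    # Avoid divide-by-zero warnings; pixels with a zero denominator get 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        index = np.where(denom != 0, (nir - red) / denom, 0.0)
    # Metadata kept for reproducibility across cameras.
    used = {"red_band": r, "nir_band": n,
            "red_nm": float(wavelengths[r]), "nir_nm": float(wavelengths[n])}
    return index, used
```

Returning the band numbers and wavelengths alongside the index array is one way to keep the reproducibility information attached to each result.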

HaleySchuhl commented 5 years ago

Hyperspectral function wish list:

[X] Read ENVI hyperspectral images #441
[X] Generate a set of standard spectral indices from hyperspectral/multispectral data #443
[x] Analyze_spectral data #458
[x] Add hyperspectral image calibration function #442
[x] analyze_index which would collect statistics about a specific index, such as mean, median, std? We don't want people to just find the mean of a masked index with numpy, for example, since the masked pixels skew the results if not handled. Observations stored to Outputs should include which index it was (which should get attached to the array object).
[ ] analyze_derivative to calculate the first derivative (with user defined interval?). Observations stored will likely look very similar to other analyze_* functions that are storing frequency data. #472
[ ] Data reduction step? Reduce highly correlated bands?
[ ] PLSR
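As a rough sketch of why analyze_index should handle the mask itself, and of what a first derivative could look like, consider the following. The function names, the returned dictionary keys, and the convention that mask pixels greater than 0 are plant pixels are assumptions for illustration, not the actual PlantCV functions.

```python
import numpy as np


def summarize_index(index_array, mask, label="index"):
    """Hypothetical analyze_index-style summary that only uses pixels
    where mask > 0, so background pixels cannot skew the statistics."""
    values = index_array[mask > 0]
    values = values[np.isfinite(values)]
    if values.size == 0:
        return {"index": label, "mean": None, "median": None,
                "std": None, "n_pixels": 0}
    return {"index": label,
            "mean": float(values.mean()),
            "median": float(np.median(values)),
            "std": float(values.std()),
            "n_pixels": int(values.size)}


def spectral_derivative(spectrum, wavelengths):
    """First derivative of a spectrum with respect to wavelength,
    using NumPy's finite differences on possibly uneven spacing."""
    return np.gradient(np.asarray(spectrum, dtype=float),
                       np.asarray(wavelengths, dtype=float))


# Two plant pixels (0.8, 0.6) and two background pixels set to 0.
index_img = np.array([[0.8, 0.6], [0.0, 0.0]])
plant_mask = np.array([[255, 255], [0, 0]], dtype=np.uint8)
stats = summarize_index(index_img, plant_mask, label="NDVI")
naive_mean = float(index_img.mean())  # includes background zeros
```

Here the masked mean comes from the two plant pixels only, while the naive mean over the whole image is pulled toward zero by the background.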

I've been adding unit testing as I go but will add documentation while preparing the branch to actually get merged into master.
