danforthcenter / plantcv

Plant phenotyping with image analysis
Mozilla Public License 2.0
667 stars 265 forks

Query related raw data accepted in PlantCV #572

Closed parthbs closed 4 years ago

parthbs commented 4 years ago

Hi, @DannieSheng and PlantCV team

This query is about the raw data format. I'm developing a hyperspectral reflectance calculation, and your library seems promising, but I have some doubts that need to be cleared up before I use this amazing library your team is developing.

Here is my application. I am using LemnaTec software, which generates rawx data for each band. I've converted it to raw data bands and then stacked all the images into the proper shape (in my case it's (453, 400, 940)). For reshaping, I want to use your library, but I don't know the format/shape of raw data accepted by PlantCV. If you or your team could tell me the raw data format/shape PlantCV needs, it would help me calculate reflectance. Hoping for an answer, thank you.
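For reference, here is roughly how I build the (453, 400, 940) cube from the per-band frames. The zeros are just placeholders for the real rasters extracted from the rawx files:

```python
import numpy as np

# Per-band frames extracted from the rawx files; the zeros here are
# placeholders for the real (453, 400) rasters, one per wavelength.
rows, cols, n_bands = 453, 400, 940
bands = [np.zeros((rows, cols), dtype=np.uint16) for _ in range(n_bands)]

# Stack along a new last axis so the cube is (rows, cols, bands).
cube = np.stack(bands, axis=-1)
print(cube.shape)  # (453, 400, 940)
```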

HaleySchuhl commented 4 years ago

Hi @parthphoenix ,

Thanks for reaching out! Our aim is to have the tools within PlantCV be flexible enough to accept many types of hyperspectral and other image data. If you've looked at our hyperspectral tutorial, you'll see that we read in image data using the same `pcv.readimage` function, but since we have ENVI-format images the function takes a `mode` parameter that we set to `"envi"`. In the background it reads, parses, and stores information from the `.hdr` file (which contains all the metadata about things like format/shape/interleave type/etc.).

From there we create a class object that contains all the metadata we need for downstream functions, so I would recommend using our library for reading in data before calibration. We expect users to have different formats, and we're actively adding support for more formats as we get sample data to test with. If you would be willing to post an example image file, then we can test with it and update our hyperspectral sub-package accordingly!
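If it helps, here is a rough sketch of turning a NumPy cube into an ENVI-style flat binary plus `.hdr` so it can go through that same flow. This is only a sketch: the header keywords follow the general ENVI convention, the filenames and wavelength values are made up, and the final PlantCV call is left as a comment since it assumes the package is installed:

```python
import numpy as np

# Illustrative cube: (lines, samples, bands); values are placeholders.
cube = np.arange(10 * 8 * 5, dtype=np.uint16).reshape(10, 8, 5)
lines, samples, bands = cube.shape

# BSQ interleave: band-sequential, i.e. all of band 0, then band 1, ...
cube.transpose(2, 0, 1).tofile("data.raw")

# Minimal ENVI header; "data type = 12" is ENVI's code for unsigned 16-bit
# integers, and the wavelength list here is invented for illustration.
hdr = "\n".join([
    "ENVI",
    f"samples = {samples}",
    f"lines = {lines}",
    f"bands = {bands}",
    "header offset = 0",
    "data type = 12",
    "interleave = bsq",
    "byte order = 0",
    "wavelength = {" + ", ".join(str(400 + i) for i in range(bands)) + "}",
])
with open("data.raw.hdr", "w") as f:
    f.write(hdr)

# With PlantCV installed, this should then load through the same flow:
# from plantcv import plantcv as pcv
# spectral = pcv.readimage(filename="data.raw", mode="envi")
```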

parthbs commented 4 years ago

@HaleySchuhl

> Thanks for reaching out! Our aim is to have the tools within PlantCV be flexible enough to accept many types of hyperspectral and other image data. If you've looked at our hyperspectral tutorial you'll see that we read in image data using the same pcv.readimage function but we have ENVI format images so the function takes a mode param which we set to "envi". In the background it reads, parses through, and stores information from the .hdr file (containing all the metadata about things like format/shape/interleave type/etc).

I've checked the documentation, but the thing is, LemnaTec generates rawx files for each band; otherwise I would use PlantCV directly. First, I need to convert them into the raw format that PlantCV expects. Please find the attached Google Drive link for hyperspectral images of a single plant.

FYI, about the contents of the link:

In particular, 0_0_0.rawx is not our use case, so just ignore it. 1_0_0.rawx contains a metadata file; the rest of the files are images for different bands. To convert them into raw format, rename a file such as 5_0_0.rawx to 5_0_0.zip and extract it; you'll get image.raw for that particular band.

If you need any help, please let me know. I would be happy to help with this. https://drive.google.com/file/d/1HSQkyfyBsBM0Udr5Sa81wGkPjbqTSehf/view

dschneiderch commented 4 years ago

looks like all the information you need is in the metadata. you might get some inspiration from https://github.com/danforthcenter/data-science-tools/blob/master/LT-db-extractor.py and associated forks. since you already extracted all the data into an array, you might just be able to use GDAL with your array and data.hdr in 1_0_0 to create a proper ENVI file.

parthbs commented 4 years ago

@dschneiderch Thanks for the review. Actually, I have created a proper metadata file (data.hdr). The thing is, I want to use the PlantCV pipeline with the created datacube (in .npy format). Is it possible to use an .npy datacube with PlantCV?

As per line 152 in read_data.py (https://github.com/danforthcenter/plantcv/blob/master/plantcv/plantcv/hyperspectral/read_data.py#L152), PlantCV builds the datacube according to the metadata file. Can I use my datacube there?
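From my reading, that step is essentially a `np.fromfile` plus a reshape driven by the header's interleave. A small self-contained sketch of that idea for BIL-interleaved data (this mirrors the idea, not the exact PlantCV code):

```python
import numpy as np

# Synthetic stand-in for a raw BIL file, built from a cube of known values.
lines, samples, bands = 6, 4, 3
cube = np.arange(lines * samples * bands, dtype=np.float32).reshape(lines, samples, bands)

# On disk, BIL stores each line as (bands, samples), so serialize that way.
cube.transpose(0, 2, 1).tofile("cube.bil")

# What a reader does with the header metadata: read flat, reshape by the
# interleave, then reorder to (lines, samples, bands).
flat = np.fromfile("cube.bil", dtype=np.float32)
restored = flat.reshape(lines, bands, samples).transpose(0, 2, 1)
assert np.array_equal(restored, cube)
```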

parthbs commented 4 years ago

Hello Team, I'm closing this ticket as I've got the solution to my query. Thanks for the reply.

dschneiderch commented 4 years ago

> @dschneiderch Thanks for the review. Actually, I have created a proper metadata file[data.hdr]. The thing is, I want to use the PlantCV pipeline with the created datacube[in .npy format]. Is it possible to use npy datacube with PlantCV?
>
> As per the Line: 152 in read_data.py https://github.com/danforthcenter/plantcv/blob/master/plantcv/plantcv/hyperspectral/read_data.py#L152 PlantCV is making datacube according to the metadata file. Can I use my datacube over there?

@parthphoenix glad you figured it out. in the future, @HaleySchuhl and co. could consider making it possible to convert a Python .npy object with metadata to their spectral_data class, but my point was you could just save your .npy as an ENVI file and then read it back in.

it'd be great if you would contribute, or share somewhere, your scripts for extracting the LemnaTec hyperspectral data to an external format. a couple of us have forked and contributed to the Danforth Center's "data science tools" https://github.com/danforthcenter/data-science-tools where we do similar things. I don't have a hyperspectral camera myself but I've built my fork to handle PSII, SWIR, and RGB cameras from the LemnaTec db: https://github.com/CougPhenomics/data-science-tools

nfahlgren commented 4 years ago

The Spectral_data class is available within the package, so anyone can create a new instance given a NumPy array. But without a way to parse the metadata from some source it would need to be entered into the class manually/programmatically by the user.
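Roughly, entering it manually would look something like this. To be clear, the dataclass below is only a stand-in that approximates the fields the real Spectral_data class carries (the attribute names here are my paraphrase and may differ by version; check plantcv/plantcv/classes.py for the current signature):

```python
import numpy as np
from dataclasses import dataclass

# Hypothetical mirror of the metadata a Spectral_data-style object needs;
# these attribute names approximate PlantCV's class and may differ.
@dataclass
class SpectralCube:
    array_data: np.ndarray   # shape (lines, samples, bands)
    wavelength_dict: dict    # wavelength (nm) -> band index
    min_wavelength: float
    max_wavelength: float
    d_type: type
    interleave: str = "bsq"

# Populate it from a NumPy cube plus manually-entered metadata
# (the wavelength values here are invented for illustration).
cube = np.zeros((453, 400, 940), dtype=np.uint16)
wavelengths = {400.0 + i: i for i in range(cube.shape[2])}
spectral = SpectralCube(
    array_data=cube,
    wavelength_dict=wavelengths,
    min_wavelength=min(wavelengths),
    max_wavelength=max(wavelengths),
    d_type=np.uint16,
)
```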

nfahlgren commented 4 years ago

Maybe this type of image data fits in with a few other things we have been discussing, like the PSII data, where there is a stack/series of images that belong together in a set and need to be processed together. The last time we talked, we discussed preprocessing them into an image stack, like a multi-frame TIFF, so that we could package the images together and treat them like a single image from a parallel-processing point of view.

What if instead we build on the "snapshot" concept? For historical reasons, the "data science tools" database export tool stores images acquired at the same time on LemnaTec systems in a snapshot folder. But this grouping is all about data acquisition, not about whether the images should be processed together as a unit. What if we grouped the data into "snapshots" based on whether they should be treated as a single dataset? That way the individual files are still accessible, but the directory containing them could be treated as the grouping level and the input to PlantCV.

dschneiderch commented 4 years ago

in general I think that is a good idea. It might be difficult to implement "temporal" techniques though? the most obvious use case is with roots, but I've seen iterative segmentation methods where the segmentation from 3 consecutive days informs the segmentation on all the days, to help clean up mislabeled pieces of the image based on growth.

nfahlgren commented 4 years ago

We would still probably need other grouping mechanisms, including temporal ones. I imagine that no matter how the data are organized, different users may want to group them differently.

I was recently looking at optical flow analysis as an avenue to use temporal information: https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_video/py_lucas_kanade/py_lucas_kanade.html
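As a toy illustration of using temporal displacement between frames: this is not Lucas-Kanade optical flow itself, but phase correlation, a much simpler relative that recovers one global (dy, dx) shift between two consecutive frames from their FFTs:

```python
import numpy as np

# Phase correlation: recover one global (dy, dx) shift between two frames.
def global_shift(frame1, frame2):
    f1, f2 = np.fft.fft2(frame1), np.fft.fft2(frame2)
    cross = np.conj(f1) * f2
    cross /= np.abs(cross) + 1e-12          # keep only the phase
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return int(dy), int(dx)

# Synthetic pair: the second frame is the first, shifted by (3, 5).
rng = np.random.default_rng(0)
frame1 = rng.random((32, 32))
frame2 = np.roll(frame1, shift=(3, 5), axis=(0, 1))
print(global_shift(frame1, frame2))  # (3, 5)
```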

dschneiderch commented 4 years ago

oh that looks interesting! especially in conjunction with the skeleton package when tracking root or shoot growth.