Update how functions use contours and hierarchies

HaleySchuhl commented 5 years ago

Description

Many functions in PlantCV either output contours and hierarchies or require the user to input them. Functions that need contours and hierarchies as input often also take a mask. Since contours and hierarchies can be derived from the input masks, the information is largely redundant, and it might be easier to hide contours and hierarchies from users altogether.
Functions that deal with multiple plants, e.g. pcv.roi.multi() and pcv.cluster_contours(), could instead return a list of masks, and we could handle multiple objects within the pcv.analyze_* functions.
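A minimal sketch of the list-of-masks idea, assuming each plant arrives as its own binary (0/255) numpy mask. `analyze_masks` is a hypothetical name, not an existing PlantCV function; it simply loops over the masks and records one pixel-area measurement per plant, so the caller never writes the loop.

```python
import numpy as np

def analyze_masks(masks):
    """Hypothetical analyze_* variant: one measurement record per mask.

    masks: list of 2-D binary numpy arrays, one per plant.
    Returns a list of dicts in the same order as the input masks.
    """
    results = []
    for idx, mask in enumerate(masks):
        # Pixel area = number of nonzero (foreground) pixels in this plant's mask
        area = int(np.count_nonzero(mask))
        results.append({"id": idx, "area": area})
    return results

# Two toy "plants" in a 100 x 100 image
plant_a = np.zeros((100, 100), dtype=np.uint8)
plant_a[10:20, 10:20] = 255   # 10 x 10 square -> 100 px
plant_b = np.zeros((100, 100), dtype=np.uint8)
plant_b[50:70, 50:60] = 255   # 20 x 10 rectangle -> 200 px
print(analyze_masks([plant_a, plant_b]))
```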
Under this restructuring we could change the way we think of entities (with regard to data stored in the Outputs class and printed out when a workflow is run in parallel over a set of images). Right now, an individual image is usually an entity, but when images contain more than one plant it might make more sense to treat individual plants as separate entities. We can differentiate between these by naming each entity with information about its position in addition to the image filename. Since pcv.roi.multi() allows for an irregular plant layout, an entity can be named after the filename of the original image plus the center of the ROI used to identify the plant.
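A small sketch of the naming scheme described above, assuming the ROI is available as a binary mask. The helper name `entity_name` and the `<filename>_x<cx>_y<cy>` format are hypothetical choices for illustration; the center here is the mean pixel coordinate of the ROI.

```python
import os
import numpy as np

def entity_name(image_path, roi_mask):
    """Hypothetical helper: name an entity after its image file and ROI center.

    roi_mask: 2-D array that is nonzero inside the ROI.
    The center is the mean (x, y) of the ROI pixels, rounded to whole pixels.
    """
    ys, xs = np.nonzero(roi_mask)   # row (y) and column (x) indices of ROI pixels
    cx, cy = int(round(xs.mean())), int(round(ys.mean()))
    return f"{os.path.basename(image_path)}_x{cx}_y{cy}"

roi = np.zeros((200, 300), dtype=np.uint8)
roi[10:21, 20:31] = 1   # 11 x 11 ROI centered at x=25, y=15
print(entity_name("/data/snapshots/tray1.png", roi))  # tray1.png_x25_y15
```

Using the ROI center rather than a list index keeps the name stable even if ROIs are generated in a different order on another image.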

Details

If these updates seem beneficial, they will impact all functions that output contours, all functions that require a contour input, all analyze_* functions (which would be enhanced to allow multiple entities per image), and the Outputs data storage, JSON data output, and json2csv function.
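To make the Outputs/json2csv impact concrete, here is a sketch assuming observations are stored as a nested `{entity: {trait: value}}` dictionary. This layout and the long-format CSV it produces are hypothetical illustrations, not the actual PlantCV Outputs or json2csv formats.

```python
import csv
import io
import json

# Hypothetical per-entity observations for one image containing two plants
outputs = {
    "tray1.png_x25_y15": {"area": 1200, "height": 40},
    "tray1.png_x125_y15": {"area": 950, "height": 35},
}

def entities_to_csv(observations):
    """Flatten {entity: {trait: value}} into long-format CSV text (one row per trait)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["entity", "trait", "value"])
    for entity, traits in observations.items():
        for trait, value in traits.items():
            writer.writerow([entity, trait, value])
    return buf.getvalue()

# Round-trip through JSON, as a workflow's results file would be
text = entities_to_csv(json.loads(json.dumps(outputs)))
print(text)
```

With one row per entity and trait, an image with many plants simply contributes more rows; the column layout does not change.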

Completion Criteria

For a discussion:

[ ] Discuss and develop requirements docs
[ ] Create issues for next steps

dschneiderch commented 5 years ago

I've been thinking about the mask/contour/hierarchy relationship too. Something to consider is that for an image with multiple ROIs, and even for a series of images from the same "snapshot" (like a PSII induction curve, or co-temporal RGB and NIR images) with multiple ROIs, you only need to run find_objects() and pcv.roi.multi() once.
Admittedly, my scripts have diverged from the pipeline setup you guys designed, and I want to converge back to using the new outputs and pipeline. In my scripts, I loop over a function that saves c, h, roi_c, and roi_h as globals so they don't have to be recalculated within a snapshot image series. https://github.com/dschneiderch/WalzPSIIProcessing/blob/185113274fe7193ec2e339e725cd14c91551d8db/scripts/psII.py#L61

I guess my point is- is it worth recomputing the contours each time analyze_* is called?

nfahlgren commented 5 years ago

Definitely worth considering the performance implications of re-detecting contours for each function that needs them. Memory might be another issue, i.e. do we consume more memory by storing "contours" in mask format as opposed to the OpenCV contours/hierarchy list/ndarray format?
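A rough back-of-the-envelope comparison, assuming a 2000 x 2000 uint8 mask and a single 1000 x 1000 square object. OpenCV contours are int32 arrays of shape (N, 1, 2); with `CHAIN_APPROX_NONE` every boundary pixel is stored. The contour below is a zero-filled placeholder with the right shape and dtype, not the output of `cv2.findContours`.

```python
import numpy as np

h, w = 2000, 2000                  # assumed image size
mask = np.zeros((h, w), dtype=np.uint8)
mask[500:1500, 500:1500] = 255     # one square "plant", 1000 x 1000 px

# A 1000 x 1000 square has 4 * 1000 - 4 boundary pixels
n_boundary = 4 * 1000 - 4
contour = np.zeros((n_boundary, 1, 2), dtype=np.int32)  # placeholder, same shape/dtype

print(mask.nbytes)     # 4000000 bytes (1 byte per pixel, regardless of content)
print(contour.nbytes)  # 31968 bytes (8 bytes per boundary point)
```

A mask costs the same no matter how small the object is, while contour size grows with boundary length, so irregular, highly branched plants narrow the gap. Keeping one mask per plant multiplies the full-image cost by the number of plants.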

But that being said, I think we can simplify inputs and allow for passing multiple objects of interest to functions with the overall idea.

The alternative to a list of masks is a list of contour objects. The "objects" could be as simple as dictionaries with two keys (contours and hierarchies) or we could make a Contours class that is initialized each time we use pcv.find_objects, use ROI methods, etc. One or more instances of Contours could be passed to a function as a single input and the list of contours and hierarchies can be accessed internally. Here's a current typical pattern:

roi_contour, roi_hierarchy = pcv.roi.rectangle(img=img, x=0, y=0, h=100, w=100)
contours, hierarchy = pcv.find_objects(img=img, mask=mask)
filtered_contours, filtered_hierarchy, mask, area = pcv.roi_objects(
    img=img, roi_contour=roi_contour, roi_hierarchy=roi_hierarchy,
    object_contour=contours, obj_hierarchy=hierarchy, roi_type="partial")

This would be simplified to:

roi = pcv.roi.rectangle(img=img, x=0, y=0, h=100, w=100)
obj = pcv.find_objects(img=img, mask=mask)
filtered_obj, mask, area = pcv.roi_objects(img=img, roi=roi, object=obj, roi_type="partial")

For functions like pcv.roi_objects we could support list inputs for roi, object, or both and automatically iterate over them. Then users wouldn't have to make their own for loops anymore.
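The "accept a single item or a list and iterate internally" pattern can be sketched as below. For brevity this hypothetical `roi_objects` works on binary masks rather than contour objects, and it simply clips each object to each ROI with a logical AND; it does not reproduce PlantCV's `roi_type` behaviors.

```python
import numpy as np

def _as_list(x):
    """Wrap a single input in a list so one code path handles both cases."""
    return x if isinstance(x, list) else [x]

def roi_objects(roi, obj):
    """Hypothetical list-aware filter: clip each object to each ROI.

    roi, obj: binary masks (2-D numpy arrays) or lists of them.
    Returns one filtered mask per (roi, obj) pair.
    """
    results = []
    for r in _as_list(roi):
        for o in _as_list(obj):
            # Keep only object pixels that fall inside this ROI
            results.append(np.logical_and(r > 0, o > 0).astype(np.uint8) * 255)
    return results

plant = np.zeros((10, 10), dtype=np.uint8)
plant[2:8, 2:8] = 255              # 6 x 6 object
left = np.zeros((10, 10), dtype=np.uint8)
left[:, :5] = 255                  # left half of the image
right = np.zeros((10, 10), dtype=np.uint8)
right[:, 5:] = 255                 # right half of the image

out = roi_objects(roi=[left, right], obj=plant)  # no user-side for loop
print(len(out), [int(np.count_nonzero(m)) for m in out])  # 2 [18, 18]
```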

When we do parallel image analysis there would still be some duplicated computation, e.g. of ROIs. I think that's (mostly) unavoidable within our parallel framework because we allow computing on each image (or image stack) to be done independently, i.e. we don't share information between computing instances.

dschneiderch commented 5 years ago

I don't really understand how a contour class vs contour list would differ so I can't comment on that, but avoiding user-triggered for loops for multiple roi would really help readability and intuitiveness.

nfahlgren commented 5 years ago

I'm picturing a very simple class like this:

class Contours:
    def __init__(self, contours, hierarchy):
        self.contours = contours
        self.hierarchy = hierarchy

Example using pcv.find_objects:

import cv2
import numpy as np
import os
from plantcv.plantcv import print_image
from plantcv.plantcv import plot_image
from plantcv.plantcv import params
from plantcv.plantcv import Contours

def find_objects(img, mask):
    """Find all objects and color them magenta.

    Inputs:
    img       = RGB or grayscale image data for plotting
    mask      = Binary mask used for contour detection

    Returns:
    contours = Instance of Contours class object

    :param img: numpy.ndarray
    :param mask: numpy.ndarray
    :return contours: class
    """

    params.device += 1
    mask1 = np.copy(mask)
    ori_img = np.copy(img)
    # If the reference image is grayscale convert it to color
    if len(np.shape(ori_img)) == 2:
        ori_img = cv2.cvtColor(ori_img, cv2.COLOR_GRAY2BGR)
    objects, hierarchy = cv2.findContours(mask1, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2:]
    for i in range(len(objects)):
        cv2.drawContours(ori_img, objects, i, (255, 102, 255), -1, lineType=8, hierarchy=hierarchy)
    if params.debug == 'print':
        print_image(ori_img, os.path.join(params.debug_outdir, str(params.device) + '_id_objects.png'))
    elif params.debug == 'plot':
        plot_image(ori_img)
    contours = Contours(contours=objects, hierarchy=hierarchy)
    return contours

Nothing too spectacular there; we would just have one output instead of two. One can still access the standard OpenCV contours and hierarchy data structures:

contours = pcv.find_objects(img=img, mask=mask)
new_img = np.copy(img)
cv2.drawContours(new_img, contours.contours, -1, (255, 255, 255), -1, lineType=8, hierarchy=contours.hierarchy)

But now we can imagine that every function that needs contours has one input instead of two, and it also becomes easier to provide a list of Contours instances. Without the class, we could still use lists as function inputs, but we would need a list of contours and a list of hierarchies, both of the same length, and we would have to iterate over both lists in step.
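A side-by-side sketch of the two input styles, using a minimal copy of the `Contours` class proposed in this thread. The `square` helper and the point-counting functions are hypothetical; the hierarchy rows follow OpenCV's `[next, previous, first_child, parent]` layout.

```python
import numpy as np

class Contours:
    """Minimal container pairing OpenCV-style contours with their hierarchy."""

    def __init__(self, contours, hierarchy):
        self.contours = contours    # list of (N, 1, 2) int32 point arrays
        self.hierarchy = hierarchy  # (1, len(contours), 4) int32 array

def square(x, y, s):
    """Hypothetical helper: corner points of an s x s square, OpenCV contour layout."""
    pts = [[x, y], [x + s, y], [x + s, y + s], [x, y + s]]
    return np.array(pts, dtype=np.int32).reshape(-1, 1, 2)

# One top-level contour: next, previous, first_child and parent are all -1
top_level = np.array([[[-1, -1, -1, -1]]], dtype=np.int32)

# Style 1: two parallel lists that must stay the same length and order
contour_lists = [[square(0, 0, 10)], [square(50, 50, 20)]]
hierarchy_list = [top_level, top_level]

def points_per_plant_parallel(contour_lists, hierarchy_list):
    if len(contour_lists) != len(hierarchy_list):
        raise ValueError("contour and hierarchy lists are out of sync")
    return [sum(len(c) for c in cnts) for cnts, _hier in zip(contour_lists, hierarchy_list)]

# Style 2: one list, one self-contained object per plant
plants = [Contours(c, h) for c, h in zip(contour_lists, hierarchy_list)]

def points_per_plant(objects):
    return [sum(len(c) for c in obj.contours) for obj in objects]

print(points_per_plant_parallel(contour_lists, hierarchy_list))  # [4, 4]
print(points_per_plant(plants))                                   # [4, 4]
```

Both give the same answer, but only the first needs a consistency check between two inputs on every call.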

dschneiderch commented 5 years ago

I would suggest we use objects = pcv.find_objects(img, mask) as the de facto standard. hierarchy=objects.hierarchy is more readable than hierarchy=contours.hierarchy, and we already have find_objects.

A Contours class seems cleaner than a list, but it appears they are ultimately the same thing. A benefit of the class is that you can define methods, right?

Where is the Contours class defined?

nfahlgren commented 5 years ago

It would be defined in plantcv/plantcv/__init__.py.

Sorry, it's a bit confusing because an OpenCV contours data structure is itself a list. I am picturing that the Contours class would have an attribute called contours that simply holds the OpenCV contours list data structure. The main nice thing about the Contours class is that you can group the OpenCV contours list and the hierarchy ndarray into a single data object.

The other lists we are talking about arise when we have multiple contours data structures (lists of lists of contours). For example, when we use the multi-ROI tool, we end up with an OpenCV contours list and a hierarchy ndarray for each ROI. The advantage of the Contours class in this situation is that we can make a list of instances of the class. If we had 3 ROIs, we would currently have to keep track of three OpenCV contours (list data type) and three hierarchies (ndarray data type). Building up a list of class instances manually, just for clarity, would look like this:

plants = []
# Plant 1
plant1 = Contours(contours=plant1_contours, hierarchy=plant1_hierarchy)
plants.append(plant1)

# Plant 2
plant2 = Contours(contours=plant2_contours, hierarchy=plant2_hierarchy)
plants.append(plant2)

# Plant 3
plant3 = Contours(contours=plant3_contours, hierarchy=plant3_hierarchy)
plants.append(plant3)

We wouldn't do it in this kind of manual way, but hopefully that sort of makes sense. But now we have a list where each item in the list is a single object that has the contours and hierarchy required to describe each plant. So if we wanted to analyze the shape of each plant we could conceptually do: pcv.analyze_object(img=img, obj=plants). The function would iterate over the list of plant objects and record measurements for each separately.
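A sketch of what a list-aware `analyze_object` could do internally, assuming the class holds OpenCV-style contours and hierarchy. The function here is hypothetical and only measures area; each plant's ID is its list position. Area is computed with the shoelace formula, the same polygon area `cv2.contourArea` returns, so the sketch runs on numpy alone.

```python
import numpy as np

class Objects:
    """Minimal stand-in for the proposed class: OpenCV-style contours plus hierarchy."""

    def __init__(self, contours, hierarchy):
        self.contours = contours
        self.hierarchy = hierarchy

def contour_area(cnt):
    """Polygon area by the shoelace formula (same quantity as cv2.contourArea)."""
    pts = cnt.reshape(-1, 2).astype(np.float64)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def analyze_object(obj):
    """Hypothetical list-aware analyze_object: one record per plant, ID = list position."""
    plants = obj if isinstance(obj, list) else [obj]
    records = []
    for idx, plant in enumerate(plants):
        hier = plant.hierarchy[0]  # rows of [next, previous, first_child, parent]
        # Sum top-level contours only (parent == -1); holes are not subtracted here
        area = sum(contour_area(c) for c, h in zip(plant.contours, hier) if h[3] == -1)
        records.append({"id": idx, "area": float(area)})
    return records

top = np.array([[[-1, -1, -1, -1]]], dtype=np.int32)
plant1 = Objects([np.array([[[0, 0]], [[10, 0]], [[10, 10]], [[0, 10]]], dtype=np.int32)], top)
plant2 = Objects([np.array([[[0, 0]], [[20, 0]], [[20, 10]], [[0, 10]]], dtype=np.int32)], top)
print(analyze_object([plant1, plant2]))
```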

To your other point, classes can have methods, so there could be more that could be done. Maybe one example is Contours could have a method to output a binary mask given the contours and hierarchy information stored in the class instance (i.e. the method would use cv2.drawContours).

nfahlgren commented 5 years ago

Ah, I missed your other suggestion. The class could be called Objects instead of Contours.

HaleySchuhl commented 5 years ago

I agree. Something like Objects might be more descriptive to end users. OpenCV uses "contours", but these items contain both the OpenCV-defined contours and hierarchies. I also think Entities might be another good option, since that is the term we use in the Outputs class. We had been treating images as entities, but we could have multiple plants within a single image.

HaleySchuhl commented 5 years ago

This is now in progress. Once plantcv-workflow.py is updated, we plan to start transitioning each PlantCV function over to accept data object class instances, which will simplify the parameters of many functions. Some of this is already underway: the new functions being added to the hyperspectral sub-package (which is relatively separate from other PlantCV functions) have been written to use a class system that attaches all sorts of metadata to a hyperspectral datacube or spectral index.

dschneiderch commented 5 years ago

Should this class Objects have a unique identifier along with the contour and hierarchy fields? Or would the image and roi identifiers be handled outside this class?

nfahlgren commented 5 years ago

That's a good question. One way we have thought about identifying objects/ROIs is simply by their position in a list, so the first object/ROI has ID = 0, the next has ID = 1, and so on. That's not particularly descriptive, though. We could potentially ID them some other way, like by their center point. We could also have some mechanism for a user to provide a list of IDs.
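A sketch combining the two simplest options from this comment: position-based IDs by default, with an optional user-supplied list. `assign_ids` is a hypothetical helper; it rejects label lists of the wrong length or with duplicates, since either would silently mislabel or drop objects.

```python
def assign_ids(objects, labels=None):
    """Hypothetical helper: map an ID to each object.

    By default the ID is the object's position in the list (0, 1, 2, ...).
    A user-supplied list of labels replaces those IDs; it must contain one
    unique label per object.
    """
    if labels is None:
        labels = list(range(len(objects)))
    if len(labels) != len(objects):
        raise ValueError(f"got {len(labels)} labels for {len(objects)} objects")
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be unique")
    return dict(zip(labels, objects))

print(assign_ids(["roi_a", "roi_b", "roi_c"]))               # {0: 'roi_a', 1: 'roi_b', 2: 'roi_c'}
print(assign_ids(["roi_a", "roi_b"], labels=["A1", "A2"]))  # {'A1': 'roi_a', 'A2': 'roi_b'}
```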

nfahlgren commented 1 year ago

Implemented in the branch release-4.0
