Closed HaleySchuhl closed 1 year ago
I've been thinking about the mask/contour/hierarchy relationship too.
Something to consider is that in an image with multiple ROIs, and even in a series of images from the same "snapshot" (like a PSII induction curve, or co-temporal RGB and NIR images) with multiple ROIs, you only need to run find_objects() and pcv.roi.multi() once.
Admittedly my scripts have diverged from the pipeline setup you guys designed and I want to converge back to using the new outputs and pipeline, but in my scripts I use a for loop over a function that saves c, h, roi_c, and roi_h as globals so they don't have to be recalculated within a snapshot image series.
https://github.com/dschneiderch/WalzPSIIProcessing/blob/185113274fe7193ec2e339e725cd14c91551d8db/scripts/psII.py#L61
I guess my point is: is it worth recomputing the contours each time analyze_* is called?
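A minimal sketch of the compute-once idea, without globals. It assumes a hypothetical `detect` callable standing in for the find_objects() plus pcv.roi.multi() step, and that every frame carries a snapshot identifier. `SnapshotCache`, `detect`, and the frame labels are illustrative names, not PlantCV API.

```python
class SnapshotCache:
    """Cache per-snapshot detection results so they are computed only once."""

    def __init__(self, detect):
        # `detect` is a hypothetical stand-in for the contour/ROI detection step
        self._detect = detect
        self._results = {}

    def get(self, snapshot_id, reference_frame):
        # Run detection on the first frame seen for a snapshot, then reuse it
        if snapshot_id not in self._results:
            self._results[snapshot_id] = self._detect(reference_frame)
        return self._results[snapshot_id]


calls = []  # records which frames actually triggered detection


def fake_detect(frame):
    # Placeholder detector; a real one would return contours and hierarchies
    calls.append(frame)
    return {"contours": [], "hierarchy": None}


cache = SnapshotCache(fake_detect)
frames = [("snap1", "Fo"), ("snap1", "Fm"), ("snap1", "Fm'"), ("snap2", "Fo")]
for snapshot_id, frame in frames:
    objects = cache.get(snapshot_id, frame)
    # ... analyze `frame` using the cached `objects` ...
```

Detection runs once per snapshot here (twice in total for the four frames), which is the saving being asked about.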
Definitely worth considering the performance implications of re-detecting contours for each function that needs them. Another issue might be memory, i.e. do we consume more memory by storing "contours" in mask format as opposed to the OpenCV contour/hierarchies list/ndarray format.
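As a rough back-of-the-envelope check on the memory question, here is a sketch that assumes masks are single-channel uint8 arrays (one byte per pixel) and that OpenCV stores each contour point as two int32 values. The image size and point count are made-up numbers, and the estimate ignores container overhead and the hierarchy array.

```python
def mask_bytes(height, width):
    """Bytes used by a single-channel uint8 mask (one byte per pixel)."""
    return height * width


def contour_bytes(n_points):
    """Approximate bytes for contour points stored as (x, y) int32 pairs."""
    return n_points * 2 * 4


# Hypothetical 2000 x 2000 image with a 10,000-point plant outline
print(mask_bytes(2000, 2000))   # 4000000
print(contour_bytes(10_000))    # 80000
```

Under these assumptions a full-size mask per object costs far more than the contour points, and the gap grows with the number of objects if each one gets its own mask.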
But that being said, I think we can simplify inputs and allow for passing multiple objects of interest to functions with the overall idea.
The alternative to a list of masks is a list of contour objects. The "objects" could be as simple as dictionaries with two keys (contours and hierarchies) or we could make a Contours class that is initialized each time we use pcv.find_objects, use ROI methods, etc. One or more instances of Contours could be passed to a function as a single input and the list of contours and hierarchies can be accessed internally. Here's a current typical pattern:
roi_contour, roi_hierarchy = pcv.roi.rectangle(img=img, x=0, y=0, h=100, w=100)
contours, hierarchy = pcv.find_objects(img=img, mask=mask)
filtered_contours, filtered_hierarchy, mask, area = pcv.roi_objects(img=img,
                                                                    roi_contour=roi_contour,
                                                                    roi_hierarchy=roi_hierarchy,
                                                                    object_contour=contours,
                                                                    obj_hierarchy=hierarchy,
                                                                    roi_type="partial")
This would be simplified to:
roi = pcv.roi.rectangle(img=img, x=0, y=0, h=100, w=100)
obj = pcv.find_objects(img=img, mask=mask)
filtered_obj, mask, area = pcv.roi_objects(img=img, roi=roi, object=obj, roi_type="partial")
For functions like pcv.roi_objects we could support list inputs for roi, object, or both and automatically iterate over them. Then users wouldn't have to make their own for loops anymore.
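One way the automatic iteration could look, as a sketch only. `as_list`, `roi_objects_multi`, and `filter_one` are hypothetical names, and pairing every ROI with every object is an assumption about the intended behaviour. Note that the list check only works if a single ROI or object is not itself a list, which is not true of raw OpenCV contours.

```python
from itertools import product


def as_list(value):
    """Wrap a single item in a list; pass lists and tuples through as lists."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def roi_objects_multi(roi, obj, filter_one):
    """Apply filter_one to every ROI/object pairing and collect the results."""
    return [filter_one(r, o) for r, o in product(as_list(roi), as_list(obj))]


def overlap(roi, obj):
    # Stand-in for the real per-pair filtering logic
    return f"{obj} in {roi}"


results = roi_objects_multi(["roi0", "roi1"], "plant", overlap)
print(results)  # ['plant in roi0', 'plant in roi1']
```

The caller passes a single item or a list and gets back a list either way, so the for loop lives inside the function.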
When we do parallel image analysis there would still be some duplication of computed things like ROIs. I think that's unavoidable (mostly) within our parallel framework because we allow computing on each image (or image stack) to be done independently, i.e. we don't share information between computing instances.
I don't really understand how a contour class vs contour list would differ so I can't comment on that, but avoiding user-triggered for loops for multiple roi would really help readability and intuitiveness.
I'm picturing a very simple class like this:
class Contours:
    def __init__(self, contours, hierarchy):
        self.contours = contours
        self.hierarchy = hierarchy
Example use with pcv.find_objects:
import cv2
import numpy as np
import os
from plantcv.plantcv import print_image
from plantcv.plantcv import plot_image
from plantcv.plantcv import params
from plantcv.plantcv import Contours


def find_objects(img, mask):
    """Find all objects and color them blue.

    Inputs:
    img       = RGB or grayscale image data for plotting
    mask      = Binary mask used for contour detection

    Returns:
    contours  = Instance of Contours class object

    :param img: numpy.ndarray
    :param mask: numpy.ndarray
    :return contours: class
    """
    params.device += 1
    mask1 = np.copy(mask)
    ori_img = np.copy(img)
    # If the reference image is grayscale convert it to color
    if len(np.shape(ori_img)) == 2:
        ori_img = cv2.cvtColor(ori_img, cv2.COLOR_GRAY2BGR)
    objects, hierarchy = cv2.findContours(mask1, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2:]
    for i, cnt in enumerate(objects):
        cv2.drawContours(ori_img, objects, i, (255, 102, 255), -1, lineType=8, hierarchy=hierarchy)
    if params.debug == 'print':
        print_image(ori_img, os.path.join(params.debug_outdir, str(params.device) + '_id_objects.png'))
    elif params.debug == 'plot':
        plot_image(ori_img)
    contours = Contours(contours=objects, hierarchy=hierarchy)
    return contours
Nothing too spectacular there, we would have only one output instead of two. One can still access the standard OpenCV contours and hierarchy data structures:
contours = pcv.find_objects(img=img, mask=mask)
new_img = np.copy(img)
cv2.drawContours(new_img, contours.contours, -1, (255, 255, 255), -1, lineType=8, hierarchy=contours.hierarchy)
But now we can imagine that every function that needs contours can have one input instead of two and additionally, it's easier to provide a list of instances of Contours. Otherwise, we could still do lists as inputs to functions but we would have to have a list of contours and a list of hierarchies, both of the same length and we would have to iterate over both lists.
I would suggest we use objects = pcv.find_objects(img, mask) as the de facto standard. hierarchy=objects.hierarchy is more readable than hierarchy=contours.hierarchy and we have find_objects already.
A Contours class seems cleaner than a list - but ultimately they are the same thing it appears. A benefit to the class is you can define methods right?
Where is the Contours class defined?
It would be defined in plantcv/plantcv/__init__.py.
Sorry, it's a bit confusing because an OpenCV contours data structure is itself a list. I am picturing that the Contours class would have a method called contours that simply returns the OpenCV contours list data structure. The main nice thing about the class Contours is that you can group the OpenCV contours list and the hierarchy ndarray into a single data object.
The other lists we are talking about come up when we have multiple contours data structures (lists of lists of contours). For example, when we use the multi-ROI tool, we would end up having an OpenCV contours list and a hierarchy ndarray for each ROI. The advantage of the Contours class in this situation is we can make a list of instances of the class. If we imagined we had 3 ROIs we would currently have to keep track of three OpenCV contours (list data type) and three hierarchies (ndarray data type). If we built up a list using the class concept manually, just for clarity, it would look like this:
plants = []
# Plant 1
plant1 = Contours(contours=plant1_contours, hierarchy=plant1_hierarchy)
plants.append(plant1)
# Plant 2
plant2 = Contours(contours=plant2_contours, hierarchy=plant2_hierarchy)
plants.append(plant2)
# Plant 3
plant3 = Contours(contours=plant3_contours, hierarchy=plant3_hierarchy)
plants.append(plant3)
We wouldn't do it in this kind of manual way, but hopefully that sort of makes sense. But now we have a list where each item in the list is a single object that has the contours and hierarchy required to describe each plant. So if we wanted to analyze the shape of each plant we could conceptually do: pcv.analyze_object(img=img, obj=plants). The function would iterate over the list of plant objects and record measurements for each separately.
To your other point, classes can have methods, so there could be more that could be done. Maybe one example is Contours could have a method to output a binary mask given the contours and hierarchy information stored in the class instance (i.e. the method would use cv2.drawContours).
Ah I missed your other suggestion. The class could be called Objects instead of Contours.
I agree. Something like Objects might be more descriptive to end users. OpenCV uses "contours", but these items contain information about both the CV-defined hierarchies and the contours. I also think Entities might be another good option since that is what we define in the Outputs class. We had been considering images as entities but we could have multiple plants within a single image.
This is kind of in progress now. Once plantcv-workflow.py is updated we plan to start transitioning each PlantCV function over to accept data object class instances to simplify parameters for many functions. It is already underway in the hyperspectral sub-package: because it's relatively separate from other pcv functions, the new functions being added there have been written to use a class system that attaches all sorts of metadata to a hyperspectral datacube or spectral index.
Should this class Objects have a unique identifier along with the contour and hierarchy fields? Or would the image and roi identifiers be handled outside this class?
That's a good question. One way we have thought about identifying objects/ROIs is just by their position in a list. So if you have one object/ROI they are ID = 0, and so on. That's not particularly descriptive by any means though. We could potentially ID them some other way, like their center point or something. We could possibly have some mechanism for a user to provide an ID list also?
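To make the options concrete, here is a sketch of the three ID schemes mentioned (list position, center point, user-provided list). `assign_ids` and its parameters are hypothetical, and the center-based naming format is just one possible convention.

```python
def assign_ids(objects, labels=None, centers=None, prefix=""):
    """Return one ID per object.

    Priority: user-provided labels, then center-point names, then list position.
    """
    n = len(objects)
    if labels is not None:
        if len(labels) != n:
            raise ValueError("labels must have one entry per object")
        return list(labels)
    if centers is not None:
        if len(centers) != n:
            raise ValueError("centers must have one entry per object")
        # Name each object after its (x, y) center, optionally with a prefix
        return [f"{prefix}x{x}_y{y}" for x, y in centers]
    # Fall back to position in the list: 0, 1, 2, ...
    return list(range(n))


plants = ["plant_a", "plant_b"]  # placeholders for object instances
print(assign_ids(plants))                                      # [0, 1]
print(assign_ids(plants, centers=[(10, 20), (30, 40)]))        # ['x10_y20', 'x30_y40']
print(assign_ids(plants, labels=["col0", "mutant"]))           # ['col0', 'mutant']
```

A prefix such as an image filename could be passed in so IDs stay unique across images.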
Implemented in the branch release-4.0
Update how functions use contours and hierarchies
Description
pcv.roi.multi() and pcv.cluster_contours() could instead return a list of masks and we could handle multiple objects within the pcv.analyze_* functions. pcv.roi.multi() allows for irregular plant layout so the entity can be named after the filename of the original image plus the center of the ROI used to identify the plant.
Details
Completion Criteria
For a discussion: