Doodleverse / dash_doodler

Doodler. A web application built with plotly/dash for image segmentation with minimal supervision. Plays nicely with segmentation gym, https://github.com/Doodleverse/segmentation_gym
MIT License

CODE REVIEW #15

Closed: frank-engel-usgs closed this issue 2 years ago

frank-engel-usgs commented 2 years ago

Purpose

This is my USGS review of dash_doodler. My background includes image processing and hydraulics. I have some ML background but am by no means a domain expert, and this review reflects that.

Overview

dash_doodler is a plotly-based web app for rapid image segmentation using a "human in the loop" supervised ML segmentation model. It has the advantage of producing results quickly with just a little human supervision, enabling a faster workflow than traditional supervised classification analysis. Several applications that need segmentation could benefit from this tool, especially from an "ease of use" standpoint. Use cases include segmentation of

In general, I found that dash_doodler performs as expected. Although it is fairly slow on my system (not a powerful system by any means), it works. The tool provides a rapid (semi-automated) way to segment images, and it worked for me on several images of varying complexity. I have no major concerns about approving the use of dash_doodler, and I offer some minor comments below.

Approach for review

I see one primary use of dash_doodler as a means for rapid classification of imagery by non-ML experts. To test this, I collected six example images from a range of collection environments. Two of the examples are from perspective cameras (both low oblique), and the rest are nadir scenes from sUAS. All but one example are from river applications; the LERZFissure8.jpg example is from an active lava effusion event. To test dash_doodler, I processed each example using the default settings, then varied the settings iteratively to try to get the best solution for the particular scene. I used a low-power laptop, so processing speeds are relatively slow, but this aligns with the primary use case I identified above (non-ML experts).

I also looked through the code to review method and syntax. Since this is not my domain, I did not offer many suggestions. Mainly, I attempted to verify that dash_doodler performs well and does what it says it is supposed to do: offer quick, lightly supervised, human-in-the-loop image segmentation.

Finally, I also offer a few comments on potential PII and/or security issues.

Specific comments and results of review

Installation and environment

I was able to install dash_doodler using the instructions supplied in the README.md file. I am running a conda installation in PyCharm 2021.2.2. To create the environment, I started an Anaconda prompt and issued the commands as specified in README.md. Then, within PyCharm, I added this environment as the default Project Interpreter in the project-specific settings. Everything ran as expected, and I encountered no errors.

I am running on a run-of-the-mill business laptop. Specs:

Processing examples

I processed four examples ranging from simple to overly complex scenes. dash_doodler did well, and I was able to obtain satisfactory results for each use case (given the constraints of the imagery). I think there is potential here for use in other user tools that need segmentation, but it will not "fix" very complicated scenes, where human input is necessary to discern where to "cut" regions because the transition between classes is too fine or too complex.

Example 1: Androscoggin River

This is a 4K-resolution image and represents typical output I'd expect for an image velocimetry application by sUAS. This particular site is immediately downstream of a waterfall, which causes considerable surface foam on the water. I processed it with 2 classes: water and land.

As expected, this image took a while to process due to its 4K size and my system's limited resources, but there were no memory errors. Total processing time was about 5 minutes. In general, the result is good. There are some artifacts in the lower right, and I'm not sure how they are being classified (see the mask). There are also a couple of misidentified objects in the land class. This image would probably benefit from a "parking lot" class in addition to just land.

The original image (reduced in size for this review): androscoggin_sm

My doodles:

androscoggin_sm_annotated

Resulting mask:

androscoggin_sm_mask
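To put the 4K processing time in perspective, here is a rough back-of-envelope sketch of how memory grows with image size. It assumes, hypothetically, that the segmentation step holds a dense float32 feature array with a fixed number of features per pixel; the value `n_features=20` and the helper names are illustrative only and are not taken from the dash_doodler code.

```python
def feature_stack_bytes(height: int, width: int, n_features: int,
                        bytes_per_value: int = 4) -> int:
    """Bytes needed for a dense height x width x n_features array (float32 by default)."""
    return height * width * n_features * bytes_per_value


def downscaled_shape(height: int, width: int, factor: int) -> tuple:
    """Image shape after integer downscaling along both axes."""
    return height // factor, width // factor


if __name__ == "__main__":
    # 4K UHD frame (3840 x 2160); n_features=20 is a hypothetical feature count.
    full = feature_stack_bytes(2160, 3840, n_features=20)
    h, w = downscaled_shape(2160, 3840, factor=2)
    half = feature_stack_bytes(h, w, n_features=20)
    print(f"4K: {full / 2**20:.0f} MiB, half-size: {half / 2**20:.0f} MiB")
```

Under these assumptions it prints "4K: 633 MiB, half-size: 158 MiB". Halving both dimensions cuts the pixel count, and therefore this estimate, by a factor of four, which may be one reason reduced-size copies of large sUAS frames are quicker to work with on a laptop.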

Example 2: Boneyard Creek

This is an image from a security camera with a low oblique perspective view of a small creek. There are several classes. I used: water, stone, vegetation, and adcp.

The image: boneyard_sm

And the classification doodle:

boneyard_annotated

The mask on the first attempt looks decent, but there are classification errors:

boneyard_classified

Example 3: Fissure 8 2018 LERZ Effusion Event

This example is just for fun, but it represents an interesting use case. I wanted to see if dash_doodler would be able to segment the cooled lava spots from the glowing hot spots. Here's the original image: LERZFissure8_sm

As you can see, the mix of textures here is very complex and presents an excellent test for dash_doodler. I first tried annotating just a few of the different textures and seeing what dash_doodler would do with the rest of the scene (my classes were "redhot" and "cool"). The resulting classification is interesting:

LERZFissure8_1

I am clearly abusing the program. I tried again using a narrower pen width and added a "cold" class, thinking it may work better:

LERZFissure8_2

I am satisfied with this result. I used a pen width of 1 for the "redhot" class. Clearly, the ML algorithm is detecting the linear patterns of the heat cracks.

Example 4: Unnamed Creek, San Antonio, Texas

This example is of a small creek near my backyard which I used to teach LSPIV techniques to my kids (good to get them into science!). I wanted to see if dash_doodler could handle a very complex scene with water, vegetation, ground and trees (wood). Here's the original image: unnamedcreek_sm

I attempted to segment with these classes: water, wood, vegetation, ground. Here's the result:

unnamedcreek_1

As expected, a poor result. I went in and "doodled" more detail in the transitions between classes, especially between wood and water. I also digitized some of the wood with a thicker pen (5) for the trunks and a thinner pen (1) for some of the overhanging branches. The result is slightly better:

unnamedcreek_2

But clearly, I have pushed dash_doodler past its limit.

Code specific comments

Security and other comments

frank-engel-usgs commented 2 years ago

This completes my review of dash_doodler. Please let me know if there are any questions or follow up comments. Thanks! I enjoyed doing this review, and it is a fun tool!

dbuscombe-usgs commented 2 years ago

Hi @frank-engel-usgs, thanks so much for this review; it's really helpful, with great suggestions. There are a few issues you've raised that I would like to tackle

There are other comments that you raised that I have looked into before, and decided either it is not a problem, or not worth implementing because it creates potential issues elsewhere:

Could you please share your original-size images? I would like to doodle them myself, and I would also like to give the last one (unnamed creek) to collaborators to make up their own classes, for fun, to see what they come up with. Would that be OK?

Thanks again, for both reviews!! I'd be happy to repay the favor

dbuscombe-usgs commented 2 years ago

It also might interest you to know that I'm next working on a manuscript that uses Doodler to generate a large, multi-person, multi-class labeled dataset, consisting of many thousands of labeled images for training a deep neural network for segmentation. In it, we will go into the numerous details about what 'works' in terms of class sets, image sizes, image/scene complexity, etc. As it turns out, and as you probably already expect, Doodling is very much an art built on top of a science! But there are some clear emerging guidelines as to what types and scales of imagery work 'best'. I'd be happy to share details when they are ready.

frank-engel-usgs commented 2 years ago

@dbuscombe-usgs: yes, this was a lot of fun, and I enjoyed the review. I have a couple of software projects in the works, so I may take you up on the offer to reciprocate ;)

I am happy to share the input files I tested (plus some I didn't mention in this review): input_images.zip

In terms of usage rights, I captured these, and you are free to use them as you wish. If you cite them anywhere, just credit "Frank L Engel, USGS":

The other two are supplied images from USGS staff. If you need a photo credit for those, let me know and I'll track it down for you.

As for your responses to my comments, I feel you properly addressed all my concerns.

Thanks again! And I'm sorry I reviewed the wrong software the first time around (although PBR_filter was also an interesting review!).

dbuscombe-usgs commented 2 years ago

Great, thanks @frank-engel-usgs

I have made a change to help with computational efficiency: a new slider called 'number of scales'. Previously, this was hard-coded to 6. Now, 6 is the maximum and 2 is the minimum. The lower this number, the less memory is used and the faster the solution. You may notice performance improving or worsening, depending on how the number of scales affects the variance/bias (overfitting/underfitting) trade-off.
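One way to see why fewer scales means less memory and faster solutions: if features are computed once per scale, the per-pixel feature count grows linearly with the number of scales. The sketch below assumes, for illustration only, log-spaced Gaussian scales between `sigma_min=1` and `sigma_max=16` and 12 features per scale (for example, an intensity, an edge and two texture features per RGB channel, which is the kind of stack scikit-image's `multiscale_basic_features` builds). These values and the function names are assumptions, not dash_doodler's actual settings; only the 2 to 6 range comes from the slider described above.

```python
from typing import List

MIN_SCALES, MAX_SCALES = 2, 6  # slider limits described in this comment


def scale_ladder(num_scales: int, sigma_min: float = 1.0,
                 sigma_max: float = 16.0) -> List[float]:
    """Log-spaced Gaussian sigmas from sigma_min to sigma_max, endpoints included."""
    if not MIN_SCALES <= num_scales <= MAX_SCALES:
        raise ValueError(f"num_scales must be in [{MIN_SCALES}, {MAX_SCALES}]")
    ratio = sigma_max / sigma_min
    return [sigma_min * ratio ** (i / (num_scales - 1)) for i in range(num_scales)]


def n_features(num_scales: int, features_per_scale: int = 12) -> int:
    """Total per-pixel features if each scale contributes a fixed (hypothetical) count."""
    return len(scale_ladder(num_scales)) * features_per_scale


if __name__ == "__main__":
    for n in (2, 3, 6):
        sigmas = ", ".join(f"{s:.2f}" for s in scale_ladder(n))
        print(f"{n} scales -> sigmas [{sigmas}], {n_features(n)} features per pixel")
```

Under these assumptions, dropping from 6 scales to 2 reduces the feature stack from 72 to 24 values per pixel, i.e. to one third of the memory, at the cost of fewer intermediate scales for the classifier to draw on (the bias/variance trade-off mentioned above).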

Your images are an excellent test for the program. Every new image set is an opportunity for improvement. I doodled one of the images, 'unnamed creek', to illustrate the effects of a few hyperparameters, starting with the 'number of scales' argument, using just two classes (water and land).

Default parameters: Screenshot from 2021-09-30 12-49-43

'Too much red' in the background, and in the creek to the right of the image.

Increase number of scales from 3 to 4: Screenshot from 2021-09-30 12-54-37

No improvement; if anything, it is worse.

Decrease number of scales from 3 to 2: Screenshot from 2021-09-30 12-55-51

Clearly, changing the number of scales is not improving the result.

Keep # scales = 2, and add doodles in strategic places:

Screenshot from 2021-09-30 12-59-14: Better. And quick, too, because scales = 2. It seems like this could be improved further by adding and removing doodles. This image seems to require unusually dense doodling! I think that is because of the very oblique perspective with the thin creek in the background, and the complexity of the vegetation.

Many more doodles (full disclosure: I now use a stylus): Screenshot from 2021-09-30 13-16-52

Still not an optimal segmentation, but in this case, setting the number of scales to the minimum (2) helped me arrive at a better solution faster. The wait for the 'solution' was less onerous, so hopefully that is an improvement.

Add a class (1-water, 2-woody veg, 3-herbaceous veg and sediment)

Screenshot from 2021-09-30 13-35-45: Works OK the first time, but there are some water blobs on the herbaceous veg/sediment. Overall, it seems that adding a class helps Doodler exploit the strong spatial signature in class distribution across the image (trees at the top of the image, sediment bank at the bottom, water in the middle).

(I also fixed the behavior when downloading the default images. Now they are downloaded by default, but only if there are no JPEG files already in the assets folder. Other suggested fixes, such as the logging behavior, are coming.)
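The download-on-first-run rule described above (fetch the default images only when the assets folder holds no JPEG files) can be sketched as follows. This is a hypothetical illustration, not the actual dash_doodler code: the function names are made up, and the download step is passed in as a callable rather than guessing the real download source.

```python
import tempfile
from pathlib import Path
from typing import Callable

JPEG_SUFFIXES = {".jpg", ".jpeg"}


def has_jpegs(folder: Path) -> bool:
    """True if folder exists and contains at least one .jpg/.jpeg file (case-insensitive)."""
    return folder.is_dir() and any(
        p.is_file() and p.suffix.lower() in JPEG_SUFFIXES for p in folder.iterdir()
    )


def ensure_default_images(assets: Path, download: Callable[[Path], None]) -> bool:
    """Call download(assets) only when assets has no JPEGs; return True if it ran."""
    if has_jpegs(assets):
        return False  # user already has imagery; leave it alone
    assets.mkdir(parents=True, exist_ok=True)
    download(assets)
    return True


def fake_download(folder: Path) -> None:
    """Stand-in for the real download: writes one empty placeholder JPEG."""
    (folder / "default.jpg").write_bytes(b"")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        assets = Path(tmp) / "assets"
        print(ensure_default_images(assets, fake_download))  # first run: downloads
        print(ensure_default_images(assets, fake_download))  # second run: skipped
```

Checking for existing JPEGs, rather than for an empty folder, means a user who has already dropped their own imagery into the assets folder is never sent the defaults on top of it.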

I may play with some other images and post them here