Purpose

This is my USGS review of dash_doodler. My background includes image processing and hydraulics. I have some ML background, but am not a domain expert by any means. This review reflects this.

Overview

dash_doodler is a plotly-based webapp for rapid image segmentation using a "human in the loop" supervised ML segmentation model. It has the advantage of producing rapid results with just a little human supervision, enabling a more rapid workflow than traditional supervised classification analysis. This tool has several applications which could benefit from segmentation, especially from an "ease of use" standpoint. Use cases include segmentation of

Water vs land for imagery captured by UAS or aircraft for earth process monitoring (e.g., change detection, sediment transport, image velocimetry, flood extents, etc.).
Structure vs natural for segmentation of built environmental features.

In general, I found dash_doodler to perform as expected. Although it is fairly slow to run on my system (not a powerful system by any means), it does work as expected. The tool provides a rapid (semi-automated) way to segment images, and worked for me for several images of various complexity. I have no major concerns about approving the use of dash_doodler. I offer some minor comments down below.

Approach for review

I see one primary use of dash_doodler as a means for rapid classification of imagery by non-ML experts. So to test this I collected six example images from a range of collection environments. Two of the examples are from perspective cameras (both low oblique), and the rest are nadir scenes from sUAS. All but one example are from river applications. The LERZFissure8.jpg example is from an active lava effusion event. To test dash_doodler, I processed each example using the default settings. Then I varied settings iteratively to try to get the best solution for the particular scene. I used a low-power laptop to process, so speeds are relatively slow, but this aligns with my justification of a primary use case being non-ML experts.

I did also look through the code to review method and syntax. Generally, since this is not my domain, I did not offer many suggestions. Mainly I attempted to verify that dash_doodler performed well and does what it says it's supposed to: offer human in the loop, quick, lightly supervised image segmentation.

Finally, I also offer a few comments on potential PII and/or security issues.

Specific comments and results of review

Installation and environment

I was able to install dash_doodler using the supplied instructions in the README.md file. I am running a conda installation in PyCharm 2021.2.2. To create the environment, I started an Anaconda prompt and issued the commands as specified in README.md. Then, within PyCharm, I added this environment as the default Project Interpreter in the Project-specific settings. Everything ran as expected, and I encountered no errors.

I am running on a run-of-the-mill business laptop. Specs:

Dell Latitude 7490
Intel i7-8650U CPU @ 1.90GHz (8 CPUs), ~2.1GHz
16384MB RAM
Intel UHD Graphics 620 (8256MB RAM) -- shared memory integrated chip, or in other words basically junk ;)

Processing examples

I processed 3 examples ranging from simple to overly complex scenes. dash_doodler did well, and I was able to obtain satisfactory results for each use case (given the constraints of the imagery). I think there is potential here for use in other user tools that need segmentation, but it will not "fix" very complicated scenes, where human input is necessary to discern where to "cut" regions because the transition between classes is too fine or too complex.

Example 1: Androscoggin River

This is a 4K resolution image, and represents a typical output I'd expect for an Image Velocimetry application by sUAS. This particular site is immediately downstream from a waterfall, which is causing considerable surface foam on the water. I processed with 2 classes: water, land.

As expected, this image took a while to process, due to its 4K size and my system's limited resources. But there were no memory errors. Total process time was about 5 minutes. In general the result is good. There are some artifacts in the lower right, and I'm not sure how they are being classified (see the mask). And there are a couple of mis-identified objects in the land class. This image probably would benefit from a "parking lot" class in addition to just land.

The original image (reduced in size for this review): androscoggin_sm

My doodles:

Resulting mask:

Example 2: Boneyard Creek

This is an image from a security camera with a low oblique perspective view of a small creek. There are several classes. I used: water, stone, vegetation, and adcp.

The image: boneyard_sm

And the classification doodle:

The mask on the first attempt looks decent, but there are classification errors:

Example 3: Fissure 8 2018 LERZ Effusion Event

This example is just fun. But it represents an interesting use case. I wanted to see if dash_doodler would be able to segment the cooled lava spots from the glowing hot spots. Here's the original image: LERZFissure8_sm

As you can see there is a mix of texture here that is very complex, and presents an excellent test for dash_doodler. I first tried annotating just a few of the different textures, and seeing what dash_doodler would do with the rest of the scene (my textures are redhot and cool). The resultant classification is interesting:

I am clearly abusing the program. I tried again using a narrower pen width and added a "cold" class, thinking it may work better:

I am satisfied with this result. I used a pen width of 1 for the "redhot" class. Clearly, the ML algorithm is detecting the linear patterns of the heat cracks.

Example 4: Unnamed Creek, San Antonio, Texas

This example is of a small creek near my backyard which I used to teach LSPIV techniques to my kids (good to get them into science!). I wanted to see if dash_doodler could handle a very complex scene with water, vegetation, ground and trees (wood). Here's the original image: unnamedcreek_sm

I attempted to segment with these classes: water, wood, vegetation, ground. Here's the result:

As expected, a poor result. I went in and "doodled" more details in for the transitions between classes, esp wood and water. I also digitized some of the wood with a thicker pen (5) for the trunks, and a thinner pen (1) for some of the overhanging branches. The result is slightly better:

But clearly, I have pushed dash_doodler past its limit.

Code specific comments

Licensing: If this is to be a USGS code release, please take a look at the USGS FSP Disclaimers page, where you'll find the following Software License that is required for USGS products (instead of the MIT license).

"This software has been approved for release by the U.S. Geological Survey (USGS). Although the software has been subjected to rigorous review, the USGS reserves the right to update the software as needed pursuant to further analysis and review. No warranty, expressed or implied, is made by the USGS or the U.S. Government as to the functionality of the software and related material nor shall the fact of release constitute any such warranty. Furthermore, the software is released on condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from its authorized or unauthorized use."
It would be nice to have some image manipulation tools added to the web-interface (e.g., pan and zoom). This could aid in the doodle accuracy in cases where people are needing to get finer detail (which I realize may be using dash_doodler beyond its original intent).
I like the output and logging features. It would be nice to output a polygon of the masks as well, for use by other programs. For example, I could see using dash_doodler to create a smart mask of an Image Velocimetry measurement, then pass it to a method for generalizing/smoothing the mask to use as an ROI for further image processing.
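
As a rough illustration of the mask-to-ROI idea above, the simplest possible generalization of a class mask is its bounding rectangle. The sketch below is not part of dash_doodler: the function name `class_bounding_polygon`, the nested-list mask, and the class numbering are all assumptions made for the example, and a real workflow would more likely trace the true outline with a raster-to-vector library.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]  # (x, y) in pixel coordinates


def class_bounding_polygon(mask: List[List[int]], class_id: int) -> Optional[List[Point]]:
    """Return a closed rectangular ring around all pixels equal to class_id.

    Returns None when the class does not occur in the mask.
    """
    rows = [r for r, row in enumerate(mask) if class_id in row]
    if not rows:
        return None
    cols = [c for row in mask for c, value in enumerate(row) if value == class_id]
    x_min, x_max = min(cols), max(cols) + 1  # +1 so the ring encloses the last pixel
    y_min, y_max = min(rows), max(rows) + 1
    # First and last vertex are identical so the ring is explicitly closed.
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max), (x_min, y_min)]


# Hypothetical 4x5 label image: 0 = land, 1 = water
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
roi = class_bounding_polygon(example_mask, class_id=1)
print(roi)
```

A rectangle discards the shape of the water body, so it only suits cases where a coarse ROI is enough for the downstream image processing step.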

Security and other comments

This app uses a local instance of a Flask webserver. By default this will create an unencrypted local development environment. Operating a localhost webserver can be a security risk to users if they are not taking precautions. I would recommend enforcing https using SSL on the Flask server. Ultimately, I suspect this application would be deployed with Docker, or some other web service, in which case it should be automatically put through encryption, but I would double check this. At the least, I would specify/require encryption anyway to prevent issues.
An even better approach may be to abandon Flask and use Apache as the webserver instead. This would enable much better support and security.
Users should be notified that dash_doodler will download files automatically if no files are supplied to /assets
The Logs feature is nice. It would be good to post the stderr and other warnings into the logs.
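
One way the last suggestion could look, using only the Python standard library, is sketched here. It does not reflect how dash_doodler actually writes its logs: the `StreamToLogger` class, the logger name `doodler_example`, the in-memory buffer standing in for the log file, and the two messages are all assumptions made for illustration.

```python
import contextlib
import io
import logging
import sys
import warnings


class StreamToLogger(io.TextIOBase):
    """File-like object that forwards each written line to a logger."""

    def __init__(self, logger: logging.Logger, level: int) -> None:
        super().__init__()
        self.logger = logger
        self.level = level

    def write(self, message: str) -> int:
        # print() sends the text and the trailing newline separately;
        # whitespace-only writes produce no log record.
        for line in message.rstrip().splitlines():
            self.logger.log(self.level, line)
        return len(message)


log_buffer = io.StringIO()  # stands in for the session log file
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

app_logger = logging.getLogger("doodler_example")  # hypothetical logger name
app_logger.setLevel(logging.INFO)
app_logger.propagate = False
app_logger.addHandler(handler)

# Route warnings.warn(...) through the logging module (logger "py.warnings").
logging.captureWarnings(True)
logging.getLogger("py.warnings").addHandler(handler)

with contextlib.redirect_stderr(StreamToLogger(app_logger, logging.ERROR)):
    print("segmentation failed for image 3", file=sys.stderr)
    with warnings.catch_warnings():
        warnings.simplefilter("always")
        warnings.warn("too few doodles for class 'water'")

logging.captureWarnings(False)
log_text = log_buffer.getvalue()
print(log_text)
```

Both the stderr line and the warning end up in the same log stream, each tagged with its source logger, which keeps them distinguishable from ordinary session messages.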

This completes my review of dash_doodler. Please let me know if there are any questions or follow up comments. Thanks! I enjoyed doing this review, and it is a fun tool!

Hi @frank-engel-usgs thanks so much for this review - really helpful - great suggestions. There are a few issues you've raised that I would like to tackle

slow performance. This is a major headache for low-spec machines. Right now, the number of different scales evaluated in the feature extraction portion of the code is large (15), which means 75 feature maps are extracted per image. That causes memory and performance issues, and may also lead to overfitting. I will therefore trial making that a user-adjusted parameter. Smaller number of scales will lead to faster results. In some cases the accuracy will worsen, and in others it may be the same or perhaps even improve (if overfitting is a problem)
I will revisit the automatic download (I was trying to be helpful, but may have made things more confusing)
I will look into how to pipe stderr and stdout to the logs file
more guidance over doodle density -- in some of your examples, you have too few doodles. We have thought about warning the user about too few doodles, but it is impossible to generalize - it's an 'art'.
I will add the disclaimer, but it won't replace the license. The license and disclaimer are compatible, in the absence of a USGS specific software license (that does not exist, to my knowledge)
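
To put rough numbers on the first point in this list, the back-of-envelope sketch below estimates the size of the feature stack for one image. Several things here are assumptions rather than facts about the dash_doodler code: 5 feature maps per scale (implied only by the 15 scales and 75 maps quoted above), one band per map, 32-bit floats, and the whole stack held in memory at once. The function name and the smaller scale counts are chosen purely for comparison.

```python
def feature_stack_bytes(height, width, n_scales, features_per_scale=5, bytes_per_value=4):
    """Estimate bytes needed to hold every feature map for one image.

    features_per_scale=5 and bytes_per_value=4 (float32) are assumptions.
    """
    n_maps = n_scales * features_per_scale
    return n_maps * height * width * bytes_per_value


HEIGHT, WIDTH = 2160, 3840  # a 4K frame, as in the Androscoggin example

for n_scales in (15, 6, 2):
    n_maps = n_scales * 5
    gib = feature_stack_bytes(HEIGHT, WIDTH, n_scales) / 1024**3
    print(f"{n_scales:2d} scales -> {n_maps:3d} maps, ~{gib:.2f} GiB")
```

Under these assumptions the footprint grows linearly with the number of scales, which is consistent with fewer scales giving faster results on a 16 GB laptop.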

There are other comments that you raised that I have looked into before, and decided either it is not a problem, or not worth implementing because it creates potential issues elsewhere:

I have spent a long time researching Docker workflows, and I have some working versions. However, Docker is problematic because dash must store the images in the local 'assets' folder. I have a version that works (great) with public S3 buckets, but creates even worse security vulnerabilities if deployed on CHS-AWS. I don't have the time or expertise to figure out Docker volumes, which is what is required I think for use on CHS-AWS. I lost several weeks to this issue already, and have given up at this point. The SSL is enforced at deployment and is trivial (handled by webservers like apache or nginx with letsencrypt for ssl), so no security vulnerability as far as I can tell. Nginx and apache are functionally equivalent over this, and nginx is far superior to apache in other regards, so no change. Also, as far as I can tell, SSL is not an issue if you are running on localhost, because IP is not exposed, and ports are easily locked down on unix systems. Dash doodler shouldn't be deployed from a Windows server
there are several third party libraries for conversion of raster to vector products. In my experience, keeping this software light on pre- or post-processing features will maintain ease of use and maintenance. Others can make 'add ons'. So, while I agree it is a good idea to make that an option, I likely will not implement it, but text could be added to the website/manual for users who may need help converting a raster to a polygon using GDAL, fiona/shapely, etc
zoom and pan are already enabled. They are on the toolbar that appears on the main figure window when the cursor is near the top

Please could you share your original sized images? I would like to doodle them myself, and also would like to give the last one (unnamed creek) to collaborators to make up their own classes, for fun, to see what they come up with? Would that be ok?

Thanks again, for both reviews!! I'd be happy to repay the favor

It also might interest you to know that I'm next working on a manuscript that uses Doodler to generate a large, multi-person, multi-class labeled dataset, consisting of many thousands of labeled images for training a deep neural network for segmentation. In that, we will go into the numerous details about what 'works', in terms of class sets, image sizes, image/scene complexity, etc .... it turns out, as you probably already expect, Doodling is very much an art built on top of a science! But there are some clear emerging guidelines as to what types and scales of imagery work 'best'. I'd be happy to share details when they are ready

@dbuscombe-usgs: yes, this was a lot of fun, and I enjoyed the review. I have a couple of software projects in the works, so I may take you up on the offer to reciprocate ;)

I am happy to share the input files I tested (plus some I didn't mention in this review): input_images.zip

In terms of usage rights, I captured these, and you are free to use as you wish, if you cite them anywhere, just credit "Frank L Engel, USGS":

androscoggin.jpg
boneyard.jpg
missouriherman.jpg
unnamedcreek.jpg

The other two are supplied images from USGS staff. If you need a photo credit for those, let me know and I'll track it down for you.

In terms of responses to my comments. I feel you properly addressed all my concerns.

Thanks again! And I'm sorry I reviewed the wrong software the first go (although PBR_filter was also an interesting review!).

Great, thanks @frank-engel-usgs

I have made a change to help with computational efficiency, consisting of a new slider button called 'number of scales'. Previously, this was hard-coded to 6. Now, 6 is the maximum, and 2 is the minimum. The lower this number, the less memory used, and the faster the solution. You may notice performance improvement or worsening, depending on how the number of scales dictates the variance/bias (overfitting/underfitting) issue

Your images are an excellent test for the program. Every new image set is an opportunity for improvement. I doodled one of your images, 'unnamed creek', to illustrate the effects of a few hyperparameters, starting with the 'number of scales' argument, with just two classes (water and land)

Default parameters

'too much red' in the background, and the creek to the right of the image

Increase number of scales from 3 to 4

no improvement - if anything, it is worse

Decrease number of scales from 3 to 2

Clearly, changing the number of scales is not improving the result.

keep # scales = 2, and add doodles in strategic places

Screenshot from 2021-09-30 12-59-14 Better. And quick, too, because scales = 2. It seems like this could be improved yet further by adding and removing doodles. This image seems to require an unusually dense doodling! I think that is because of the very oblique perspective with the thin creek in the background, and the complexity of the vegetation

many more doodles (full disclosure, I now use a stylus)

Still not an optimal segmentation, but in this case, the number of scales being set to the minimum (2) helped me arrive at a better solution faster. It was less onerous to wait for the 'solution', so hopefully that is an improvement

Add a class (1-water, 2-woody veg, 3-herbaceous veg and sediment)

Screenshot from 2021-09-30 13-35-45 Works ok first time but there are some water blobs on the herbaceous veg/sediment. Overall it seems that adding a class helps Doodler exploit the strong spatial signature in class distribution across the image (trees at the top of the image, sediment bank at the bottom, water in the middle)

(I also fixed the behavior when downloading the default images. Now it does this by default, but only if there are no jpeg files already in the assets folder. Other suggested fixes such as the logging behavior are coming.)
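
The check described in the note above could be as small as the sketch below. This is an illustration under stated assumptions, not the code in dash_doodler: the function name `needs_default_images`, the case-insensitive suffix match, and the placeholder file are all hypothetical.

```python
import tempfile
from pathlib import Path

JPEG_SUFFIXES = {".jpg", ".jpeg"}


def needs_default_images(assets_dir: Path) -> bool:
    """True when assets_dir holds no JPEG files (case-insensitive suffix match)."""
    if not assets_dir.is_dir():
        return True
    return not any(
        p.is_file() and p.suffix.lower() in JPEG_SUFFIXES for p in assets_dir.iterdir()
    )


# Exercise the check against a throwaway folder instead of a real assets directory.
with tempfile.TemporaryDirectory() as tmp:
    assets = Path(tmp) / "assets"
    assets.mkdir()
    empty_result = needs_default_images(assets)
    (assets / "androscoggin.jpg").write_bytes(b"")  # placeholder file, not a real image
    populated_result = needs_default_images(assets)

print(empty_result, populated_result)
```

Only the top level of the folder is inspected here; whether subfolders should count is a design choice the note does not settle.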

I may play with some other images and post them here

Doodleverse / dash_doodler

CODE REVIEW #15