hlydecker commented 1 year ago

Context

Tree annotation is absolutely required for minimum functionality.

We do not currently have any tree segmentation data.

We need to implement a way to quickly generate tree annotation data to use when training our models.

Tasks

[x] Familiarise yourself with usage of segment-geospatial
[x] Test using grounded segment-geospatial on the chatswood sample dataset to segment trees.
[x] Identify the best prompt to use here: trees, plants, vegetation, etc.? Trees will probably be the best, and we might want to also try "grass" as a different class.
[x] Identify the ideal box_threshold and text_threshold parameters to return clean annotations for the following classes: trees, grass. Trees are the first priority, so focus on these first. Once trees are good we can start working on grass / other non tree vegetation.
[x] Once you have identified the best prompts + hyperparameters for using SAM to annotate our trees + maybe grass, quantify the computational requirements. What sort of compute resources do we need to run inference? How much compute time is needed for each unit of land area? We can then take this information and calculate costs for the entire Sydney area, or at least the area of annotated SA1s from #7.
[ ] Implement a script for automated tree + grass annotation using segment-geospatial and deploy for the target area (likely SA1s from #7)
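
For the compute-cost task above, a rough back-of-envelope sketch might look like the following. It assumes the imagery is in Web Mercator with 256 px base tiles (which is what a `zoom` level usually implies), a latitude of about -33.8 for Chatswood, and a per-tile inference time that you measure yourself. The function names and the `seconds_per_tile` parameter are made up for illustration and are not part of segment-geospatial.

```python
import math

# Web Mercator ground resolution at the equator for zoom 0 (256 px tiles), in m/px.
EQUATOR_M_PER_PX_Z0 = 156543.03392


def ground_resolution(lat_deg: float, zoom: int) -> float:
    """Approximate metres per pixel at a given latitude and zoom level."""
    return EQUATOR_M_PER_PX_Z0 * math.cos(math.radians(lat_deg)) / (2 ** zoom)


def estimate_compute_hours(area_km2, seconds_per_tile, tile_px=1000, zoom=21, lat_deg=-33.8):
    """Return (number of tiles, total hours) to cover area_km2.

    seconds_per_tile is a value you have to measure on your own hardware;
    it is an input here, not something this sketch knows.
    """
    res = ground_resolution(lat_deg, zoom)
    tile_area_km2 = (tile_px * res) ** 2 / 1e6
    n_tiles = math.ceil(area_km2 / tile_area_km2)
    return n_tiles, n_tiles * seconds_per_tile / 3600.0


if __name__ == "__main__":
    # 10.0 s per tile is a placeholder, not a measured result.
    tiles, hours = estimate_compute_hours(area_km2=1.0, seconds_per_tile=10.0)
    print(f"{tiles} tiles, {hours:.2f} hours per square km")
```

Overlap between tiles and any GPU batching are ignored here, so treat the output as an order-of-magnitude estimate only.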

mauch commented 1 year ago

I've had a look at running segment-geospatial on a small region of Chatswood with zoom=21. I split the region into 20 1000x1000 tiles and ran segment-geospatial with a range of parameters.
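
The tiling step can be sketched roughly as below. This is not the code I actually ran, just a minimal illustration: `tile_windows` is a hypothetical helper, and the 5000x4000 pixel region is an assumed layout that happens to give 20 tiles of 1000x1000.

```python
def tile_windows(width, height, tile_size=1000):
    """Yield (col_off, row_off, win_width, win_height) windows covering an image.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    for row_off in range(0, height, tile_size):
        for col_off in range(0, width, tile_size):
            yield (
                col_off,
                row_off,
                min(tile_size, width - col_off),
                min(tile_size, height - row_off),
            )


if __name__ == "__main__":
    # Assumed region size; only the tile count (20) comes from the run above.
    windows = list(tile_windows(5000, 4000))
    print(len(windows), "tiles, first:", windows[0], "last:", windows[-1])
```

Each window can then be cut out of the source raster with whatever reader you prefer and passed to the prediction step one tile at a time.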

It seems the results are strongly dependent on the tile size you use for a given zoom level, so at zoom=21 1000x1000 tiles seem to work. I haven't tried smaller tiles yet since they take a lot longer to run.
Changing the value of text_threshold has made no difference at all for me with this dataset. It might be related to using tree as the text prompt; I haven't tried anything else like vegetation or grass yet.
I ran through box_threshold from 0.01 to 0.40 in increments of 0.01 with text_threshold=0.24, and to my eye box_threshold ~ 0.25 gives the best accuracy, but it has a few false negatives. The number of false negatives increases as you make box_threshold smaller.
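
A sweep like this can be sketched as follows. The `predict` argument is a stand-in for whatever runs the model on a tile; in practice it would probably wrap segment-geospatial's `LangSAM.predict`, but that wrapping is an assumption and is only shown in a comment so the sketch stays runnable without the model.

```python
from typing import Callable, Dict, List


def box_threshold_grid(start=0.01, stop=0.40, step=0.01) -> List[float]:
    """Thresholds from start to stop inclusive, rounded to avoid float drift."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(n)]


def sweep(predict: Callable[[float, float], object], text_threshold=0.24) -> Dict[float, object]:
    """Call predict(box_threshold, text_threshold) for every grid value."""
    return {b: predict(b, text_threshold) for b in box_threshold_grid()}


if __name__ == "__main__":
    # Assumed real-world wrapper (not executed here):
    #   from samgeo.text_sam import LangSAM
    #   sam = LangSAM()
    #   predict = lambda b, t: sam.predict(
    #       "tile.tif", "tree", box_threshold=b, text_threshold=t)
    # Dummy predict so the sketch runs without the model installed.
    results = sweep(lambda b, t: f"mask for b={b:.2f}, t={t:.2f}")
    print(len(results), "runs;", results[0.25])
```

Keeping the prediction call behind a plain function makes it easy to swap the text prompt (e.g. grass) without touching the sweep itself.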

Result with box_threshold=0.25: b=0.25,t=0.24,ts=1000

With box_threshold=0.20: b=0.20,t=0.24,ts=1000

With box_threshold=0.30: b=0.30,t=0.24,ts=1000

mauch commented 1 year ago

The current make_mask.py script does a pretty good job (as seen in the comment above) but tends to over-predict tree coverage in certain areas of the input images. This happens because the GroundingDINO step in the prediction preferentially puts predicted boxes around the entire area of an input tile, and the subsequent SAM prediction then labels most of the tile area as a tree.

For example: 4154_nochange

A zoomed in example of a single tile with a box around its entire area is: tile_1_0_mask

A simple solution to this problem is to just reject boxes with area nearly the size of the input image (tile) for prediction. For example when rejecting boxes with area >95% of the input image: 4154_full

There is a clear downside to this though: rejecting ALL boxes with area >95% of the input image also rejects tiles where large swathes of 'tree' really do cover the entire input tile (bottom right of the above image). I have tried solving this by creating a two-level box_threshold parameter, whereby I only reject boxes with a large area that have a 'tree' detection probability <0.35 (and keep the standard box_threshold=0.23 for smaller boxes). This looks significantly better, and allows for a smaller box reject threshold of 80% while still keeping large areas of trees: 4154_0.8
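
The two-level threshold can be sketched roughly like this. It is an illustration of the idea rather than the code in make_mask.py: the function names are hypothetical, and boxes are assumed to be (x0, y0, x1, y1) in pixel coordinates of the tile.

```python
def keep_box(box, score, tile_w, tile_h,
             box_threshold=0.23, large_box_threshold=0.35, max_area_frac=0.80):
    """Decide whether to keep one detection box.

    Boxes covering more than max_area_frac of the tile must reach the stricter
    large_box_threshold; all other boxes use the standard box_threshold.
    """
    x0, y0, x1, y1 = box
    area_frac = max(0.0, x1 - x0) * max(0.0, y1 - y0) / float(tile_w * tile_h)
    threshold = large_box_threshold if area_frac > max_area_frac else box_threshold
    return score >= threshold


def filter_boxes(boxes, scores, tile_w, tile_h, **kwargs):
    """Return the (box, score) pairs that pass the two-level threshold."""
    return [(b, s) for b, s in zip(boxes, scores)
            if keep_box(b, s, tile_w, tile_h, **kwargs)]


if __name__ == "__main__":
    boxes = [(0, 0, 1000, 1000), (0, 0, 1000, 1000), (100, 100, 300, 300)]
    scores = [0.30, 0.40, 0.25]
    # The low-confidence full-tile box is dropped, the other two are kept.
    print(filter_boxes(boxes, scores, tile_w=1000, tile_h=1000))
```

The filter would sit between the GroundingDINO box prediction and the SAM mask prediction, so that rejected boxes never get turned into masks.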

mauch commented 1 year ago

There are two GroundingDINO models described in its paper, one with a SwinT backbone and the other with SwinB. The Segment Geospatial package uses SwinB by default, so I tried the SwinT model to see if it makes any difference to the results. I ran the SwinT model to annotate a variety of images and inspected the output. Though the results from the two models are not identical, there is no clear winner: both achieve roughly similar results.

E.g.:

1. SwinB: 1025_full

2. SwinT: 1025_test

Sydney-Informatics-Hub / aerial-annotation

Automatic tree annotation #10
