Sydney-Informatics-Hub / aerial-annotation

Open source annotation tools for aerial imagery. Part of https://github.com/Sydney-Informatics-Hub/PIPE-3956-aerial-segmentation
MIT License

Automatic tree annotation #10

Closed. hlydecker closed this issue 11 months ago

hlydecker commented 1 year ago

Context

Tree annotation is absolutely required for minimum functionality.

We do not currently have any tree segmentation data.

We need to implement a way to quickly generate tree annotation data to use when training our models.

Tasks

mauch commented 1 year ago

I've had a look at running segment-geospatial on a small region of Chatswood at zoom=21. I split the region into 20 tiles of 1000x1000 and ran segment-geospatial over them with a range of parameters.

  1. The results seem to depend strongly on the tile size used at a given zoom level. At zoom=21, 1000x1000 tiles seem to work. I haven't tried smaller tiles yet because they take a lot longer to run.
  2. Changing text_threshold has made no difference at all for me with this dataset. That might be related to using tree as the text prompt; I haven't tried anything else, like vegetation or grass, yet.
  3. I swept box_threshold from 0.01 to 0.40 in increments of 0.01 with text_threshold=0.24. To my eye, box_threshold ~ 0.25 gives the best accuracy, but it has a few false negatives, and the number of false negatives increases as you make box_threshold smaller.
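The tiling and the box_threshold sweep described above can be sketched roughly as follows. This is only an illustration, not the script used for these runs: `threshold_grid` and `tile_windows` are hypothetical helpers, the 5000x4000 region size is an assumption chosen only because it yields 20 tiles of 1000x1000, and the actual segment-geospatial call is left out because the exact invocation is not shown in this thread.

```python
def threshold_grid(start=0.01, stop=0.40, step=0.01):
    """Return box_threshold values from start to stop, inclusive."""
    n = int(round((stop - start) / step)) + 1
    # Round each value to two decimals to avoid float drift such as 0.30000000000000004.
    return [round(start + i * step, 2) for i in range(n)]


def tile_windows(width, height, tile_size=1000):
    """Return (x0, y0, x1, y1) pixel windows covering the image; edge tiles are clipped."""
    windows = []
    for y0 in range(0, height, tile_size):
        for x0 in range(0, width, tile_size):
            windows.append((x0, y0, min(x0 + tile_size, width), min(y0 + tile_size, height)))
    return windows


TEXT_THRESHOLD = 0.24                    # held fixed during the sweep
grid = threshold_grid()                  # 0.01, 0.02, ..., 0.40
tiles = tile_windows(5000, 4000)         # assumed region size, gives 20 tiles

# One entry per (box_threshold, text_threshold, tile) combination; the
# text-prompted prediction for each entry would be run at this point.
runs = [(b, TEXT_THRESHOLD, w) for b in grid for w in tiles]
```

Sweeping one parameter with the other held fixed keeps the number of runs manageable (40 thresholds over 20 tiles here) while still showing how the masks change around the preferred value.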

Result with box_threshold=0.25: b=0.25,t=0.24,ts=1000

With box_threshold=0.20: b=0.20,t=0.24,ts=1000

With box_threshold=0.30: b=0.30,t=0.24,ts=1000

mauch commented 1 year ago

The current make_mask.py script does a pretty good job (as seen in the comment above) but has a limitation: it over-predicts tree coverage in certain areas of the input images. This happens because the GroundingDINO step in the prediction tends to put predicted boxes around the entire area of an input tile, and the subsequent SAM prediction then labels most of the tile area as tree.

For example: 4154_nochange

A zoomed in example of a single tile with a box around its entire area is: tile_1_0_mask

A simple solution to this problem is to reject boxes whose area is nearly the size of the input image (tile). For example, when rejecting boxes with area >95% of the input image: 4154_full

There is a clear downside to this though: rejecting ALL boxes with area >95% of the input image also rejects tiles where large swathes of 'tree' really do cover the entire input tile (bottom right of the above image). I have tried solving this with a two-level box_threshold parameter, whereby I only reject large-area boxes that have a 'tree' detection probability <0.35 (keeping the standard box_threshold=0.23 for smaller boxes). This looks significantly better, and allows for a smaller box reject threshold of 80% while still keeping large areas of trees: 4154_0.8
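A minimal sketch of the two-level rule, under stated assumptions: `keep_box` and `filter_boxes` are hypothetical names and not the code in make_mask.py, boxes are assumed to be `(x0, y0, x1, y1)` in tile pixel coordinates, and each box is assumed to carry one detection score. The use of a strict comparison for small boxes is also an assumption.

```python
def keep_box(box, score, tile_width, tile_height,
             box_threshold=0.23, large_box_threshold=0.35, max_area_fraction=0.80):
    """Return True if a predicted box should be kept.

    Boxes covering more than max_area_fraction of the tile must reach the
    stricter large_box_threshold; all other boxes use box_threshold.
    """
    x0, y0, x1, y1 = box
    area_fraction = ((x1 - x0) * (y1 - y0)) / (tile_width * tile_height)
    if area_fraction > max_area_fraction:
        # Large box: reject it when the score is below the stricter threshold.
        return score >= large_box_threshold
    # Small box: the standard threshold applies.
    return score > box_threshold


def filter_boxes(boxes, scores, tile_width, tile_height, **kwargs):
    """Return the (box, score) pairs that pass keep_box."""
    return [(b, s) for b, s in zip(boxes, scores)
            if keep_box(b, s, tile_width, tile_height, **kwargs)]


# Made-up boxes on a 1000x1000 tile, purely for illustration.
boxes = [(0, 0, 1000, 1000), (0, 0, 1000, 1000), (100, 100, 400, 400), (0, 0, 950, 900)]
scores = [0.30, 0.50, 0.25, 0.20]
kept = filter_boxes(boxes, scores, 1000, 1000)
# The low-confidence whole-tile box and the 85.5% box are dropped; the
# confident whole-tile box and the small box are kept.
```

Passing `max_area_fraction=0.95` with a `large_box_threshold` above 1 reproduces the simpler rule of rejecting every near-full-tile box regardless of its score.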

mauch commented 1 year ago

There are two GroundingDINO models described in its paper, one with a SwinT backbone and the other with SwinB. The Segment Geospatial package uses SwinB by default, so I tried the SwinT model to see if it makes any difference to the results. I ran the SwinT model to annotate a variety of images and inspected the output. Though the results from the two models are not identical, there is no clear winner: both achieve roughly similar results.

E.g.:

1. SwinB: 1025_full

2. SwinT: 1025_test