Closed hlydecker closed 11 months ago
I've had a look at running segment-geospatial on a small region of Chatswood with zoom=21. I split the region into 20 1000x1000 tiles and ran segment-geospatial with a range of parameters.
`text_threshold` has made no difference at all for me with this dataset. This might be related to using `tree` as the text prompt - I haven't tried anything else like `vegetation` or `grass` yet.
I varied `box_threshold` from 0.01 to 0.40 in increments of 0.01 with `text_threshold=0.24`. To my eye, `box_threshold ~ 0.25` gives the best accuracy but has a few false negatives, and the number of false negatives increases as you make `box_threshold` smaller.
Result with `box_threshold=0.25`:
With `box_threshold=0.20`:
With `box_threshold=0.30`:
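For anyone wanting to repeat the sweep described above, here is a minimal sketch of the parameter grid. The `predict` callable is a hypothetical stand-in for whatever function runs the model on a tile and returns something you can compare (for example a count of predicted boxes); it is not part of segment-geospatial.

```python
from typing import Callable, Dict, List


def threshold_grid(start: float = 0.01, stop: float = 0.40,
                   step: float = 0.01) -> List[float]:
    """Return thresholds from start to stop (inclusive).

    Values are rounded to two decimals to avoid floating-point drift,
    which is sufficient for a step of 0.01.
    """
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(n)]


def sweep_box_threshold(predict: Callable[[float, float], int],
                        text_threshold: float = 0.24) -> Dict[float, int]:
    """Call predict(box_threshold, text_threshold) once per grid value.

    `predict` is a hypothetical wrapper around the actual model call.
    """
    return {bt: predict(bt, text_threshold) for bt in threshold_grid()}
```

Keeping `text_threshold` fixed while only `box_threshold` changes matches the experiment above, so any difference between runs can be attributed to `box_threshold` alone.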
The current `make_mask.py` script does a pretty good job (as seen in the comment above) but over-predicts tree coverage in certain areas of the input images. This happens because the GroundingDINO step tends to place predicted boxes around the entire area of an input tile, and the subsequent SAM prediction then labels most of the tile area as tree.
For example:
A zoomed-in example of a single tile with a box around its entire area:
A simple solution to this problem is to just reject boxes with area nearly the size of the input image (tile) for prediction. For example when rejecting boxes with area >95% of the input image:
There is a clear downside to this, though: rejecting ALL boxes with area >95% of the input image also rejects boxes on tiles where large swathes of 'tree' really do cover the entire tile (bottom right of the above image).
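A minimal sketch of the simple area-based rejection, assuming boxes are given as `(x_min, y_min, x_max, y_max)` in pixel coordinates. The function names and the box format are illustrative assumptions, not the actual code in `make_mask.py`.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_area_fraction(box: Box, tile_width: int, tile_height: int) -> float:
    """Fraction of the tile area covered by the box."""
    x_min, y_min, x_max, y_max = box
    width = max(0.0, x_max - x_min)
    height = max(0.0, y_max - y_min)
    return (width * height) / float(tile_width * tile_height)


def reject_large_boxes(boxes: List[Box], tile_width: int, tile_height: int,
                       max_fraction: float = 0.95) -> List[Box]:
    """Keep only boxes covering at most max_fraction of the tile."""
    return [b for b in boxes
            if box_area_fraction(b, tile_width, tile_height) <= max_fraction]
```

Because the filter looks only at box size, it cannot tell a spurious full-tile box from a tile that is genuinely all canopy, which is the downside noted above.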
I have tried solving this by creating a two-level `box_threshold` parameter, whereby I only reject large-area boxes that have a 'tree' detection probability <0.35 (while keeping the standard `box_threshold=0.23` for smaller boxes).
This looks significantly better - and allows for a smaller box reject threshold of 80% while still keeping large areas of trees:
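The two-level rule could look roughly like the sketch below. It assumes each box comes with a detection score in [0, 1] and uses the numbers mentioned above (0.23 for normal boxes, 0.35 for boxes covering more than 80% of the tile). The function and parameter names are hypothetical, and whether a score exactly equal to a threshold passes is also an assumption.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def two_level_filter(boxes: List[Box], scores: List[float],
                     tile_width: int, tile_height: int,
                     box_threshold: float = 0.23,
                     large_box_threshold: float = 0.35,
                     large_area_fraction: float = 0.80
                     ) -> List[Tuple[Box, float]]:
    """Apply a stricter score threshold to boxes that cover most of the tile."""
    tile_area = float(tile_width * tile_height)
    kept = []
    for box, score in zip(boxes, scores):
        x_min, y_min, x_max, y_max = box
        area = max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
        is_large = (area / tile_area) > large_area_fraction
        # Large boxes must clear the higher threshold; others use the standard one.
        threshold = large_box_threshold if is_large else box_threshold
        if score >= threshold:
            kept.append((box, score))
    return kept
```

With this rule a confident full-tile detection survives, while a low-confidence full-tile box is dropped before it reaches SAM.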
There are two GroundingDINO models described in its paper, one with a SwinT backbone and the other with SwinB. The Segment Geospatial package uses SwinB by default, so I thought I would try the SwinT model to see if it makes any difference to the results. I ran the SwinT model on a variety of images and inspected the annotations. Although the results from the two models are not identical, there is no clear winner: both tend to achieve roughly similar results.
For example:
1: SwinB
2: SwinT
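The comparison above was made by eye. If a number is wanted, one option is the intersection over union (IoU) between the two models' masks for the same tile. This is only a sketch, assuming both masks are equally sized 2D grids of 0/1 values; it was not part of the original comparison.

```python
from typing import Sequence


def mask_iou(mask_a: Sequence[Sequence[int]],
             mask_b: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks of the same shape.

    Returns 1.0 when both masks are empty, since they then agree fully.
    """
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    return intersection / union if union else 1.0
```

An IoU close to 1 for most tiles would support the impression that the two backbones give roughly similar results. It measures agreement between the models, not which one is more accurate.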
Context
Tree annotation is absolutely required for minimum functionality.
We do not currently have any tree segmentation data.
We need to implement a way to quickly generate tree annotation data to use when training our models.
Tasks
Tune the `box_threshold` and `text_threshold` parameters to return clean annotations for the following classes: trees, grass. Trees are the first priority, so focus on these first. Once trees are good, we can start working on grass and other non-tree vegetation.