catherinesyeh / attention-viz

Visualizing query-key interactions in language + vision transformers
http://attentionviz.com/
MIT License
122 stars 15 forks

Using more accurate off-the-shelf model for labeling image patches #57

Open yc015 opened 1 year ago

yc015 commented 1 year ago

The annotations from FCN are okay. There are no significant errors, but they can still be improved.

For example: [image]

Part of the person (highlighted in yellow in the segmentation map) was not recognized by FCN (see the bottom right part). Errors of this kind can be corrected manually, but doing so is tedious. Replacing FCN with a more powerful automated segmentation model may be a better approach.
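
To make the swap concrete, here is a minimal sketch of how patch labels could be derived from any model's per-pixel segmentation map, so the segmentation model itself stays replaceable. This is not the repo's actual code: the function name `label_patches`, the majority-vote rule, and the class ids (0 = background, 1 = person) are all assumptions for illustration.

```python
from collections import Counter


def label_patches(seg_map, patch_size):
    """Give each patch the most common class id among its pixels.

    seg_map is a 2D list of integer class ids (one per pixel), as any
    segmentation model could produce. Hypothetical helper, not repo code.
    """
    height = len(seg_map)
    width = len(seg_map[0])
    labels = []
    for top in range(0, height, patch_size):
        row_labels = []
        for left in range(0, width, patch_size):
            # Count class ids inside this patch, clipping at the image border.
            counts = Counter(
                seg_map[y][x]
                for y in range(top, min(top + patch_size, height))
                for x in range(left, min(left + patch_size, width))
            )
            row_labels.append(counts.most_common(1)[0][0])
        labels.append(row_labels)
    return labels


# Toy 4x4 map with assumed class ids: 0 = background, 1 = person.
seg = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
print(label_patches(seg, 2))  # [[0, 1], [1, 1]]
```

With this split, a missed region like the one above only changes the input map, so a stronger segmentation model could be dropped in without touching the patch-labeling step.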