Closed by yellowcap 4 months ago
@yellowcap will these be run and saved at the patch level? If so, this may be an easy, ready-made set that our app team can use, so I just want to flag it to them.
The earth-text team needs the tile-level embeddings, but we can create the patch-level ones too. That is a good idea.
That would be great, thank you!
Tagging here that this was done, as shared on chat:
We created the embeddings for the California chips using model v0.2. Under s3://clay-text/california-worldcover-chips/ you have:
- embeddings_v0.2: the embeddings per chip, of length 768 for each chip
- patch_embeddings_v0.2: the embeddings per patch, of shape 16x16x768 for each chip
- osm: the OSM geometries of each chip (all of them, unfiltered)
- esaworldcover-2020: the corresponding chips with ESA WorldCover classes, of shape 256x256 for each chip
Note also that I sometimes use this function to generate unique identifiers for each chip. It creates a hash of the string representation of the geometry, so I can get a unique identifier for a geometry of any shape.
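The function referred to in the note is not reproduced in this thread, so the following is only a minimal sketch of the idea it describes: hashing the string representation of a geometry to get an identifier. The name `chip_id` and the choice of SHA-256 are assumptions for illustration, not the function actually used for these chips.

```python
import hashlib


def chip_id(geometry) -> str:
    """Return a hex digest derived from the string form of a geometry.

    `geometry` can be any object whose str() is stable, for example a
    WKT string. Hypothetical helper; the hash actually used for the
    California chips may differ.
    """
    text = str(geometry)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Example with a WKT string standing in for a chip footprint.
square = "POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))"
shifted = "POLYGON ((1 0, 1 1, 2 1, 2 0, 1 0))"
print(chip_id(square))
print(chip_id(shifted))
```

One caveat with this approach: the identifier depends on the exact string, so the same footprint written with a different vertex order or coordinate precision hashes to a different value.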
The file california-worldcover-chips-osm-multilabels.parquet contains the metadata with the geometry, identifiers, OSM tags, and multilabel one-hot encodings of each chip.
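For anyone pairing the patch embeddings with the WorldCover chips, the shapes listed above suggest that each of the 16x16 patches covers a 16x16 pixel block of the 256x256 chip. The sketch below assumes that the patch grid splits the chip evenly and is laid out in row-major order, which this thread does not confirm. The function name and the synthetic chip are hypothetical.

```python
from collections import Counter

CHIP_SIZE = 256                # pixels per side, from the 256x256 chip shape
GRID = 16                      # patches per side, from the 16x16x768 shape
PATCH_PX = CHIP_SIZE // GRID   # 16 pixels per patch side (assumed even split)


def majority_class_per_patch(chip):
    """Return a GRID x GRID nested list with the most common class per patch.

    `chip` is a CHIP_SIZE x CHIP_SIZE nested list of class values.
    """
    labels = []
    for pr in range(GRID):
        row_labels = []
        for pc in range(GRID):
            # Count the class values inside this patch's pixel block.
            counts = Counter(
                chip[r][c]
                for r in range(pr * PATCH_PX, (pr + 1) * PATCH_PX)
                for c in range(pc * PATCH_PX, (pc + 1) * PATCH_PX)
            )
            row_labels.append(counts.most_common(1)[0][0])
        labels.append(row_labels)
    return labels


# Synthetic chip: left half is class 10, right half is class 50.
chip = [[10 if c < 128 else 50 for c in range(CHIP_SIZE)] for _ in range(CHIP_SIZE)]
labels = majority_class_per_patch(chip)
print(len(labels), len(labels[0]))
```

This yields one label per patch, which could then be matched to the corresponding 768-length patch vector if the layout assumption holds.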
Done
Once we have the new model we can run embeddings for all of the ~100k tiles over California.
Depends on https://github.com/Clay-foundation/model/issues/190