Clay-foundation / earth-text

Adding language to Clay
Apache License 2.0

Create v0.2 embeddings for the California tiles #7

Closed: yellowcap closed this issue 4 months ago

yellowcap commented 5 months ago

Once we have the new model we can run embeddings for all of the ~100k tiles over California.

Depends on https://github.com/Clay-foundation/model/issues/190

lauracchen commented 5 months ago

@yellowcap will this be run and saved at the patch level? If so, it may be an easy, ready-made set that our app team can use, so I just want to flag it to them.

yellowcap commented 5 months ago

The earth-text team needs the tile-level embeddings, but we can create the patch-level ones too. That is a good idea.

lauracchen commented 5 months ago

That would be great, thank you!

brunosan commented 5 months ago

Tagging here that this was done, as shared on chat: we created the embeddings for the California chips using model v0.2. Under s3://clay-text/california-worldcover-chips/ you have

Note also that I sometimes use this function to generate unique identifiers for each chip. It creates a hash of the string representation of the geometry, so I can get unique identifiers for any shape.
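The function itself is not shown in this thread. A minimal sketch of the idea described above could look like the following. The name `geometry_id`, the choice of SHA-256, the 16-character truncation, and the use of a WKT string as the "string representation" are all assumptions for illustration, not the code that was actually used.

```python
import hashlib


def geometry_id(geometry, length=16):
    """Return a short, deterministic identifier for a geometry.

    Hypothetical helper: `geometry` can be any object with a stable str(),
    for example a WKT string or a shapely geometry (whose str() is its WKT).
    """
    # Hash the string representation, not the shape itself.
    text = str(geometry)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Truncate for readability; a shorter ID has a higher collision risk.
    return digest[:length]


# Example chip footprint as WKT (made-up coordinates).
chip = "POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))"
print(geometry_id(chip))
```

Because the hash is taken over the string, the same shape written with a different vertex order or coordinate precision gets a different identifier, so the geometries need a consistent serialization before hashing.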

The file california-worldcover-chips-osm-multilabels.parquet contains the metadata for each chip: the geometry, identifiers, OSM tags, and multilabel one-hot encodings.
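For readers unfamiliar with multilabel one-hot encodings, here is a small sketch of what such an encoding looks like. The function name `multi_hot` and the example tags are hypothetical, and the actual column layout of the parquet file may differ.

```python
def multi_hot(tags_per_chip):
    """Encode a list of tag lists as 0/1 rows over a shared label vocabulary."""
    # Build a sorted vocabulary so the column order is deterministic.
    labels = sorted({tag for tags in tags_per_chip for tag in tags})
    index = {label: i for i, label in enumerate(labels)}

    rows = []
    for tags in tags_per_chip:
        row = [0] * len(labels)
        for tag in tags:
            row[index[tag]] = 1  # a chip can have several labels set at once
        rows.append(row)
    return labels, rows


# Hypothetical OSM tags for three chips; the last chip has no tags.
chips = [["building", "highway"], ["natural"], []]
labels, rows = multi_hot(chips)
print(labels)
print(rows)
```

Each row has one column per label, and a chip with several tags has several ones in its row, which is what distinguishes a multilabel encoding from a single-class one-hot vector.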

yellowcap commented 4 months ago

Done