Clay-foundation / earth-text

Adding language to Clay
Apache License 2.0

California embeddings description #11

Status: Open. brunosan opened this issue 5 months ago

brunosan commented 5 months ago

I’m a bit lost on #7:

  1. What input images are used? The usual Sentinel imagery, and from which dates? Are these chips saved somewhere?
  2. From the code, I gather that the filenames contain worldcover, but WorldCover is not the input: we don’t input land cover, we input Sentinel data. The filenames also encode rows and columns, but I don’t know how to geolocate those.
  3. The only way I see to geolocate the extent of a chip embedding is to cross-reference the random chip_id in the embeddings_v0.2 folder with the GeoParquet file california-worldcover-chips-osm-multilabels.parquet, which has chip_id, [col/row, to check], and geometry; the geometry should in most cases be a rectangle.
  4. The patch embeddings in patch_embeddings_v0.2/ are 3-dimensional, so we need to unroll each chip into its n patches, but it is unclear how to do that in the right order so that we can calculate the bbox of each patch.
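For point 4, here is a minimal sketch of how a patch bbox could be derived once the chip bbox is known. It assumes (unverified) that patches are flattened in row-major order, with row 0 at the north edge and column 0 at the west edge, and that the chip is split into a square grid; patch_bbox and all_patch_bboxes are hypothetical helper names, not part of the repo.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def patch_bbox(chip: BBox, patch_index: int, patches_per_side: int) -> BBox:
    """Return the bbox of one patch inside a chip.

    ASSUMPTION (unverified): patches are flattened row-major, row 0 at the
    top (north) of the chip, column 0 at the left (west), like a C-order
    reshape of an (H, W) patch grid.
    """
    n = patches_per_side
    if not 0 <= patch_index < n * n:
        raise ValueError(f"patch_index must be in [0, {n * n})")
    minx, miny, maxx, maxy = chip
    row, col = divmod(patch_index, n)  # row-major unrolling
    width = (maxx - minx) / n
    height = (maxy - miny) / n
    left = minx + col * width
    top = maxy - row * height  # rows grow southwards
    return (left, top - height, left + width, top)


def all_patch_bboxes(chip: BBox, patches_per_side: int) -> List[BBox]:
    """Bboxes for every patch, in the assumed embedding order."""
    return [patch_bbox(chip, i, patches_per_side) for i in range(patches_per_side ** 2)]
```

If a patch embedding array is stored as (rows, cols, dim), reshaping it to (rows * cols, dim) in NumPy’s default C order yields the same row-major index used here. Whether the real arrays follow this layout, and how many patches per side a chip has, still needs confirming.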

Plotting california-worldcover-chips-osm-multilabels.parquet, I can see that these are indeed the chip bboxes:

[Screenshot, 2024-04-07 22:56:37: plot of the chip geometries in california-worldcover-chips-osm-multilabels.parquet]
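The chip_id cross-reference from point 3 is essentially a join. With geopandas one would likely call geopandas.read_parquet on the GeoParquet file, take geometry.bounds, and merge on chip_id; the sketch below shows the same logic in plain Python so it runs without dependencies. The dict inputs are hypothetical stand-ins for the embeddings_v0.2 files and the parquet’s chip_id/geometry columns.

```python
from typing import Dict, List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def bbox_of_ring(coords: Sequence[Tuple[float, float]]) -> BBox:
    """Axis-aligned bounds of a polygon ring given as (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def join_embeddings(
    embeddings: Dict[str, List[float]],
    footprints: Dict[str, BBox],
) -> Tuple[Dict[str, Tuple[BBox, List[float]]], List[str]]:
    """Attach a footprint bbox to every embedding via chip_id.

    Returns the joined records plus the sorted chip_ids that have an
    embedding but no footprint (these cannot be geolocated).
    """
    joined: Dict[str, Tuple[BBox, List[float]]] = {}
    missing: List[str] = []
    for chip_id, vector in embeddings.items():
        if chip_id in footprints:
            joined[chip_id] = (footprints[chip_id], vector)
        else:
            missing.append(chip_id)
    return joined, sorted(missing)
```

Reporting the unmatched chip_ids makes it easy to see whether every embedding actually has a geometry in the parquet file.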
  1. I do see holes: places within California with no coverage in california-worldcover-chips-osm-multilabels.parquet. Are these invalid inputs (due to clouds or other errors)?
  2. I assume the embeddings are, in all cases, the average across all bands and band groups.
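One way to test the assumption in the last point would be to recompute a chip embedding from its patch embeddings and compare it with the stored one. The sketch below assumes a hypothetical (band group, patch, dim) layout and a plain mean; if the stored vector does not match, either the layout or the pooling differs from this guess.

```python
from statistics import fmean
from typing import List, Sequence


def chip_embedding(patch_embeddings: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Average (group, patch, dim) embeddings into one dim-length vector.

    ASSUMPTION: the chip embedding is the plain mean over every band group
    and every patch; this is the hypothesis being checked, not a known fact.
    """
    vectors = [vec for group in patch_embeddings for vec in group]
    if not vectors:
        raise ValueError("no patch embeddings to average")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must have the same length")
    # Mean of each dimension across all groups and patches.
    return [fmean(v[i] for v in vectors) for i in range(dim)]
```

Comparing the result element-wise with the stored chip embedding (for example with math.isclose) would confirm or rule out the plain-mean hypothesis for a given chip.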

Thanks!!

cc @yellowcap