Clay-foundation / model

The Clay Foundation Model (in development)
https://clay-foundation.github.io/model/
Apache License 2.0

Save embeddings with spatiotemporal metadata to GeoParquet #73

Closed: weiji14 closed this 9 months ago

weiji14 commented 9 months ago

What I am changing

How I did it

TODO in this PR:

TODO in the future:

How you can test it

To load the embeddings from the geoparquet file:

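import geopandas as gpd

# Read the GeoParquet file into a GeoDataFrame
geodataframe: gpd.GeoDataFrame = gpd.read_parquet(path="embeddings_0.gpq")
# One row per chip and three columns (date, embeddings, geometry),
# matching the 755 rows shown in the output below
assert geodataframe.shape == (755, 3)
print(geodataframe)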
        date            embeddings                                          geometry
0   2022-12-12  [-1.1094263, 1.0212796, -0.58915687, -1.144523...   POLYGON ((93.02647 30.71001, 93.02648 30.73311...
1   2022-12-12  [-1.1253564, 1.0260286, -0.5860151, -1.1528502...   POLYGON ((93.34729 30.70955, 93.34738 30.73265...
2   2022-12-12  [-1.1190275, 1.0268829, -0.59865385, -1.147052...   POLYGON ((93.74777 30.63856, 93.74794 30.66166...
3   2022-12-12  [-1.1115837, 1.0286477, -0.60599935, -1.143061...   POLYGON ((93.80119 30.63824, 93.80138 30.66134...
4   2022-12-12  [-1.1172316, 1.0246403, -0.59833527, -1.143900...   POLYGON ((93.82790 30.63808, 93.82810 30.66118...
... ... ... ...
750 2022-12-12  [-1.11294, 1.0265714, -0.6015097, -1.1443343, ...   POLYGON ((93.40048 30.64010, 93.40057 30.66320...
751 2022-12-12  [-1.1207774, 1.029693, -0.5964609, -1.1490294,...   POLYGON ((93.45391 30.63992, 93.45402 30.66302...
752 2022-12-12  [-1.1309807, 1.0274287, -0.57653224, -1.162805...   POLYGON ((93.58748 30.63939, 93.58762 30.66249...
753 2022-12-12  [-1.1268965, 1.0305986, -0.59025705, -1.154876...   POLYGON ((93.61420 30.63926, 93.61434 30.66236...
754 2022-12-12  [-1.1171025, 1.0268872, -0.60177326, -1.146309...   POLYGON ((93.69434 30.63886, 93.69450 30.66196...

755 rows × 3 columns

If you have a recent version of QGIS, you can also load the GeoParquet file directly. The screenshot below shows the bounding box locations of the 755 embeddings (one embedding per 256x256 chip):

[Screenshot: bounding boxes of the 755 embeddings displayed in QGIS]
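
Without QGIS, the chip footprints and the embedding vectors can also be inspected from Python. This is a rough sketch on a toy in-memory GeoDataFrame standing in for the loaded file; the values, the 2-dimensional embeddings, and the EPSG:4326 CRS are assumptions for illustration.

```python
import geopandas as gpd
import numpy as np
import shapely.geometry

# Toy stand-in for the GeoDataFrame returned by gpd.read_parquet (made-up values)
gdf = gpd.GeoDataFrame(
    data={"embeddings": [[0.1, 0.2], [0.3, 0.4]]},
    geometry=[
        shapely.geometry.box(93.02, 30.71, 93.05, 30.73),
        shapely.geometry.box(93.34, 30.70, 93.37, 30.73),
    ],
    crs="EPSG:4326",  # assumed CRS
)

# Per-chip bounding boxes as a table with minx, miny, maxx, maxy columns
chip_bounds = gdf.geometry.bounds

# Overall extent covered by all chips: [minx, miny, maxx, maxy]
extent = gdf.total_bounds

# Stack the per-row embedding vectors into one (n_chips, n_dims) array
matrix = np.stack(gdf["embeddings"].to_numpy())
```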

Related Issues

Extends #56, continuation of #66.

weiji14 commented 9 months ago

A couple of the things mentioned above can still be improved, such as the file naming scheme and how the embeddings are saved to the GeoParquet file. I will merge this in first and handle those nice-to-haves in follow-up PRs.