Clay-foundation / model

The Clay Foundation Model (in development)
https://clay-foundation.github.io/model/
Apache License 2.0

Generate embeddings from CLAYModule trained with latlon/time encodings #96

Closed: weiji14 closed this 7 months ago

weiji14 commented 8 months ago

What I am changing

How I did it

TODO in this PR:

TODO in the future:

How you can test it

  1. Ensure you have access to the 13-band GeoTIFF data files on s3://clay-tiles-02/02/
  2. Download the pretrained model from s3://clay-model-ckpt/v0/mae_epoch-02_val-loss-0.52.ckpt to the checkpoints/ folder.
  3. Run the following in a bash shell:
    python trainer.py predict --ckpt_path=checkpoints/mae_epoch-02_val-loss-0.52.ckpt \
                          --trainer.precision=bf16-mixed \
                          --data.data_dir=s3://clay-tiles-02/02/48MYU \
                          --data.batch_size=32 \
                          --data.num_workers=16

To load the embeddings from the GeoParquet file:

import geopandas as gpd

geodataframe: gpd.GeoDataFrame = gpd.read_parquet(path="48MYU_20180813_20210424_v001.gpq")
assert geodataframe.shape == (1217, 5)  # 1217 rows, 5 columns
print(geodataframe)
      index                                         source_url        date  \
0         0  s3://clay-tiles-02/02/48MYU/2018-08-13/claytil...  2018-08-13   
1         1  s3://clay-tiles-02/02/48MYU/2018-08-13/claytil...  2018-08-13   
2         2  s3://clay-tiles-02/02/48MYU/2018-08-13/claytil...  2018-08-13   
3         3  s3://clay-tiles-02/02/48MYU/2018-08-13/claytil...  2018-08-13   
4         4  s3://clay-tiles-02/02/48MYU/2018-08-13/claytil...  2018-08-13   
...     ...                                                ...         ...   
1212   1212  s3://clay-tiles-02/02/48MYU/2021-04-24/claytil...  2021-04-24   
1213   1213  s3://clay-tiles-02/02/48MYU/2021-04-24/claytil...  2021-04-24   
1214   1214  s3://clay-tiles-02/02/48MYU/2021-04-24/claytil...  2021-04-24   
1215   1215  s3://clay-tiles-02/02/48MYU/2021-04-24/claytil...  2021-04-24   
1216   1216  s3://clay-tiles-02/02/48MYU/2021-04-24/claytil...  2021-04-24   

                                             embeddings  \
0     [0.013126503, -0.031934112, 0.0054517575, 0.00...   
1     [0.01362492, -0.03131817, 0.005478967, 0.00358...   
2     [0.013637519, -0.03169147, 0.0055654137, 0.003...   
3     [0.013152077, -0.027163014, 0.007045647, 0.000...   
4     [0.007802248, -0.018802581, 0.0039559323, -0.0...   
...                                                 ...   
1212  [-0.0010275859, -0.005840208, -0.0011097308, -...   
1213  [-0.000579659, -0.004794828, -0.001176401, -0....   
1214  [0.00043649378, -0.004590468, -0.0011525226, -...   
1215  [-0.0012016017, -0.002848133, -0.0016901258, -...   
1216  [-0.00092684187, -0.0075725354, -0.0019668005,...   

                                               geometry  
0     POLYGON ((106.85102 -5.47164, 106.85088 -5.425...  
1     POLYGON ((106.89721 -5.47150, 106.89707 -5.425...  
2     POLYGON ((106.94341 -5.47135, 106.94326 -5.425...  
3     POLYGON ((106.98960 -5.47120, 106.98945 -5.424...  
4     POLYGON ((107.03580 -5.47104, 107.03564 -5.424...  
...                                                 ...  
1212  POLYGON ((107.59432 -6.39429, 107.59409 -6.348...  
1213  POLYGON ((107.64057 -6.39406, 107.64033 -6.347...  
1214  POLYGON ((107.68682 -6.39382, 107.68658 -6.347...  
1215  POLYGON ((107.73306 -6.39357, 107.73282 -6.347...  
1216  POLYGON ((107.77931 -6.39333, 107.77906 -6.347...  

[1217 rows x 5 columns]
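Once the embeddings are loaded, a common next step is to rank tiles by how similar their embeddings are to a chosen tile. The following is only a minimal sketch of that idea, not code from this PR: the helper names `cosine_similarity` and `most_similar` are hypothetical, cosine similarity is an assumed choice of metric, and the toy vectors stand in for the real embeddings. It uses only the standard library, so it would be slow on large tables compared with a vectorized approach.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def most_similar(
    embeddings: Sequence[Sequence[float]], query_index: int, top_k: int = 5
) -> List[Tuple[int, float]]:
    """Return (row position, similarity) pairs for the top_k rows closest
    to the query row, best match first, excluding the query row itself."""
    query = embeddings[query_index]
    scores = [
        (i, cosine_similarity(query, emb))
        for i, emb in enumerate(embeddings)
        if i != query_index
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
toy = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]
ranked = most_similar(toy, query_index=0, top_k=2)
```

With the GeoDataFrame loaded above, the input could presumably be built with `geodataframe["embeddings"].tolist()`, and each returned row position looked up with `geodataframe.iloc[position]` to recover the matching `source_url`, `date`, and `geometry`.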

Related Issues

Towards #3

weiji14 commented 7 months ago

There are still many things that could be improved, such as sharing the duplicated code between model_vit.py and model_clay.py, but I will merge this into main first for the first release.