Closed weiji14 closed 9 months ago
There are a couple of things that can be improved as mentioned above, such as the file naming scheme and how the embeddings are saved to the GeoParquet file, but I will merge this in first and handle those nice-to-haves in follow-up PRs.
What I am changing
How I did it
In the LightningModule's `predict_step`, use `geopandas` to create a GeoDataFrame with three columns: date, embeddings, geometry. A sample table would look like this:
The date is stored in Arrow's `date32` format, embeddings are in `FixedShapeTensorArray` (TODO), and geometry is in `WKB`.
Each row would store the embedding for a single 256x256 chip, and the entire table could realistically store N rows for an entire MGRS tile (10000x1000) across different dates.
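The three-column layout described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper name `embeddings_to_geodataframe`, the 4-dimensional embeddings, the bounding boxes, and the `EPSG:4326` CRS are all assumptions made up for the example, and the embeddings are kept as plain per-row NumPy arrays rather than a `FixedShapeTensorArray`.

```python
import datetime

import geopandas as gpd
import numpy as np
from shapely.geometry import box


def embeddings_to_geodataframe(embeddings, dates, bboxes, crs="EPSG:4326"):
    """Build a GeoDataFrame with one row per chip (hypothetical helper).

    embeddings: array-like of shape (n_chips, embedding_dim)
    dates: sequence of datetime.date, one per chip
    bboxes: sequence of (xmin, ymin, xmax, ymax) tuples, one per chip
    """
    return gpd.GeoDataFrame(
        data={
            "date": list(dates),
            # One 1-D float32 array per row
            "embeddings": [np.asarray(e, dtype=np.float32) for e in embeddings],
        },
        # Each chip footprint becomes a rectangular polygon
        geometry=[box(*bbox) for bbox in bboxes],
        crs=crs,
    )


# Two made-up chips with 4-dimensional embeddings (values are placeholders)
gdf = embeddings_to_geodataframe(
    embeddings=np.arange(8, dtype=np.float32).reshape(2, 4),
    dates=[datetime.date(2021, 1, 1), datetime.date(2021, 1, 2)],
    bboxes=[(0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 2.0, 1.0)],
)
print(gdf)
```

Writing such a table with `gdf.to_parquet(...)` encodes the geometry column as WKB, which matches the storage format mentioned above.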
TODO in this PR:
TODO in the future:
How you can test it
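Put the input data in the `data/` folder, and then run the prediction with `trainer.py`.
This should produce an `embedding_0.gpq` file under the `data/embeddings/` folder.
The available options are listed by `python trainer.py predict --help`.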
To load the embeddings from the geoparquet file:
If you have a newer version of QGIS, it's also possible to load the GeoParquet file directly. The screenshot below shows the bounding box locations of the 755 embeddings (1 embedding for each 256x256 chip):
Related Issues
Extends #56, continuation of #66.