allenai / satlas

Apache License 2.0

Preprocessing property dataset #2

Status: Closed (samar-khanna closed this issue 1 year ago)

samar-khanna commented 1 year ago

Thank you for releasing satlas, it's a great resource!

I have a question about running the to_dataset command for properties. It seems that symlinks are not created; instead, new images are written, which makes this processing step very slow. Is there a reason symlinks are avoided for properties?

favyen2 commented 1 year ago

Symlinks are not used for properties because the input images for properties are centered at the point or polygon associated with each property. We didn't want to put logic in the data loader to extract a crop spanning multiple images, so the cropping is done during pre-processing instead. A new metadata file with up to 100K property examples per task will be released soon (by Aug 15), which should make this step more feasible. Our pre-training used 100K property examples, not the 1M-example file currently included.
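
To illustrate why a symlink cannot stand in for a property image, here is a minimal sketch. It is not satlas code: it assumes the source imagery is stored as a grid of fixed-size square tiles, and the 512-pixel tile and window sizes and the helper names are hypothetical. It shows that a window centered on a property point can straddle several tiles, so no single existing tile file matches the crop.

```python
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive


def centered_window(cx: int, cy: int, size: int) -> Window:
    """Pixel bounds of a size x size window centered at (cx, cy)."""
    half = size // 2
    x0, y0 = cx - half, cy - half
    return x0, y0, x0 + size, y0 + size


def overlapping_tiles(window: Window, tile_size: int) -> List[Tuple[int, int]]:
    """(col, row) indices of the hypothetical fixed-size tiles the window intersects."""
    x0, y0, x1, y1 = window
    cols = range(x0 // tile_size, (x1 - 1) // tile_size + 1)
    rows = range(y0 // tile_size, (y1 - 1) // tile_size + 1)
    return [(c, r) for r in rows for c in cols]


# A property near a tile corner: its 512x512 window touches four 512x512 tiles,
# so the crop has to be assembled from several files rather than symlinked.
win = centered_window(cx=1030, cy=1020, size=512)
print(win)                                   # (774, 764, 1286, 1276)
print(overlapping_tiles(win, tile_size=512))  # [(1, 1), (2, 1), (1, 2), (2, 2)]
```

Only a window that happens to align exactly with one tile (for example, centered at (256, 256) with a 512-pixel size) maps to a single file; in general the crop has to be written out, which is what makes this pre-processing step slower than symlinking.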

samar-khanna commented 1 year ago

Thank you! I would greatly appreciate it if you could update this issue once the 100K property example dataset is released :)

favyen2 commented 1 year ago

The 100K property files have been added, and I updated the documentation at https://github.com/allenai/satlas/blob/main/SatlasPretrain.md#prepare-datasets. Please let me know if you encounter further issues.