Closed cholmes closed 4 months ago
Ok, and now I've got a much more complicated one, see https://phys-techsciences.datastations.nl/dataset.xhtml?persistentId=doi:10.17026/dans-xy6-ngg6
This has two datasets that will be part of fieldscapes. It looks like it's 100+ individual GeoPackages for areas in Cambodia and Vietnam, so the ideal 'pre-processing' would need to combine them all into one GeoPackage.
Though I could also just download them all, combine them, put them on Source Cooperative, and have the converter use that. Then we wouldn't need to build pre-processing logic into the converter.
(I have memories of some other field boundary dataset that was really weird, but can't find it now, but I'm sure there are some other examples)
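For the "download them all and combine them" route, here is a rough sketch of what the one-off merge could look like. It assumes GeoPandas is installed, that the files sit in one local folder, and that every file has a CRS set. The function names, the `source_file` column and the default target CRS are my own choices for illustration, not anything the converter provides.

```python
from pathlib import Path

import geopandas as gpd
import pandas as pd


def combine_frames(frames, target_crs="EPSG:4326"):
    """Reproject each GeoDataFrame to target_crs and stack them into one."""
    reprojected = []
    for gdf in frames:
        if gdf.crs is None:
            # Without a CRS we cannot reproject, so fail loudly instead of guessing.
            raise ValueError("input has no CRS; cannot reproject safely")
        reprojected.append(gdf.to_crs(target_crs))
    if not reprojected:
        raise ValueError("nothing to combine")
    return pd.concat(reprojected, ignore_index=True)


def combine_geopackages(folder, output_path, target_crs="EPSG:4326"):
    """Read every *.gpkg in folder, merge them, and write a single GeoPackage."""
    frames = []
    for path in sorted(Path(folder).glob("*.gpkg")):
        gdf = gpd.read_file(path)
        gdf["source_file"] = path.name  # hypothetical column to keep provenance
        frames.append(gdf)
    combined = combine_frames(frames, target_crs)
    combined.to_file(output_path, driver="GPKG")
    return combined
```

This only reads the default layer of each file and assumes the attribute columns line up reasonably well. Files with several layers or very different schemas would need more care.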
Yeah, that sounds reasonable. There are some existing solutions in the implementations, which may solve it for you:
- at: Shows ZIP extraction and picking a file from it (that's the pre-processing pretty much, we can check whether this needs a separate abstraction point)
- de_bb: Shows how to pick a specific shapefile from a ZIP through the layers option

We could certainly also allow a list of URIs and concatenate them, but this only works for simpler cases. Once it starts with different projections etc., it will need custom code.
On the other hand, look at the implementation for the at dataset. It's relatively simple to do the extraction and then to pick a file.
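To illustrate the "extract, then pick a file" idea in general terms, here is a minimal standard-library sketch. It is not the code from the existing implementations. The helper name `extract_matching` and its parameters are hypothetical.

```python
import fnmatch
import tempfile
import zipfile
from pathlib import Path


def extract_matching(zip_path, pattern, dest_dir=None):
    """Extract a ZIP and return the extracted files whose name matches pattern."""
    dest = Path(dest_dir) if dest_dir else Path(tempfile.mkdtemp())
    with zipfile.ZipFile(zip_path) as archive:
        # Extract everything, so shapefile sidecars (.dbf, .shx, .prj) end up
        # next to the .shp no matter how deeply the archive nests them.
        archive.extractall(dest)
    return sorted(
        p for p in dest.rglob("*")
        if p.is_file() and fnmatch.fnmatch(p.name, pattern)
    )
```

Something like `extract_matching("FR_2018.zip", "*.shp")` would then give back the path(s) to hand to the reader, regardless of the folder layout inside the archive.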
Forgot to close, this is solved now.
When working on #31 I tried to directly download the data from the eurocrops source (https://zenodo.org/records/8229128/files/FR_2018.zip), but the file structure in the downloaded zip is:
GeoPandas doesn't like that. I just ended up pointing at the local, unzipped file, but now there's no way for people to get the data from the source.
I think it's ok for now. I put in the flatgeobuf from source https://data.source.coop/cholmes/eurocrops/unprojected/flatgeobuf/FR_2018_EC21.fgb which I think should work, though I've not yet tested the 6 gig download in one go. But it seems like it'd be better to have some way to handle this? Perhaps another 'block' in the template where you can put some python code to do 'pre-processing'?
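To make the 'pre-processing block' idea a bit more concrete, here is one possible shape for it: an optional callable that runs between download and read. This is purely a sketch. `load_source`, `download`, `read` and `preprocess` are hypothetical names and do not reflect how the template is actually structured.

```python
from pathlib import Path
from typing import Callable, Optional


def load_source(
    uri: str,
    download: Callable[[str], Path],
    read: Callable[[Path], object],
    preprocess: Optional[Callable[[Path], Path]] = None,
):
    """Download one source, optionally pre-process it, then read it."""
    local_path = download(uri)
    if preprocess is not None:
        # The hook receives the downloaded path and returns the path to read,
        # e.g. after unzipping, picking a nested file, or merging files.
        local_path = preprocess(local_path)
    return read(local_path)
```

A converter that doesn't need any of this would simply leave `preprocess` unset, so existing templates would keep working unchanged.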