cholmes closed this 2 months ago
@cholmes
> What I couldn't figure out with my meager Python / pandas skills was how to extract nested objects.
I implemented dot notation access for nested objects in GeoJSON. Arrays won't work though. But this gives us access to area and perimeter at least. Updated the template accordingly.
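Here is a minimal sketch of how dot-notation lookup on a GeoJSON feature's properties could work. It assumes nested objects are plain dicts. `get_by_path` and the example keys are hypothetical illustrations, not the converter's actual code. As described above, lists are not traversed: a path that runs into an array just returns the default.

```python
from collections.abc import Mapping
from typing import Any


def get_by_path(properties: Mapping[str, Any], path: str, default: Any = None) -> Any:
    """Resolve a dot-separated path such as "area.value" in nested dicts.

    Only nested objects (mappings) are traversed. If a segment is missing or
    the current value is a list or scalar, the default is returned.
    """
    current: Any = properties
    for key in path.split("."):
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical feature properties; key names are for illustration only.
props = {"area": {"value": 52000.0, "unit": "m2"}, "ids": [{"id": "a"}]}
print(get_by_path(props, "area.value"))  # 52000.0
print(get_by_path(props, "ids.0.id"))    # None: arrays are not traversed
```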
> I'd like to pull up area, perimeter and effective_from at the minimum, since we have definitions of those. [...] if you can do a couple examples of how to handle nested stuff I can likely take it back over.
I pulled area and perimeter. I didn't do the dates yet due to the status of https://github.com/fiboa/timestamps-extension - I still haven't received any feedback. I guess I should send this out again to Bayer and others?!
> For the rest I'm curious about thoughts on what to do - I think we could represent them faithfully in GeoParquet with nested columns, no? Or we could flatten them some for more geospatial tool interoperability.
As these are arrays which can have unlimited entries, it gets difficult. Maybe ignore them for now, add a todo and implement them later?
> Centroids, iso_code and representative_point do seem like general concepts we could just put in a generic extension.
> Oh yeah, I also wasn't sure if perimeter and area will always be in meters
I did check the API docs and it seems perimeter is always m and area is always m². I kept the converter flexible so that we can add more if needed. Please note that area in fiboa is in hectares!
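Below is a small sketch of converting an area value to hectares, which is fiboa's area unit. It raises an error on units it doesn't know, so it fails loudly instead of writing wrong numbers. The unit labels (`"m2"`, `"ha"`, `"km2"`) are hypothetical placeholders, because the exact way Varda encodes units isn't given here. The function and constant names are also just illustrative.

```python
# Square meters per unit. The unit labels are hypothetical placeholders;
# adjust them to whatever the source data actually uses.
SQUARE_METERS_PER_UNIT = {
    "m2": 1.0,
    "ha": 10_000.0,
    "km2": 1_000_000.0,
}

SQUARE_METERS_PER_HECTARE = 10_000.0


def area_to_hectares(value: float, unit: str) -> float:
    """Convert an area to hectares (fiboa's area unit), rejecting unknown units."""
    if unit not in SQUARE_METERS_PER_UNIT:
        raise ValueError(f"Unsupported area unit: {unit!r}")
    return value * SQUARE_METERS_PER_UNIT[unit] / SQUARE_METERS_PER_HECTARE


print(area_to_hectares(52000.0, "m2"))  # 5.2
```

Adding a new source unit then only needs one extra entry in the lookup table, which keeps the converter flexible.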
> One potential, very minor improvement is to allow -i to take a .gz file and unzip it.
That should already work. Did you try something like `-i downloaded_file.gz|file_in_archive.json`?
Sorry for the slow response - just getting back to this now.
> I implemented dot notation access for nested objects in GeoJSON. Arrays won't work though. But this gives us access to area and perimeter at least. Updated the template accordingly.
Ah, ok - great.
> I didn't do the dates yet due to the status of https://github.com/fiboa/timestamps-extension - I still haven't received any feedback. I guess I should send this out again to Bayer and others?!
Yeah, that'd be good to do.
> As these are arrays which can have unlimited entries, it gets difficult. Maybe ignore them for now, add a todo and implement them later?
Sounds good.
> That should already work. Did you try something like `-i downloaded_file.gz|file_in_archive.json`?
Ah, cool! I didn't know that was an option.
What's the status of this, @cholmes? Shall we merge for now?
So this converts the bare minimum for fiboa: id and geometry. It works both on direct output from the API and on the unzipped downloads, which are GeoJSONL, but geopandas handles that format well, so that's nice.
What I couldn't figure out with my meager Python / pandas skills was how to extract nested objects. Varda uses a lot of these; see the attached file:
fieldBoundaries.json
I'd like to pull up area, perimeter and effective_from at the minimum, since we have definitions of those. I think highlighting field_id and having an extension for it could make sense. For the rest I'm curious about thoughts on what to do - I think we could represent them faithfully in GeoParquet with nested columns, no? Or we could flatten them some for more geospatial tool interoperability.
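If we go the flattening route, here is a minimal sketch using only the standard library. It turns nested objects into dotted column names and leaves arrays untouched, so they can be dealt with later. The property names in the example are hypothetical stand-ins and may not match Varda's real structure. For whole lists of records, `pandas.json_normalize` does a similar dict flattening, with a configurable `sep`.

```python
from typing import Any


def flatten(obj: dict[str, Any], prefix: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys; lists (arrays) are kept as-is."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column name.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical properties, loosely modeled on the fields mentioned above.
props = {
    "field_id": "abc",
    "area": {"value": 52000.0, "unit": "m2"},
    "centroid": {"type": "Point", "coordinates": [10.0, 50.0]},
}
print(flatten(props))
# {'field_id': 'abc', 'area.value': 52000.0, 'area.unit': 'm2',
#  'centroid.type': 'Point', 'centroid.coordinates': [10.0, 50.0]}
```

The alternative is to keep the dicts as nested (struct) columns in GeoParquet. That preserves the structure faithfully, but many geospatial tools handle flat columns more reliably.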
Centroids, iso_code and representative_point do seem like general concepts we could just put in a generic extension.
Oh yeah, I also wasn't sure if perimeter and area will always be in meters, so perhaps the migration could check that the unit is meters, or try to convert if it's something else.
@m-mohr - if you can do a couple examples of how to handle nested stuff I can likely take it back over. Or you could run with it. For the full dataset downloads I think you can just sign up for an account (maybe it needs approval? But I'm sure you can get that), and then get them from https://fieldid.varda.ag/hub/downloads
One potential, very minor improvement is to allow -i to take a .gz file and unzip it. The Varda downloads are gzipped GeoJSONL files, so it would be nice to read what people download directly instead of making them figure out unzipping. But it's a very minor request; it's easy to just instruct people to unzip it.
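For illustration, here is how a gzipped GeoJSONL file can be read in Python without unzipping it first. It uses only the standard library and yields one feature per line. This is a hypothetical sketch, not how the fiboa CLI's `-i` option is implemented, and `read_geojsonl_gz` is an illustrative name.

```python
import gzip
import json
from collections.abc import Iterator
from typing import Any


def read_geojsonl_gz(path: str) -> Iterator[dict[str, Any]]:
    """Yield one GeoJSON feature per non-empty line of a gzipped GeoJSONL file."""
    # "rt" opens the gzip stream in text mode, decompressing on the fly.
    with gzip.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)
```

With geopandas installed, the result could then be loaded with something like `geopandas.GeoDataFrame.from_features(list(read_geojsonl_gz("download.geojsonl.gz")))`. The file name there is a placeholder.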