colorado-cancer-center / ecco

An interactive resource for exploring cancer data in Colorado
https://coe-ecco.org
Apache License 2.0
1 stars 2 forks source link

Move locations data to backend #77

Closed vincerubinetti closed 1 month ago

vincerubinetti commented 2 months ago

Currently the data for the locations is hardcoded as geojson into the frontend (src/api). It should ideally be moved to the backend, even if hardcoded there, for consistency.

For the sake of automating the ingest of this data later in #74, see #19 for how the data was initially scraped and added to the frontend.

falquaddoomi commented 1 month ago

Hey @vincerubinetti, I've started working on this. I wanted to mention one caveat: our GIS database, PostGIS, doesn't support FeatureCollections or Features directly as geometry. Instead, you're supposed to "fully normalize" the structure by breaking out the individual elements of the FeatureCollection as tables, down to actual geometry elements (the "geometry" entry under a Feature in GeoJson), which can be stored as geometry. If you want to serve GeoJSON FeatureCollections/Features after normalizing, you do so by reconstituting that data structure when you return it from the API.

While I definitely see the benefits of fully normalizing the structure in the database, I also don't know if we'll ever need it and I would prefer to get this done quickly; we can revisit it if we find that we need it later on. Instead, I plan to store each FeatureCollection as a JSON blob in the backend, which I'd just pass off to you verbatim from the API, like how you currently load it from a JSON file in the frontend.

No action required from you, but I just wanted you to know the design decision I'm planning to take. I'm also willing to fully normalize the structure if you prefer, and I can fill you in on the pros and cons of each approach you like.

vincerubinetti commented 1 month ago

So on the frontend, it'd be easier to keep each location set as a separate entity, just because I'm now loading them individually on demand instead of the whole dataset. Also I need to assign unique markers/symbols to each location set. But it still wouldn't be a big deal to separate them out. Leaflet is also able to handle any valid geojson structure.

So, whatever is easier for you. I have some questions out of curiosity below.

Instead, you're supposed to "fully normalize" the structure by breaking out the individual elements of the FeatureCollection as tables, down to actual geometry elements (the "geometry" entry under a Feature in GeoJson), which can be stored as geometry. If you want to serve GeoJSON FeatureCollections/Features after normalizing, you do so by reconstituting that data structure when you return it from the API.

This seems like a lot of overhead. I would've thought a key selling point of PostGIS would be to make this more convenient, but perhaps not, at least for geojson structured data? Like does it not provide any utilities to help you with the normalization?

Instead, I plan to store each FeatureCollection as a JSON blob in the backend

If I understand correctly, this means you're basically bypassing PostGIS and just storing the stringified/serialized geojson in a single (string?) field? And this would presumably just be for the locations data... I'm assuming you're using the features of PostGIS for some other data/endpoints? (Basically just checking if it's still providing value in the stack).

Hopefully I'm understanding correctly.

I'd say we can stick with the assumption/constraint that we'll never have more than one type of entity in a single "feature collection", e.g. mixing the house and senate boundaries into the same geojson object. I think we'd always want to have that clear distinction in the UI anyway for clarity. As you saw, I currently have the locations in a structure essentially like { "feature-list": [feature1, feature2, feature3], "another-feature-list": [feature1, feature2, feature3] }, and I'd want to be able to get one list at a time. Whatever db schema/setup that provides that would be good with me.

falquaddoomi commented 1 month ago

Ok, sounds good; I think my current implementation in #82 matches what you asked for above.

Well, the main selling point of PostGIS is that you can store geometry and perform geometric operations on it; having it parse GeoJSON and create hierarchies of tables for you isn't something it's designed to do. It's really not a lot of work for me to implement the tables (it's like 15 lines of model definition code), but since don't need to deeply index the FeatureCollection structure at the moment, I think just storing and returning it as JSON is fine.

If you're curious, the database does actually support a "JSON" type, so it's not just a string -- it gets validated as JSON and PostgreSQL has some support for querying into JSON structures. That said, AFAIK you just give the FeatureCollection to Leaflet, so we don't really need to query it.