GreenInfo-Network / seattle-building-dashboard

Energy benchmarking for Seattle
https://greeninfo-network.github.io/seattle-building-dashboard/
ISC License

CARTO viability and alternative assessment #63

Closed danrademacher closed 7 months ago

danrademacher commented 10 months ago

Evaluate the long-term stability of the existing data architecture in Carto and suitability of substitutes

I think we can dispense with viability by just writing something like this in a report:

CARTO has deprecated the version of the platform that currently hosts the data, and there's no clear path to their latest (paid) product without a full rewrite. Though the company has said it has no plans to retire free accounts on this old platform, there's also no service guarantee. They recently added a 10-dataset limit, which suggests they are paying attention and ratcheting down resource use over time, making the platform precarious as a long-term home. That said, the company has in the past been responsive to requests for a little more time on other migrations, so we wouldn't expect a sudden stoppage.

And then we look at alternatives for providing some version of Postgres-as-a-service, our own RDS, or some other way to provide data.

tomay commented 10 months ago

If we ditch Carto, then we'll probably want (or have?) to ditch cartodb.js as well, since the two are tightly integrated, and that library is no longer supported (and doesn't include any post-Leaflet 1.0 features).

tomay commented 10 months ago

In terms of alternatives to Carto, I've been very curious about Martin: https://github.com/maplibre/martin

One neat thing is that you can pass query params directly with the URL: https://maplibre.org/martin/sources-pg-functions.html

Here's a demo: https://martin.maplibre.org/

Not sure how that works dynamically, we're not typically updating a tile URL after it loads on the page. Also worth noting that this is strictly a vector tile server.

Anyhow, the promise is that you can use one PostGIS data store for both tabular data and vector tiles.
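As a sketch of what that query-param mechanism might look like from the client side — the host and the function source name `get_buildings` below are hypothetical, not actual project names — a small helper could build the tile URL template that gets handed to the map library:

```javascript
// Build a tile URL template for a Martin PostGIS function source.
// Martin function sources accept query-string parameters, which are
// passed through to the underlying SQL function.
// NOTE: host and source name here are hypothetical placeholders.
function martinTileUrl(base, source, params) {
  const query = new URLSearchParams(params).toString();
  const template = `${base}/${source}/{z}/{x}/{y}`;
  return query ? `${template}?${query}` : template;
}

// Example: request tiles filtered to a single reporting year.
const url = martinTileUrl('https://tiles.example.com', 'get_buildings', { year: 2021 });
// url === 'https://tiles.example.com/get_buildings/{z}/{x}/{y}?year=2021'

// In MapLibre this would be registered as a vector source, e.g.:
// map.addSource('buildings', { type: 'vector', tiles: [url] });
```

The open question from above still stands: swapping the `year` parameter after load would mean removing and re-adding the source with a new URL, which is doable but not something we currently do.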

tomay commented 10 months ago

I set up an EC2 running Martin, and it works great, in localhost.

Here's an example serving vector tiles from the EC2 to MapLibre:

[screen recording attachment]

Unfortunately, Martin doesn't speak HTTPS, which means it won't work from GitHub Pages (or likely any host) without additional Apache config or setup that is beyond me.

But people are using this in production. How?

tomay commented 10 months ago

Working now https://greeninfo-network.github.io/martin-tileserver/#14.11/47.60677/-122.33032

Made a couple of changes to make this work:

  1. Added the following to the VirtualHost *:443 block, to proxy HTTPS requests through to Martin and include a CORS header:
    ProxyPass / http://localhost:3000/
    Header set Access-Control-Allow-Origin "*"
  2. Run martin via Screen so it stays running
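For the record, a fuller version of that proxy stanza might look something like the following — the hostname is hypothetical, and the SSL certificate directives would come from the existing VirtualHost. Adding ProxyPassReverse alongside ProxyPass keeps any redirect headers from Martin pointing at the public hostname:

```apache
<VirtualHost *:443>
    ServerName tiles.example.com
    # ... existing SSL certificate directives ...

    # Forward everything to Martin on localhost:3000
    ProxyPass / http://localhost:3000/
    ProxyPassReverse / http://localhost:3000/

    # Allow cross-origin requests from the GitHub Pages frontend
    Header set Access-Control-Allow-Origin "*"
</VirtualHost>
```

A systemd service unit would also be the more durable replacement for the Screen session.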

I'm sure there are better, more reliable ways of doing this, but it's good enough for the purposes of this demo

tomay commented 9 months ago

Note for myself: Don't forget to remove the following resources when we decide we've had enough of the Martin tile server demo

danrademacher commented 9 months ago

One challenge with any new replacement for CARTO is that it will also have an uncertain future. I suppose with Martin, if we are running it on our own server, then maybe that's mitigated somewhat.

In general, we might summarize that approach as "GreenInfo runs Postgres on AWS" and whether we use Martin or something else, the idea would be we maintain a backend that Seattle-hosted code talks to.

Other angles to consider here:

What if this was just a client-side application?

The entire dataset for the application is a bit large for that -- 8 MB for all the tabular data and 2 MB for the building outlines:


So 10 MB, growing by some amount each year as data gets added.

What if the data were stored in per-year files and we loaded the other years of data only when needed?

Here's an example query for 2021 data: https://cityenergy-seattle.carto.com/api/v2/sql?q=SELECT+ST_X(the_geom)+AS+lng%2C+ST_Y(the_geom)+AS+lat%2C*+FROM+seattle_buildings_2021_update+WHERE+year%3D2021

This is pretty large at 5MB! The same data as CSV is only 1.8 MB.

But the building report presents some challenges, since that includes data from multiple years, like this: https://cityenergy-seattle.carto.com/api/v2/sql?q=SELECT+ST_X(the_geom)+AS+lng%2C+ST_Y(the_geom)+AS+lat%2C*+FROM+seattle_buildings_2021_update+WHERE+id=351%20AND(year=2015%20OR%20year=2016%20OR%20year=2017%20OR%20year=2018%20OR%20year=2019%20OR%20year=2020%20OR%20year=2021)

So there we are pulling data for a single building across multiple years. If we split the CSV data by year and each year was ~2 MB, that might mean 14 MB of download!
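If the data were split into per-year files, assembling the building report card client-side could look roughly like this — a sketch, with hypothetical row field names, assuming each year's CSV has already been fetched and parsed into objects:

```javascript
// Given rows parsed from per-year CSV files, assemble the multi-year
// series for one building's report card.
// Field names (id, etc.) are hypothetical placeholders.
function buildingReport(rowsByYear, buildingId) {
  const report = [];
  for (const [year, rows] of Object.entries(rowsByYear)) {
    const match = rows.find((row) => row.id === buildingId);
    if (match) report.push({ year: Number(year), ...match });
  }
  return report.sort((a, b) => a.year - b.year);
}
```

The catch, as noted above, is that a report spanning seven years still needs seven files downloaded (or cached), unless the data were also pre-joined per building.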

I find it a little hard to resist the temptation of a client-side app given how infrequently this data changes (once a year).

Still, it seems like it would require some heavy preprocessing.

Like any move away from CARTO that retains the existing frontend and functionality, this would require a substantial rewrite of the application. The one benefit here is that there would be no dependence on any database server.

Revisit Tableau and Microsoft PowerBI

Both of these options would require software licenses, but Rebecca has mentioned in the past that Seattle has licenses for one or both of these -- they are both made by companies sorta based in Seattle (Tableau started there but was acquired by Salesforce).

PowerBI has maps, and here's a rundown of their map options. My read of this is that Seattle would have to let go of building footprints and use only dots for the map in PowerBI. There's some hacky stuff at the end of this about custom shapes -- which are experimental, have no basemap (?!), only in preview stage (not full release), and generally sound like they are not ready for prime time.

Tableau has much deeper spatial functionality, as documented here and described in more detail at https://www.tableau.com/blog/guide-to-mapping-in-tableau

In the past, mobile responsiveness has been a problem for Tableau, but they seem to have handled that https://help.tableau.com/current/pro/desktop/en-us/dashboards_dsd_create.htm

And this custom app isn't responsive anyway, so that's not a decider. Tableau does offer PDF downloads, as well as image downloads, of views of their dashboards. Seems to work, mostly, but not really print optimized.

It seems likely that a suitable replacement dashboard could be built in Tableau, but it would require throwing out the entire existing dashboard and starting from a blank slate.

One could also consider ArcGIS Online for a solution, though I think Tableau would likely offer more here than Esri.

Revisit Touchstone IQ

The turnkey solution Rebecca has mentioned is https://touchstoneiq.com/maps, which is doing the statewide mapping for Colorado: https://maps.touchstoneiq.com/colorado/

Looks like they have only a few government clients, mostly in city/county Colorado plus Hennepin County MN.

Without speaking to this vendor, we wouldn't know if they customize these for clients, to account for things like the Washington State or Seattle building standards.

danrademacher commented 9 months ago

Revisiting the idea of "GreenInfo runs Postgres on AWS," I wonder whether we really need tiled data and a tileserver here.

What if instead we took an approach where we have a Django application that stores the data and then we write endpoints that do the following:

  1. Accept a year and return GeoJSON of all building POINTS for the point location map
  2. Accept a year and return GeoJSON of all building FOOTPRINTS for the footprint map
  3. Accept a building ID and return JSON of all data for the building report card

The spatial data for (2) is only 2.4 MB, which seems quite manageable. But that's without attributes. Maybe this is where we need Martin as a tileserver.
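Whatever framework serves endpoints (1) and (2), the response shape is straightforward. A sketch of converting tabular rows — with the lng/lat columns the current CARTO SQL queries already produce — into a GeoJSON FeatureCollection of points:

```javascript
// Convert rows with lng/lat columns (as the existing SQL API queries
// return them) into a GeoJSON FeatureCollection of point features,
// with the remaining columns carried along as feature properties.
function rowsToPointCollection(rows) {
  return {
    type: 'FeatureCollection',
    features: rows.map(({ lng, lat, ...properties }) => ({
      type: 'Feature',
      geometry: { type: 'Point', coordinates: [lng, lat] },
      properties,
    })),
  };
}

// In Leaflet the result can be rendered directly:
// L.geoJSON(rowsToPointCollection(rows)).addTo(map);
```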

tomay commented 9 months ago

Another option for a purely front-end solution would be to use S3 both for PMTiles and for querying the CSV. The latter would require a Lambda function, but it would be lightweight overall.
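The Lambda side of that could be quite small. A sketch of the filtering step — bucket layout and column names are hypothetical, and a real version would use a proper CSV parser that handles quoting:

```javascript
// Sketch of the Lambda-side filtering: given CSV text (which the
// handler would fetch from S3), return only the rows matching a
// building id. Column names are hypothetical; naive split() parsing
// is for illustration only and ignores quoted fields.
function filterCsvById(csvText, idColumn, buildingId) {
  const [headerLine, ...lines] = csvText.trim().split('\n');
  const headers = headerLine.split(',');
  const idIndex = headers.indexOf(idColumn);
  const matches = lines
    .map((line) => line.split(','))
    .filter((cells) => cells[idIndex] === String(buildingId));
  return [headerLine, ...matches.map((cells) => cells.join(','))].join('\n');
}

// The handler itself would be roughly (s3GetObject is a hypothetical
// helper around the AWS SDK):
// exports.handler = async (event) => {
//   const csv = await s3GetObject(BUCKET, `data/${event.year}.csv`);
//   return { statusCode: 200, body: filterCsvById(csv, 'id', event.id) };
// };
```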

danrademacher commented 9 months ago

PMtiles plus CSV queries with Lambda seems like a solid approach for this.

I think the rub is that between Backbone.js and cartodb.js, we're looking at a major rewrite no matter what. At this point, the underlying codebase and tech is 10 years old!

Hmm, I wonder if Seattle people should talk to DC people and find out what their plans are...

danrademacher commented 9 months ago

Started drafting a report here: https://docs.google.com/document/d/1STRVs7X5NjZ1GCqw8hB1YAn8_06ggao0nsug7kYEBT8/edit

danrademacher commented 9 months ago

I completed the recommendations report. @tomay if you could give it a read through, that would be great.

Then we can send this off to Mike and call this task 100% done.

tomay commented 9 months ago

Looks good to me! I fixed a few typos and added a comment or two.

Some details of the implementation probably depend on additional R&D. For example, I have no idea what conditional rendering of PMTiles looks like in Leaflet (when changing variables). And I wonder if it will be more efficient to render points as PMTiles than to repeatedly query S3.