NYCPlanning / db-data-library

📚 Data Library
https://nycplanning.github.io/db-data-library/library/index.html
MIT License

Cache Config.compute property #394

Closed fvankrieken closed 1 year ago

fvankrieken commented 1 year ago

The compute property in Config calls Scriptor.runner() if the source for a data set is a Python script. Ingestor.translator can access this property up to three times: once via compute_parsed, and twice more via Config.compute_json and compute_yml (each of these three properties references the compute property).

When Scriptor.runner() makes web calls (as with doitt_buildingfootprints), this results in three downloads of the same file, each one creating a new temporary download file.

doitt_buildingfootprints still failing for unrelated reasons, but at least it's failing quicker now that it's only downloading once!
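For readers who want to see the idea in isolation, here is a minimal sketch of caching a property so that the expensive call behind it runs only once per instance. It assumes `functools.cached_property` as the caching mechanism, which may not be what this PR actually uses. The `Scriptor` and `Config` classes below are simplified, hypothetical stand-ins and not the library's real classes; `compute_yml` is left out to keep the sketch dependency-free.

```python
import json
from functools import cached_property


class Scriptor:
    """Hypothetical stand-in for the library's Scriptor; it only counts calls."""

    def __init__(self):
        self.calls = 0

    def runner(self) -> dict:
        # The real runner may download a file; this one just records the call.
        self.calls += 1
        return {"dataset": {"name": "example", "source": {"path": "example.csv"}}}


class Config:
    """Hypothetical, simplified Config (assumption: not the real class layout)."""

    def __init__(self, scriptor: Scriptor):
        self.scriptor = scriptor

    @cached_property
    def compute(self) -> dict:
        # Evaluated on first access, then stored on the instance and reused.
        return self.scriptor.runner()

    @property
    def compute_parsed(self) -> dict:
        return self.compute["dataset"]

    @property
    def compute_json(self) -> str:
        return json.dumps(self.compute)


scriptor = Scriptor()
config = Config(scriptor)

# Three accesses that all go through compute, mirroring the three calls above.
parsed = config.compute_parsed
as_json = config.compute_json
as_json_again = config.compute_json

print(scriptor.calls)  # 1
```

Because the cached value lives on the instance, a new `Config` object still triggers a fresh run, which matters if the same process handles several data sets.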

fvankrieken commented 1 year ago

It doesn't seem like any field is set while the script runs, so I don't think there's a risk of unintended consequences here.

td928 commented 1 year ago

Hey @fvankrieken and @damonmcc! I am also having issues with building footprints in data library, and came across this work you did. One insight I had is that building footprints works fine for csv output but not pgdump, which tells me that some bad geometries in the dataset are probably causing the GDAL translation error we are seeing. I remember this kind of thing happening with building footprints in the past, but it usually "fixed itself" after a new release; this time months have passed. Just wondering if you have any new thoughts about this issue. Happy to collaborate on fixing it! Thanks!

fvankrieken commented 1 year ago

I think one of our GIS folks might have figured out part of this, in terms of some preprocessing for doitt_buildingfootprints. @damonmcc, am I remembering that wrong? Do you/Jack have docs around those issues you were working on?

damonmcc commented 1 year ago

yup GIS tackled DTM preprocessing and tracked that work in this issue in the edm-overview repo: https://github.com/NYCPlanning/edm-overview/issues/891

fvankrieken commented 1 year ago

Ah that was dof_dtm and not doitt_buildingfootprints?

damonmcc commented 1 year ago

true! sounds like the same issue of geometries that data_library can't repair