And as a good practice, as new features get suggested, we should try to define a schema for what each one will include before actually writing a processing script and implementing it. For example, in #9 we should decide what fields will be in the final JSON before doing anything else, so we don't have to go back and update the scripts constantly.
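For illustration, a schema for something like the #9 data could be sketched up front; the field names here are made up, and the real ones would be agreed on before any script is written:

```ts
// Hypothetical example only — the actual fields for #9 would be decided
// before any processing script is written. The point is that this type
// acts as the contract: scripts produce JSON matching it, the site
// consumes it, and neither side churns while the other changes.
export interface CountryMetric {
  /** ISO 3166-1 alpha-2 code, e.g. "US" */
  code: string;
  /** Display name, e.g. "United States" */
  name: string;
  /** Metric value for this country (assumed numeric here) */
  value: number;
}

/** Shape of the final JSON file, e.g. /public/country-metrics.json */
export type CountryMetrics = CountryMetric[];
```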
This is to try to avoid this problem:

> website will probably need rapidly changing data structures to match implementation details
Eventually, then, I guess the idea is that I would get rid of as much of my pre-processing in `/data` as possible, and have the database-side scripts provide as much of `/public/*.json` as possible, except for the Natural Earth parts of the data.
I think we've landed on this. The paper folks handle the scripts they need for the paper. I handle the scripts needed for the website, deriving its data directly from the data of record on Zenodo. There may be some duplication of processing there, but I think it's worth it to reduce the workload for the paper folks, and it also allows me to have greater/quicker control over the data format I need.
As a future enhancement, we can try to remove some of my script code and replace it with direct SQL queries.
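A rough sketch of what that could look like, assuming the database of record is a SQLite file and using better-sqlite3; the table and column names here are hypothetical:

```ts
// Sketch only: replaces a multi-step post-processing script with a
// single query against the database of record. Assumes a SQLite file
// and hypothetical table/column names.
import Database from "better-sqlite3";
import { writeFileSync } from "node:fs";

const db = new Database("data-of-record.db", { readonly: true });

// Let SQL do the aggregation the script used to do in JS.
const rows = db
  .prepare(
    `SELECT country_code AS code, country_name AS name,
            SUM(value) AS value
     FROM measurements
     GROUP BY country_code, country_name
     ORDER BY code`
  )
  .all();

// Emit straight to the JSON shape the website expects.
writeFileSync("public/country-metrics.json", JSON.stringify(rows, null, 2));
```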
From Slack:
> In Zoom we had also talked about possibly just having the new single database file be stored in duplicate in the website repo, and having the Node.js script compile the info by making SQL queries directly. This is not completely off the table, but I'm now leaning toward having all of the data processing for everything be co-located so there is no duplication.
>
> Rich, can you look into adding extra scripts alongside your existing ones that give me the data I need, which is described by the TypeScript schemas in `src/data/index.ts`? I will try to keep my schemas minimal and stable so I don't have to request new/updated scripts from you all the time. Hopefully you can do this very "close" to the database, with just nice SQL queries and minimal script processing. Being able to export from the database as JSON would be nice too, but it's also not a big deal to have one small extra step somewhere in the pipeline (database post-process script, website pre-compile script, in-browser conversion) to convert CSV to JSON.
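For reference, that extra conversion step really is small wherever it lands. A minimal sketch as a website pre-compile script, assuming a plain comma-separated export with a header row and no quoted fields (file paths are placeholders):

```ts
// Minimal sketch of a pre-compile step that converts a CSV export to
// the JSON the site consumes. Assumes a header row and no quoted or
// escaped commas; a real script might use a proper CSV parser.
import { readFileSync, writeFileSync } from "node:fs";

const csv = readFileSync("data/export.csv", "utf8").trim();
const [headerLine, ...lines] = csv.split(/\r?\n/);
const headers = headerLine.split(",");

const records = lines.map((line) => {
  const values = line.split(",");
  // Zip header names with row values into one object per row.
  return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
});

writeFileSync("public/data.json", JSON.stringify(records, null, 2));
```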