Closed: hyperknot closed this issue 4 years ago
I would support this change. @lazd what do you think?
I could be into this. Two things:
1. ... coronavirus-data-sources as well, that could complicate things.
2. We ideally want to move population data into the scrapers themselves
I disagree with this part. Population data will be a tiny JSON file, much better managed centrally. I have it globally down to state level; for county level we can use https://eric.clst.org/tech/usgeojson/ to cover all counties.
@hyperknot ok, I'll defer to you on that. Note that we have a CSV with population that I pulled from census data. Eric's GeoJSON, which we are already using for county-level GeoJSON, does not include population data.
Oh, then that whole webpage is nothing more than shp2geojson on the shapefiles? Not impressive.
OK, for counties that CSV file is perfect, I believe. For the others, I'll provide a JSON.
@hyperknot can we roll that CSV into your repo so it can be delivered in the same manner as the state/country-level data?
@lazd yes, I was thinking of that. Making the counties into GeoJSON + the CSV into JSON. I'm going to submit a PR for the state level ones, that one comes after.
@lazd can we do this? The submoduling is breaking master right now: for example, that brazil file is missing from the master repo's version.
I've found an answer that Git can now track the master branch in a submodule, so that might be a promising solution for us: https://stackoverflow.com/a/9189815/518169
This SO answer mentions how to make an existing submodule auto-update, so maybe
git submodule set-branch --branch master -- coronavirus-data-sources
would work.
With all of what I'm about to say, I'm sure that just moving the files into this repo would be the simplest. Submodules are always a hassle.
Re tracking the branch: as long as the master branch can be guaranteed to be good, this seems like it could work. I can do a demo with a sample toy project if that would help move this along, e.g. Aparent has Achild as an existing submodule, I update Aparent to track Achild master, then push changes to Achild master and see how that affects the parent. I'd need to test how this works with forks and out-of-date local repos as well.
git fetch --recurse-submodules
may solve all issues, I just don't know.
If this seems like a good idea to try out, perhaps someone can assign this issue to me.
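For reference, the toy experiment described above could be sketched roughly like this. It is only a sketch under assumptions, not a tested workflow for this repo: Aparent and Achild are the hypothetical names from the comment, data.txt is a made-up file, and both repos are local throwaway directories. The protocol.file.allow=always override is there only because newer Git versions refuse local-path submodule clones by default, and the script assumes a Git new enough to have git submodule set-branch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway workspace and identity so the demo does not depend on user config.
work="$(mktemp -d)"
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Achild: the hypothetical sub-project, with one commit on master.
git init -q Achild
git -C Achild symbolic-ref HEAD refs/heads/master
echo "v1" > Achild/data.txt
git -C Achild add data.txt
git -C Achild commit -q -m "child v1"

# Aparent: add Achild as an ordinary (pinned) submodule first.
git init -q Aparent
git -C Aparent symbolic-ref HEAD refs/heads/master
git -C Aparent -c protocol.file.allow=always submodule add "$work/Achild" Achild
git -C Aparent commit -q -m "Add Achild as a submodule"

# Switch the existing submodule to track master (records branch in .gitmodules).
git -C Aparent submodule set-branch --branch master -- Achild
git -C Aparent commit -q -am "Track Achild master"

# New work lands on Achild master.
echo "v2" > Achild/data.txt
git -C Achild commit -q -am "child v2"

# The parent still sees the old pinned commit until it is updated explicitly.
echo "before update: $(cat Aparent/Achild/data.txt)"

# Fetch the tracked branch and check it out inside the submodule.
git -C Aparent -c protocol.file.allow=always submodule update --remote Achild
echo "after update:  $(cat Aparent/Achild/data.txt)"

# The new submodule pointer is a change in the parent and still has to be committed.
git -C Aparent commit -q -am "Bump Achild to latest master"
```

The point to look for is that tracking master does not make the parent follow the child on its own: someone still has to run git submodule update --remote and commit the new pointer, so master can still drift out of date between those updates.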
There is another option which I've heard about, but have never tried: git subtree. Ref https://www.atlassian.com/git/tutorials/git-subtree. Excerpt:
Why you may want to consider git subtree
- Management of a simple workflow is easy.
- Older version of Git are supported (even older than v1.5.2).
- The sub-project’s code is available right after the clone of the super project is done.
- git subtree does not require users of your repository to learn anything new. They can ignore the fact that you are using git subtree to manage dependencies.
- git subtree does not add new metadata files like git submodule does (i.e., .gitmodule).
- Contents of the module can be modified without having a separate repository copy of the dependency somewhere else.
Drawbacks (but in our opinion they're largely acceptable):
- You must learn about a new merge strategy (i.e. git subtree).
- Contributing code back upstream for the sub-projects is slightly more complicated.
- The responsibility of not mixing super and sub-project code in commits lies with you.
As long as the subtree is only managed by a few sharp minds, it might be acceptable ... but it's yet another thing to learn, yet another thing to go wrong. With the current pace of dev and delivery, it probably just adds too much unnecessary risk.
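To make the comparison concrete, a git subtree version of the same toy setup might look like the sketch below. Again, Aparent, Achild and data.txt are hypothetical, the repos are local throwaway directories, and it assumes the git subtree command is installed (it ships as a contrib script and is not present in every Git package).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway workspace and identity so the demo does not depend on user config.
work="$(mktemp -d)"
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Achild: the hypothetical sub-project, with one commit on master.
git init -q Achild
git -C Achild symbolic-ref HEAD refs/heads/master
echo "v1" > Achild/data.txt
git -C Achild add data.txt
git -C Achild commit -q -m "child v1"

# Aparent needs at least one commit before a subtree can be added.
git init -q Aparent
git -C Aparent symbolic-ref HEAD refs/heads/master
echo "parent" > Aparent/README
git -C Aparent add README
git -C Aparent commit -q -m "parent init"

cd Aparent

# Copy Achild master into the Achild/ directory as one squashed commit.
git subtree add --prefix=Achild --squash -m "Add Achild as a subtree" "$work/Achild" master

# New work lands on Achild master.
echo "v2" > "$work/Achild/data.txt"
git -C "$work/Achild" commit -q -am "child v2"

# Pull the new child commits into the parent's copy.
git subtree pull --prefix=Achild --squash -m "Pull Achild master" "$work/Achild" master

cat Achild/data.txt
```

Here the child's files are regular files in the parent's history, so a plain clone gets them and no .gitmodules file is created, but every update is still an explicit git subtree pull by whoever maintains the subtree.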
This one was done recently, right?
yes. has been closed. @lazd
What do you think about including it in the main repo instead of submoduling it? I believe the data sources repo is not that big, especially if we move to country-level-ids. It rarely changes, and including it in the main repo would speed up the development process.