Closed yoimnik closed 5 years ago
@thekaveman requesting review
Thanks for this.
I feel like there might be some cross-over concern with some of what ProviderDataLoader
is doing in load_from_source
here https://github.com/CityofSantaMonica/mds-provider/blob/dev/mds/db/load.py#L138.
I've been wanting to cleanly refactor some of this functionality but haven't had a chance yet to dig into the specifics or think about what it would look like.
Your fix looks good.
One additional concern is that this change essentially trusts that the rest of the files also have the same version as the first. Do you have any thoughts on this? Potential workaround is only extend data
when the version
matches the first.
hmm, i didn't put too much thought into this earlier
i don't think we should only extend data
when the version
matches the first. this is a bit of implied behavior and could end up confusing a developer down the road.
i think it's reasonable to expect the developer to know what version of the payload they're working with. if we're using semantic versioning (ie. major.minor.patch
):
payload includes major version differences
) because there could be data schema changes, and it's likely that upstream sql isn't handling multiple schemas
- if they're working with a list of payloads that have a major version difference, then this should throw with an error message
- how do we handle minor version differences? are schema changes allowed in minor version bumps in MDS?
I like the idea of throwing an error on a mismatch instead of silently reading only matching data.
In general, a MINOR version change in MDS should only include non-breaking changes (adding a non-required field for example). The only complication is with MDS being early in development still (pre-1.0.0
), the MINOR version changes are potentially breaking.
I like the idea of throwing an error on a mismatch instead of silently reading only matching data.
@thekaveman added this :)
A suggestion on the implementation of this:
Capture the version from the first page
, and compare each page["version"]
with that initial one, throwing the first time you find a mismatch. Something like:
data = []
version = payload[0]["version"]
for page in payload:
if page["version"] == version:
data.extend(page["data"][record_type])
else:
raise ValueError("Version mismatch detected.")
As of your most recent commit, I think the version check is not at the correct indentation level.
if we specify a file path containing a list of payloads, which is how the ingest script outputs, then the library doesn't handle the case that the file may be a list of payloads