Closed henare closed 9 years ago
Yes — we only show start or end dates that differ from the containing term, so that they stand out.
Is this just a concern that something might have gone wrong, or do you actually think they should be included?
Both. In the Popolo data it seems to make sense to include start_date, or am I missing something?
Currently I don't include it because it's mostly unnecessary duplication. Imagine you had JSON where the start_date was replicated on every membership, and then you discovered that the term actually started a day later than you previously thought. Now you need to update that value in hundreds of places, rather than one.
I'm open to persuasion that there should be separate "compact" and "expanded" JSON representations, though. (Though if we were to do that, I think I'd take this opportunity to switch the core underlying Popolo to JSON-LD, and then generate multiple plain JSON versions from that).
I'm curious as to why this is (or might be) an issue for you, though…
It's not a major issue but I did expect to see data in that field for each membership record.
Here's a concrete example of where I was going to use it. When I'm importing the memberships into TVFY if I could just always rely on a start_date field being there it would be slightly easier than having to check the start of the "legislative period" if that field is not present so I can fill in that date (which I do need for every membership record).
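As a rough illustration of the fallback described above, here is a minimal Python sketch. It assumes memberships point at their term through a `legislative_period_id` key and that terms are listed under `events` with a `classification` of "legislative period"; those key names, the helper `membership_start_date`, and the sample records are assumptions for illustration, not a confirmed description of the published JSON.

```python
def membership_start_date(membership, periods_by_id):
    """Return the membership's own start_date, else its term's, else None."""
    own = membership.get("start_date")
    if own:
        return own
    # Fall back to the containing legislative period (assumed key name).
    period = periods_by_id.get(membership.get("legislative_period_id"), {})
    return period.get("start_date")


# Hypothetical sample data shaped like the assumed structure.
popolo = {
    "events": [
        {
            "id": "term/8",
            "classification": "legislative period",
            "start_date": "2014-11-27",
        },
    ],
    "memberships": [
        # No start_date: inherits the term's start.
        {"person_id": "a", "legislative_period_id": "term/8"},
        # Own start_date differs from the term, so it is kept.
        {"person_id": "b", "legislative_period_id": "term/8",
         "start_date": "2015-02-03"},
    ],
}

# Index the legislative periods by id so each lookup is a dict access.
periods_by_id = {
    event["id"]: event
    for event in popolo["events"]
    if event.get("classification") == "legislative period"
}

dates = [membership_start_date(m, periods_by_id) for m in popolo["memberships"]]
```

With this, an importer only ever asks for the resolved date and gets `None` when neither the membership nor its term carries one.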
Currently I don't include it because it's mostly unnecessary duplication. Imagine you had JSON where the start_date was replicated on every membership, and then you discovered that the term actually started a day later than you previously thought. Now you need to update that value in hundreds of places, rather than one.
Yes but surely that's automated?
if I could just always rely on a start_date field being there it would be slightly easier than having to check the start of the "legislative period"
What would you do for a case where we don't know the start date of the legislative period?
Yes but surely that's automated?
If you mean simply that it's not a matter of hand-editing JSON, then, sure. If you mean that the JSON is generated from other data, and it would just be a matter of changing that, then that's also true, but conceptually we're treating the JSON as the 'definitive' format, from which other formats get generated (e.g. the per-term CSVs are generated from the JSON, rather than the sources), so it should be able to stand alone as a finished form. (Though, as mentioned above, I'm leaning towards making that 'definitive' form be JSON-LD rather than JSON, largely because it copes with multi-lingual text much better, in which case much of this discussion would disappear anyway as I could spit out both versions, or you could even tweak the framing JSON that transforms the origin and get your own custom version.)
Conceptually I'm also a little hesitant to change a core output format without deeper thought, as it seems like a symptom of the wider questions we've faced around mutually-exclusive forms of Popolo, and whether people should write tools simply based on a specific representation or on the standard (albeit a standard that's still in flux, and potentially still too flexible to actually be able to write sufficiently generic tools around…) Ideally tools that consume EP data should also be able to understand other flavours of Popolo too — though in practice that should really be through standard libraries that don't actually exist yet either…
This is all a quite long-winded way of saying "I'm not really sure", and the answer is as much philosophical as technical!
When I'm importing the memberships into TVFY if I could just always rely on a start_date field being there it would be slightly easier than having to check the start of the "legislative period" if that field is not present so I can fill in that date (which I do need for every membership record).
Perhaps a useful solution here would be to have a standard library for unpacking EveryPolitician JSON, so you're not dealing with the raw JSON at all? Then it would know how to DTRT for this (and any other odd or surprising edge-cases), and be kept up to date with future changes etc? I'm a little too busy atm to put that together right away, but if it sounds like a plausible approach, then perhaps you could start by maintaining that sort of separation in your own import script, and it could evolve into a standalone gem that I'd be willing to take over maintenance of…
I'll start by saying that all of what you say makes sense and not having this is not stopping me from reusing this data :)
What would you do for a case where we don't know the start date of the legislative period?
Then I would expect that field not to be present.
Perhaps a useful solution here would be to have a standard library for unpacking EveryPolitician JSON, so you're not dealing with the raw JSON at all?
I can definitely see how that would be useful. I'll see how my importer evolves and if it makes sense to extract this logic then I'll keep that in mind.
What would you do for a case where we don't know the start date of the legislative period?
Then I would expect that field not to be present.
The more I think about this, the more I suspect this is another of those cases where the inability to cleanly express unknown values in JSON leads to problems. There are potential ambiguities no matter which way this is expressed, I think…
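The ambiguity can be shown with a small Python sketch. It assumes a hypothetical `compact` step that drops a membership's start_date whenever it is missing or equal to the term's, in the spirit of "only show dates that differ from the containing term"; this is an illustration, not the actual generator, and the records are made up.

```python
import json


def compact(membership, term):
    """Drop start_date when it is unknown (None) or matches the term's."""
    out = dict(membership)
    if out.get("start_date") in (None, term.get("start_date")):
        out.pop("start_date", None)
    return out


term = {"id": "term/8", "start_date": "2014-11-27"}

# Two different facts about the same (hypothetical) person:
started_with_term = {"person_id": "a", "start_date": "2014-11-27"}
start_unknown = {"person_id": "a", "start_date": None}

# After compaction both serialise identically, so a consumer cannot
# tell "started with the term" apart from "start date not known".
same = (
    json.dumps(compact(started_with_term, term), sort_keys=True)
    == json.dumps(compact(start_unknown, term), sort_keys=True)
)
```

Here `same` ends up true: once the key is omitted, the plain JSON no longer records which of the two situations applied.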
Maybe I really do need to just bite the JSON-LD bullet…
I can definitely see how that would be useful. I'll see how my importer evolves and if it makes sense to extract this logic then I'll keep that in mind.
Even if you don't go that way yourself, pointing me at your importer could be useful as a starting place when I get time to look at it, as I strongly suspect having something like this would be useful in other places too, and it's better to get it built now when things are still relatively simple, and then evolve it over time. I really want to start getting better m17n of the data (separately from the site) in place in the next few weeks, and being able to hide a lot of that behind a library that DTRT would help a lot, I think.
...pointing me at your importer could be useful as a starting place when I get time to look at it...
Easy: https://github.com/openaustralia/publicwhip/compare/ukraine#diff-704369f3d959dd034ea368596659a537
Most start at the start of the parliamentary term on 2014-11-27. That's in the scraper but it doesn't show up in EP. Is this expected?
http://everypolitician.org/ukraine/verkhovna-rada/term-table/8.html
https://morph.io/openaustralia/ukraine_verkhovna_rada_deputies