Closed morchickit closed 4 years ago
@morchickit the date formats are inconsistent in the bulk download because they are inconsistent in the data; GrantNav doesn't modify the data at all, only reflects back what the publishers publish. Therefore, I don't think this is an incorrect design decision - it is consistent with the intention of GrantNav to be a mirror on the data (awkward though that is!)
Hearing that people are having issues with the data, though, I wonder what additional support we could provide through other tooling / advice? It's very easy to remove the additional data from a date-time with any number of command-line tools or text editors. What tools do the grantmakers you were working with like to use? If it's Excel, then DATEVALUE should help - see https://stackoverflow.com/questions/4896116/parsing-an-iso8601-date-time-including-timezone-in-excel for what I dug out. Could this go into some of our guidance?
If this presents issues in the GrantNav download, then it's likely to present issues for other data users. Should we consider a change to the standard to only allow one date format?
I am aware that GrantNav is doing so because of the standard... :-)
Yesterday, with grantmakers, I used DATEVALUE (which needs really complicated formulas to handle the Z suffix) and OpenRefine, and I even showed them how Tableau cleans it easily. However, most of them just need the data to be easy to work with in Excel, and the conversion with DATEVALUE takes a lot of time, which I believe we can save them in GrantNav.
We already changed the standard to allow different date-time formats so that it accommodates the output of different systems, so I am not sure this is where we want to go now. @stevieflow - thoughts?
Anyway, we now know that GrantNav users will find this conversion helpful, so can we do it for them in the app? We already do enrichment of geography, so I don't see why having one date format in the download should be an issue.
It's also worth noting that most of these date-times come from flatten-tool's conversion of Excel dates. One approach would be to change flatten-tool.
We talked about this at our catchup today.
If we can make flatten-tool choose a date format that Excel likes then that'll help in a lot of cases.
We could also add an augmented data column next to the original date column, perhaps?
We can. When can we fix it? We are getting more feedback about it, so it would be good to know when we can expect a fix.
I've added https://github.com/OpenDataServices/flatten-tool/issues/220 for the flatten-tool part of this work. We probably actually want to do both the flatten-tool mod and the augmented data column - as the latter offers more guaranteed consistency.
@BibianaC Thanks for starting to look into this. The flatten-tool part of the work I think we should do as-and-when; it would be a useful feature but not required for this.
In GrantNav, we should add another column to the CSV export (in the same way as we provide some of the augmented data in additional columns) which contains the date in a form that Excel understands. I'm fairly sure that'll be the YYYY-MM-DD form of ISO 8601, but we should test how well Excel can work with that. Obviously, I'd prefer to keep using ISO 8601 if possible, but as this is outside of the standard, non-standard formats are acceptable. I think the only requirement I'd put is that it should be an unambiguous one (so DD/MM/YY is out, but "DD Month YYYY" is probably acceptable).
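As a rough illustration of the extra-column idea, here is a minimal Python sketch. The column names (`Award Date`, `Award Date (date only)`) and the helper `add_date_only_column` are assumptions made up for this example, not GrantNav's actual export code. It keeps the original column untouched and copies the leading YYYY-MM-DD part, if there is one, into a new column at the end.

```python
import csv
import io
import re

# Matches a value that begins with a full ISO 8601 calendar date (YYYY-MM-DD).
ISO_DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")


def add_date_only_column(csv_text, source="Award Date", target="Award Date (date only)"):
    """Return CSV text with a date-only column appended after the existing columns.

    The column names are hypothetical; the original column is left as published.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [target]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = row.get(source) or ""
        match = ISO_DATE_PREFIX.match(value)
        # Keep only the date part; leave the new cell empty if there is no date.
        row[target] = match.group(0) if match else ""
        writer.writerow(row)
    return out.getvalue()


sample = (
    "Identifier,Award Date\n"
    "360G-example-1,2017-02-09T00:00:00.000Z\n"
    "360G-example-2,2017-02-09\n"
)
print(add_date_only_column(sample))
```

Whether Excel then recognises the new column as a date still needs checking in Excel itself, since that can depend on locale settings.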
Here's a dev deploy with extra columns for dates only, in YYYY-MM-DD ISO format: http://current.acceptable-license-valid.466-date-only.grantnav-dev.default.threesixtygiving.uk0.bigv.io/
I don't have a copy of Excel handy, so could someone else check if this works correctly?
In meetings the whole day, I will check it later this afternoon. :)
Can we give the column a name other than "award date (date only)", something that shows it was modified? And it works well, Ben!
Great!
Yes, we can call the column what we want. Any suggestions?
Looks like I should also move the new columns to the right of the column that says: "The following fields are not in the 360 Giving Standard and are added by GrantNav.".
How about 'award date formatted'?
Also, I am afraid that if we move it to the additional fields, people will miss it. In a perfect world, I would just replace the current award date with the formatted award date, since people said the current date-time format is not useful for them.
Thoughts?
@morchickit How about "Award Date ISO8601"? That then describes the field from the standard, and the format to expect.
> Also, I am afraid that if we move it to the additional fields, people will miss it.
We could add it to the help page, which is now linked from right next to the download buttons? Alternatively, once we get the CSV metadata stuff worked out, we could use some sort of marker (perhaps a *) to designate fields that aren't in the standard, and document that in the file, which will let us put the columns in any order that we want. If we put the new field next to the existing one, then we're effectively saying that it's part of the standard - which is wrong, and we don't want publishers to think that because it's that way in GN, they have to do the same.
> In a perfect world, I would just replace the current award date with the formatted award date, since people said the current date-time format is not useful for them.
As and when we're able to mix non-standard fields in with standard fields in a more curated output, then we should definitely do that! In the meantime, that's fundamentally misrepresenting what publishers have published. The standard allows a date-time for Award Date, and a publisher may well publish data that uses the time component meaningfully (eg, before or after a certain event). Such a publisher would then expect their data to be faithfully reproduced by GrantNav.
OK, while "Award Date ISO8601" describes the format, I suspect that our users who are not technically savvy would not understand what it means. I would prefer something more relatable for them.
I would also like to challenge the notion that this is not a field in the schema. All we have done here is trim the time off strings that were in date-time format, leaving YYYY-MM-DD (which is still valid in the schema). In the standard itself, we basically allowed date-times because they are what publishers' CRMs output and are easy for machines to work with, not because publishers gave a grant at 00:00:00 GMT (which is probably impossible, and is the timestamp that the majority of our grants have). I believe, and Katherine can confirm it, that those who publish in this format are doing it because it's convenient rather than meaningful. It would be meaningful in real-time data, but grants data is far from that.
I think we are overcomplicating this for the sake of "raw data" (speaking of that, read this - https://www.thenewatlantis.com/publications/why-data-is-never-raw) rather than for our users. Unlike the other fields that GrantNav adds, this is literally just trimming characters, not adding data from other datasets.
Before the clarification of dates last year, I wasn't aware that date-time was allowed for Award Date, so none of the publishers I worked with were presented with date-time as an option. I have not worked with any publishers who are intentionally using date-time, or for whom the time of day is meaningful for describing their work.
I think usability of the data is key for the CSV download. For me, that means the column of formatted dates is clearly visible and labelled in a way that general users can understand.
GrantNav being a de facto API for the dataset seems to be in tension with GN being a tool that allows a non-tech audience to explore and start using 360 data.
> GrantNav being a de facto API for the dataset seems to be in tension with GN being a tool that allows a non-tech audience to explore and start using 360 data.
That's a large part of what's going on here. The other dimension is that GrantNav is designed to be a mirror to the data (to fuel questions such as "why don't my grants appear in Location searches"), and to the standard (to fuel questions such as "why does the standard allow you to express dates as date-times when that makes the data harder to use, and no-one uses the time component meaningfully?"). Those questions make us go through the same experience as anyone else who wants to use the data, except that we're dealing with the pain so that others don't have to.
All of these uses, both the ones that we originally designed GrantNav for and the one that it's gained over the years, are valuable.
I can think of three potential ways forward:
This:
> We update the standard to make it explicit that the time component of dates is meaningless and can be dropped by consuming applications. This wouldn't change the schema, just the standard docs (docs + schema = standard). Once the standard permits consuming applications to drop the time component, we can then update GrantNav to do exactly that in the CSV download, and add an appropriate explanatory note to the CSV download. This would be a MINOR change, which would require community consultation and a bump to 1.1.
We decided in the Stewardship Committee that we can go ahead with this, so I will update the documentation to reflect it and we can then solve this issue.
@michaelwood - just to clarify with that related commit (a20a4f3): we would want the date fields to be overwritten in the CSV rather than adding new fields. The decision of the Stewardship Committee means that we can do that, and I think it would be overcomplicated to add additional fields containing the same information.
A modified version of this has been deployed to the dev GrantNav (http://latest.es7.grantnav-dev.default.threesixtygiving.uk0.bigv.io/). The modifications are:
Thanks @michaelwood - can you link me to the relevant commit(s)?
> Skip the additionally created fields in the grant view
I'm unsure as to why there are additionally created fields; I'll comment on this properly once I've seen the code
> Always fallback to the raw date if we can't parse it
I don't think this should be possible. If the date isn't ISO 8601 in the input data, then it'll fail validation, and therefore won't get as far as GrantNav in the first place, right?
> Thanks @michaelwood - can you link me to the relevant commit(s)?
https://github.com/ThreeSixtyGiving/grantnav/commits/master-next-major-version 4eca5c122680af45e3a5af28b9fd40416ebcde61...7061616dcb3344d8f3925845d47174ad73a6dd5a
(That branch is a holding branch for a bunch of things to be reviewed and is what dev grantnav runs)
> > Skip the additionally created fields in the grant view
> I'm unsure as to why there are additionally created fields; I'll comment on this properly once I've seen the code
The additional fields are date-only fields on the grant document: they contain just the date, rather than the full ISO date-time. They are created at import time; when a CSV is requested, the CSV generator just has a list of fields that it uses.
To avoid our date-only fields appearing when you view a grant, which would show two sets of dates (the original and ours), I have added them to the SKIP_FIELDS filter, which is applied to the grant object/doc.
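To make that arrangement concrete, here is a minimal Python sketch of it. Only the name SKIP_FIELDS comes from the comment above; the field names, the helper functions, and the choice of which fields the CSV lists are all assumptions for illustration and are not taken from the GrantNav code.

```python
# Hypothetical field names; only the SKIP_FIELDS idea comes from the comment above.
SKIP_FIELDS = {"awardDateDateOnly"}

# The CSV generator works from a fixed list of fields (contents assumed here).
CSV_FIELDS = ["id", "awardDateDateOnly"]


def add_date_only_fields(grant):
    """Import-time step: store a date-only copy alongside the original value."""
    enriched = dict(grant)
    # Naive trim to the first ten characters (YYYY-MM-DD), for illustration only.
    enriched["awardDateDateOnly"] = str(grant.get("awardDate", ""))[:10]
    return enriched


def grant_view_fields(grant):
    """Grant view: show every field except the ones listed in SKIP_FIELDS."""
    return {key: value for key, value in grant.items() if key not in SKIP_FIELDS}


def csv_row(grant):
    """CSV export: pick values from the fixed list of fields."""
    return [grant.get(field, "") for field in CSV_FIELDS]


grant = add_date_only_fields({"id": "360G-example-1", "awardDate": "2017-02-09T00:00:00.000Z"})
print(grant_view_fields(grant))  # original date-time only, no duplicate date
print(csv_row(grant))            # date-only value for the download
```

The point of the split is that the stored grant keeps the publisher's original value, while each output decides for itself which of the two fields to show.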
> > Always fallback to the raw date if we can't parse it
> I don't think this should be possible. If the date isn't ISO 8601 in the input data, then it'll fail validation, and therefore won't get as far as GrantNav in the first place, right?
Yeah, I thought that, but then I found there were cases where the field could be blank, and I wasn't sure how our validation matched up with Python's date parser. As this is an 'added bonus' field, I figured that generating it should never cause the grant to be discarded, so the behaviour is now that it always falls back to the original value in the grant.
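The fallback behaviour described here could look something like the following Python sketch. It is not the code from the commits linked above: the function name `date_only_or_original` is hypothetical, and using `datetime.fromisoformat` as the parser is an assumption, since the thread does not say which parser GrantNav uses.

```python
from datetime import datetime


def date_only_or_original(value):
    """Return YYYY-MM-DD for a parseable ISO 8601 value, else the value unchanged."""
    if not isinstance(value, str) or not value.strip():
        # Blank or missing: nothing to parse, so hand back what we were given.
        return value
    text = value.strip()
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    try:
        # The date is taken as written; no timezone conversion is applied.
        return datetime.fromisoformat(text).date().isoformat()
    except ValueError:
        # Unparseable: fall back to the original rather than rejecting the grant.
        return value


for raw in ["2017-02-09T00:00:00.000Z", "2017-02-09", "", "not a date"]:
    print(repr(raw), "->", repr(date_only_or_original(raw)))
```

Because every failure path returns the input unchanged, deriving the extra field can never be the reason a grant is dropped.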
This work is now released and is deployed to live.
We had a workshop with grantmakers today (our Digging the Data one), and they all had trouble making the dates in the GrantNav CSV download consistent (see more on date types in GrantNav in #395).
To make it easier for data users, can we format all dates in the GrantNav CSV download like 2017-02-09, without the extra time information (as in 2017-02-09T00:00:00.000Z)?
@Bjwebb and @robredpath - How much work is it?