Closed retorquere closed 7 years ago
Are you suggesting using the edtf.js library to verify the date and, if it is acceptable, saving the string and passing it on to CSL as a raw date?
Is this possibly an area where it would make sense to use your existing dateparser?
Oh boy, I see there are at least 6 years of online discussions about whether or not csl should adopt edtf.js, and it looks like only the ruby version has done so, so far. I guess this will take another 6 years to be resolved.
Is there some kind of automatic translation between the two?
Maybe this will work: Instead of using a json structure, store the date in edtf format. Verify input for whether it is edtf compliant, throw an error when it's not and leave it blank (exception: year field). For export: for biblatex forward the edtf date string directly. For csl: Assume Frank will incorporate the edtf library in the near future[1] and hope/pray that citeproc likely will accept 99% of edtf-based strings already as "raw" input.
[1] https://github.com/Juris-M/citeproc-js/issues/6#issuecomment-249304810
It's not just CSL to worry about. On the import side, biblatex supports edtf in the date field (perhaps only edtf? I can't remember). edtf.js will do both validation (with a few tweaks for biblatex) and parsing in one stroke.
It'd be nice if citeproc would accept edtf, but I currently just produce the CSL dateparts. No automatic conversion between them, but the CSL date format is pretty simple.
(my existing date parser in part leans on the CSL date parser, so you'd be drawing that in. I don't currently use edtf.js because I can't use it until after the 5.0 port)
> It's not just CSL to worry about.
If biblatex import and export follow edtf, and we also use an edtf string as the intermediate format, what else is there to worry about than csl? And apparently citeproc will also switch to edtf.js.
Right now we use the csl date parts as the intermediate format. But it's probably better to switch to an edtf string for the above reason.
EDTF would do just fine for me. I thought you wanted parsed dates.
> I thought you wanted parsed dates.
This can be moved to the CSL export filter. I just wanted to validate the date fields and present them in a format that can be read by CSL. Now given that all the involved parts seem to switch to edtf, all we seem to need is validation on input and conversion for csl on export, and even that may go away soon.
From the biblatex manual:
Date fields such as the default data model dates date, origdate, eventdate, and urldate adhere to edtf (Extended Date/Time Format) specification levels 0 and 1.
So that sounds like only EDTF is allowed.
That's how I read it.
EDTF.js parses the working draft of the spec BTW, not the current EDTF format (https://github.com/inukshuk/edtf.js/issues/6), but I've managed to parse everything I could find so far by doing:
EDTF.parse(date.replace(/^y/, 'Y').replace(/unknown/g, '*').replace(/open/g, '').replace(/u/g, 'X').replace(/\?~/g, '%'))
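As a minimal sketch, the replace chain above can be wrapped in a named function so it can be tried on the test dates further down. The name `oldToNew` is hypothetical, and the call to `EDTF.parse` is left out so the snippet runs without the library; the comments on what each token means are my reading of the two notations, not something the thread states.

```javascript
// Hypothetical helper: the same replace chain as in the comment above,
// without the final EDTF.parse call.
function oldToNew(date) {
  return date
    .replace(/^y/, 'Y')        // prefix for years longer than four digits
    .replace(/unknown/g, '*')  // unknown interval endpoint
    .replace(/open/g, '')      // open interval endpoint becomes an empty side
    .replace(/u/g, 'X')        // unspecified digits
    .replace(/\?~/g, '%')      // uncertain and approximate
}

console.log(oldToNew('199u'))       // 199X
console.log(oldToNew('1999-uu-uu')) // 1999-XX-XX
console.log(oldToNew('1723?~'))     // 1723%
```

The order matters: `unknown` has to be replaced before the bare `u`, otherwise it would turn into `Xnknown`.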
Of course — two different and slightly incompatible formats, because having just one would be too easy. And given that biblatex supports the old version, that will stay with us for the next 20 years.
Are we 100% sure that biblatex supports the current spec? And do you also have a series of replacements to move from the old to the new format?
I am not 100% sure, but plk seems to be (https://github.com/plk/biblatex/issues/505#issuecomment-256957867), and he should know.
What do you mean by a series of replacements?
Dates are frigging hard. Not name-hard, but still plenty hard. Unless you take into account what BBT needs to parse (stuff like 8 juin 2016), then it inches towards name-hard.
Your replace-chain converts from the old to the new format, right? Do you have one that goes the other way?
That's old to new, correct. I don't have a new to old chain, but it's pretty much done by swapping out the arguments -- except for open. But I think ^\/ would translate to open/, and \/$ to /open. All the other replacements are uniquely identified I think.
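A rough sketch of what that reverse chain could look like, under the assumptions stated in the comment above: every replacement is simply swapped, and an empty interval side is turned back into `open` using the two anchored patterns. The function name `newToOld` is hypothetical and this has not been checked against biblatex.

```javascript
// Hypothetical helper: reverse of the old-to-new replace chain.
function newToOld(date) {
  return date
    .replace(/^\//, 'open/')    // empty start of an interval
    .replace(/\/$/, '/open')    // empty end of an interval
    .replace(/%/g, '?~')        // uncertain and approximate
    .replace(/X/g, 'u')         // unspecified digits
    .replace(/\*/g, 'unknown')  // unknown interval endpoint
    .replace(/^Y/, 'y')         // prefix for years longer than four digits
}

console.log(newToOld('199X'))  // 199u
console.log(newToOld('2004/')) // 2004/open
console.log(newToOld('1723%')) // 1723?~
```

Here `X` is replaced before `*`, so the `u` in the freshly inserted `unknown` is never touched.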
Here's my current test set for EDTF dates:
@article{barker1_2016_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker1, Anne},
date = {2016-07-18T20:26:06},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker2_2016_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker2, Anne},
date = {2016-07-18T20:26:06+10:00},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker3_2016_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker3, Anne},
date = {2016-07-18T20:26:06Z},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker4_-876_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker4, Anne},
date = {-0876},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker5_1723_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker5, Anne},
date = {1723~},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker6_1723_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker6, Anne},
date = {1723?},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker7_1723_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker7, Anne},
date = {1723?~},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker8_1988_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker8, Anne},
date = {1988/1992},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker9_2004_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker9, Anne},
date = {2004-22},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker10_199u_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker10, Anne},
date = {199u},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker11_19uu_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker11, Anne},
date = {19uu},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker12_1999_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker12, Anne},
date = {1999-uu},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker13_1999_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker13, Anne},
date = {1999-01-uu},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
@article{barker14_1999_turkey,
abstract = {Religious fundamentalism is a powerful force in Turkey where the military has a long history of intervening in politics to ensure the nation remains secular.},
author = {Barker14, Anne},
date = {1999-uu-uu},
entrysubtype = {newspaper},
journaltitle = {ABC News},
langid = {australian},
note = {Actual: 2016-07-18T20:26:06+10:00},
rights = {http://www.abc.net.au/conditions.htm\#UseOfContent},
timestamp = {2015-02-24 12:14:36 +0100},
title = {Turkey Divided between Secular and {{Islamist}} Rule},
url = {http://www.abc.net.au/news/2016-07-18/turkey-coup-attempt-shows-division-over-wish-for-islamist-rule/7639292},
urldate = {2016-07-24}
}
We also need to add a bunch of automatic integration testing via Travis before releasing a 1.0. A long list of examples like that may work well.
After reading through a few pages [1][2][3], it seems to make sense to restrict EDTF to levels 0 and 1 for now. As for the conversion to the date value for citeproc, it really only seems able to take either a date (year [+month] [+day]) or an interval (year1 [+month1] [+day1] + year2 [+month2] [+day2]). There doesn't seem to be room for any fuzziness, unspecified years, etc.
A more sophisticated parser seems to be needed only for the year field, to try to turn it into a usable date. citeproc has emphasized the handling of Japanese dates, whereas from your description I gather you have added handling of written French dates. It almost seems like it would make sense to make an extra package that does "human wording to date parsing" which could be used in other contexts as well. The purpose of the package would be to try as much as possible to get a date by interpreting a string.
[1] https://github.com/plk/biblatex/issues/427
[2] https://github.com/Juris-M/citeproc-js/blob/22e86b46576bde2c2b78896bbe00644017d02d39/src/util_dateparser.js
[3] http://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html#date-field-type
> After reading through a few pages [1][2][3], it seems to make sense to restrict EDTF to levels 0 and 1 for now. As for the conversion to the date value for citeproc, it really only seems able to take either a date (year [+month] [+day]) or an interval (year1 [+month1] [+day1] + year2 [+month2] [+day2]). There doesn't seem to be room for any fuzziness, unspecified years, etc.
That matches my understanding of things.
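To make the date-or-interval shape concrete, here is a minimal sketch of converting a plain EDTF string into CSL `date-parts`. The function name `edtfToCsl` is hypothetical and belongs to neither citeproc-js nor edtf.js; it only handles a year with optional month and day, optionally as an interval, drops any time part, and rejects everything else (fuzzy dates, unspecified digits, seasons).

```javascript
// Hypothetical helper: plain EDTF dates and intervals to CSL date-parts.
function edtfToCsl(edtf) {
  const toParts = (side) => {
    // Ignore a time part such as T20:26:06+10:00; date-parts has no room for it.
    const dateOnly = side.replace(/T.*$/, '')
    const m = /^(-?\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$/.exec(dateOnly)
    if (!m) throw new Error(`unsupported date: ${side}`)
    const parts = m.slice(1).filter((p) => p !== undefined).map(Number)
    // Months above 12 are seasons in EDTF and need separate handling.
    if (parts.length > 1 && (parts[1] < 1 || parts[1] > 12)) {
      throw new Error(`unsupported month: ${side}`)
    }
    return parts
  }
  return { 'date-parts': edtf.split('/').map(toParts) }
}

console.log(JSON.stringify(edtfToCsl('2016-07-18'))) // {"date-parts":[[2016,7,18]]}
console.log(JSON.stringify(edtfToCsl('1988/1992')))  // {"date-parts":[[1988],[1992]]}
```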
> A more sophisticated parser seems to be needed only for the year field, to try to turn it into a usable date. citeproc has emphasized the handling of Japanese dates, whereas from your description I gather you have added handling of written French dates.
No, French was just an example; citeproc also handles these languages for me (in addition to French -- I just get that from the citeproc handling):
> It almost seems like it would make sense to make an extra package that does "human wording to date parsing" which could be used in other contexts as well. The purpose of the package would be to try as much as possible to get a date by interpreting a string.
That's what citeproc date parsing does for me. Most of the other date parsing I do myself; I just use citeproc as a fallback. I could let citeproc do more of the parsing, but I feel I handle some cases better myself for input I've actually been handed. One such thing is that I can take the locale into account for dates like 01-03-2016, but there are other cases such as fuzzy dates, date ranges, and origdate handling, and one edge case where the human wording algorithm failed (I think there was a period behind the month or something), where I pre-cook the date before sending it into citeproc. I currently have 200 lines of code in my date parsing, and this of course excludes the lines of code in the citeproc dateparser that I implicitly use.
But as you say: the date field should not have all these problems, as date is biblatex specific, and biblatex dictates that its contents must be EDTF. This discussion only relates to what can be in the year field, and I'll be happy if I can just access the cooked contents of the date field. I can take it from there.
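To illustrate why the locale matters for a date like 01-03-2016, here is a small sketch. It is not the code referred to above; `parseNumericDate` and its `order` argument (standing in for a locale setting) are hypothetical.

```javascript
// Hypothetical helper: resolve an ambiguous numeric date using a
// day/month order that would normally come from the locale.
function parseNumericDate(text, order) {
  const m = /^(\d{1,2})[-\/.](\d{1,2})[-\/.](\d{4})$/.exec(text)
  if (!m) return null
  const [a, b, year] = m.slice(1).map(Number)
  const [day, month] = order === 'mdy' ? [b, a] : [a, b]
  if (month < 1 || month > 12 || day < 1 || day > 31) return null
  const pad = (n) => String(n).padStart(2, '0')
  return `${year}-${pad(month)}-${pad(day)}`
}

console.log(parseNumericDate('01-03-2016', 'dmy')) // 2016-03-01
console.log(parseNumericDate('01-03-2016', 'mdy')) // 2016-01-03
```

The same input yields two different dates, which is why a parser without locale information can only guess.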
> Most of the other date parsing I do myself.
And have you considered making a package of just that? It would seem like this could also be useful for say a train schedule website where a non-techie user tries to type the date in the "departure date" field.
It's a little too intertwined with the citeproc-js parser, and that in turn isn't available as a separate package. I have to load all of citeproc-js, and load the language packs which make the human date parsing possible.
If you're doing your own EDTF parsing, I just remembered you will have to account for months > 12, which mean seasons.
I incorporated the edtf.js package for that. Any reason not to do that?
Or is this season specification only available in one of citeproc/edtf?
> citeproc-js parser, and that in turn isn't available as a separate package.
I think Frank was working on an npm package recently.
In general, given that both your code and citeproc deal with parsing human date strings, it would seem to make sense to combine those efforts.
EDTF.js will take care of the season thing, it's in the spec.
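For anyone not using EDTF.js, a minimal sketch of the season rule: in EDTF a "month" value of 21 to 24 stands for spring, summer, autumn and winter, so `2004-22` from the test set above is summer 2004. The helper name `splitSeason` is hypothetical.

```javascript
// EDTF season codes that can appear in the month position.
const SEASONS = { 21: 'spring', 22: 'summer', 23: 'autumn', 24: 'winter' }

// Hypothetical helper: returns year and season for a year-season string,
// or null for anything else (including an ordinary year-month).
function splitSeason(edtf) {
  const m = /^(-?\d{4})-(\d{2})$/.exec(edtf)
  if (!m) return null
  const season = SEASONS[Number(m[2])]
  return season ? { year: Number(m[1]), season } : null
}

console.log(splitSeason('2004-22')) // { year: 2004, season: 'summer' }
console.log(splitSeason('2004-06')) // null
```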
When that separate date parser emerges I'll be happy to add my code to that. I must at this time prioritize a stupid react site I have to build for work and the 5.0 port.
My additions don't deal with human date parsing, really, mostly with ambiguously formatted dates. For the human dates I use citeproc-js.
date = {1723~} means "approximately 1723" and is a supported edtf date. I can highly recommend edtf.js for edtf date parsing.
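As a small illustration of the qualifiers in the test set (`1723~`, `1723?`, `1723?~`), here is a sketch that splits a trailing qualifier off a date string. `splitQualifier` is a hypothetical helper, not an edtf.js function; it also accepts `%`, which the replace chain earlier in the thread uses as the newer spelling of `?~`.

```javascript
// Hypothetical helper: separate a trailing EDTF qualifier from the date.
function splitQualifier(edtf) {
  const m = /^(.*?)(\?~|%|\?|~)?$/.exec(edtf)
  const q = m[2] || ''
  return {
    date: m[1],
    uncertain: q === '?' || q === '?~' || q === '%',
    approximate: q === '~' || q === '?~' || q === '%',
  }
}

console.log(splitQualifier('1723~'))  // { date: '1723', uncertain: false, approximate: true }
console.log(splitQualifier('1723?~')) // { date: '1723', uncertain: true, approximate: true }
```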