Open johanneswilm opened 4 years ago
Does that mean that it can be used as a drop in replacement and that it covers all the same features?
Probably not, the Syntax column is a big simplification. The whole chart is meant as a way to compare different parsers to replace the current one, and so they are only compared on features the current one had or that I wanted for the new one. A number of differences, in terms of features, in `idea-reworked`, compared to `biblatex-csl-converter`:
So it definitely isn't a drop in replacement, as the API is quite different, and depending on your needs it may not be possible at all to switch.
Ok, I understand. So "complete" doesn't mean "feature complete" but rather "completely covers what the other parser did"? Maybe that could be added somewhere, as otherwise it looks a bit misleading, and users who may be better off using one of the other parsers are led to believe that they shouldn't. I'd prefer not to have to set up a different chart making counterclaims, etc. Speed isn't much of a concern for Fidus Writer's use case for `biblatex-csl-converter`, as it's totally fine to wait 250 ms for a single citation to be converted and even up to several minutes if a user uploads their entire mega collection, as processing will happen entirely on that user's machine.
Accuracy is more important, and so is keeping maintenance costs down. So if there is another parser that can do the exact same but is maintained by someone else, I'd like to shut down `biblatex-csl-converter`. And if there isn't one, then I'd like for everyone else out there who needs the same functionality to contribute to `biblatex-csl-converter` so that we don't have to do all the maintenance by ourselves. That's why it would be nice to make sure people aren't misled by that chart somehow.
And yes, please, once you think your parser or one of the other ones covers all the features, let me know and I can see whether it still makes sense to put an AST converter on top and drop `biblatex-csl-converter` altogether.
"complete" means nothing more and nothing less than that it parses `syntax.bib` accurately, which encompasses all the syntax I had in mind for the new parser (apart from syntax within values).
Maybe that could be added somewhere, as otherwise it looks a bit misleading, and users who may be better off using one of the other parsers are led to believe that they shouldn't. I'd prefer not to have to set up a different chart making counterclaims, etc.
That's fair, I just didn't really intend this repository for other users to make choices with. What's missing from the description is "the new BibTeX parser formula for Citation.js". And the comparisons were either because I wanted to see if my new parser was up to the task, or because someone asked me to add it to the comparison. But I definitely see where you're coming from, and you're not the only one, so I'll change it up and also add more detailed comparisons.
I can see whether it still makes sense to put an AST converter on top
I'm not really sure what you mean by this. How is an AST converter "on top", and if you'd be dropping `biblatex-csl-converter`, what would it be on top of?
But I definitely see where you're coming from, and you're not the only one, so I'll change it up and also add more detailed comparisons.
Thank you very much for that. And yes, just a little bit of wording so that others understand what the purpose of the chart is and that it's not a full feature comparison of everything is all that I'm asking for. The comparison is still quite interesting.
I'm not really sure what you mean by this.
Sorry, let me reword. Currently `biblatex-csl-converter` outputs exactly the JavaScript object format we use internally in Fidus Writer. So if we switch to something else, then we'll probably need that parser plus a converter from the output of that parser to the format we use internally in Fidus Writer. So there would be a bit of development cost in creating this converter. That's all I was trying to say.
I don't mean to pile on just to be antagonistic, but `idea-reworked` parses `syntax.bib` (which is invalid BTW -- biblatex chokes on it) into
[
  {
    type: 'book',
    label: 'sweig42',
    properties: {
      author: "Stefan Swe{\\i}g and Xavier D\\'ecoret",
      title: ' The {impossible} ℡—book ',
      publisher: ' D\\"ead Poₑeet Society',
      year: 1942,
      month: '03'
    }
  }
]
I don't know if I'm calling it wrong:
const parser = require('./lib/idea-reworked')
const fs = require('fs')
console.log(parser.parse(fs.readFileSync('test/files/syntax.bib', 'utf-8')))
but it doesn't seem to do diacritics replacement, anything with braces, and for the subscript interpretation it just picks up the first character. Also, biblatex ignores leading and trailing spaces so title and publisher should have been trimmed. And `TEL` is superscript?
Wait, I got that wrong -- `syntax.bib` has double backslashes in the text, so it's not supposed to do diacritics conversions as there are none. Anyhow, that still leaves braces, subscript and superscript, and trimming.
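The trimming point can be sketched as a small post-processing step over a `properties` object like the one in the output above. This is only an illustration: `trimFields` is a hypothetical helper, not part of `idea-reworked`, and it assumes that stripping leading and trailing whitespace from string values is all that is wanted.

```javascript
// Hypothetical helper (not from idea-reworked): trim leading and trailing
// whitespace from string field values. Non-string values, such as a numeric
// year, are passed through unchanged.
function trimFields (properties) {
  const trimmed = {}
  for (const [field, value] of Object.entries(properties)) {
    trimmed[field] = typeof value === 'string' ? value.trim() : value
  }
  return trimmed
}

// Sample input taken from the parser output quoted in this thread.
const trimmedExample = trimFields({
  publisher: ' D\\"ead Poₑeet Society',
  year: 1942,
  month: '03'
})
console.log(trimmedExample)
```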
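which is invalid BTW -- biblatex chokes on it
natbib should not, at least the last time I checked.
* For superscript and subscript, I implemented it like that specifically but I don't know why. I'm converting them to Unicode characters which has limited support, but I think CSL supports `<sup>` and `<sub>` markup.
* `TEL` gets converted to the corresponding Unicode character in Zotero, which is where I got a lot of stuff from in the first version, and I kept it that way.
which is invalid BTW -- biblatex chokes on it
natbib should not, at least the last time I checked.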
Fair enough, it does.
* For superscript and subscript, I implemented it like that specifically but I don't know why. I'm converting them to Unicode characters which has limited support,
But that doesn't apply here -- a unicode subscript `e` does (clearly) exist, the parser just doesn't convert the other two `e`s.
but I think CSL supports `<sup>` and `<sub>` markup.
It does. My parser converts to unicode sub/superscript where possible and uses `<sup>` and `<sub>` where that's not possible.
* `TEL` gets converted to the corresponding Unicode character in Zotero, which is where I got a lot of stuff from in the first version, and I kept it that way.
I don't really follow -- in `syntax.bib` I see TEL as `\u54\u45\u4C`; after conversion it shows up as `\u2121`. The `TEL` in the input isn't a single character, it's a word, and title casing by a CSL style is going to affect it differently.
• I found just transforming the first character (if it's supported) more consistent than to create a string with part sub/superscript and part normal text
• Regarding TEL: that's the point (well, not the title casing) https://github.com/zotero/translators/blob/bae2057067e2fde076252a3b897a7e689a173c71/BibTeX.js#L1707
• I found just transforming the first character (if it's supported) more consistent than to create a string with part sub/superscript and part normal text
`$_{eee}$` should become either `ₑₑₑ` or `<sub>eee</sub>`, not `ₑee`. The braces mean that the entire string is subscript.
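The all-or-nothing behaviour described here (Unicode subscripts when every character has one, `<sub>` markup otherwise) could look roughly like the sketch below. The `SUBSCRIPTS` table and `renderSubscript` are hypothetical names for illustration, not taken from either parser, and the table is deliberately partial.

```javascript
// Partial table of Unicode subscript characters. Only some letters have a
// Unicode subscript form, which is why a fallback is needed.
const SUBSCRIPTS = new Map([
  ['a', '\u2090'], ['e', '\u2091'], ['o', '\u2092'], ['x', '\u2093']
])
for (let digit = 0; digit <= 9; digit++) {
  // U+2080..U+2089 are the subscript digits 0-9.
  SUBSCRIPTS.set(String(digit), String.fromCharCode(0x2080 + digit))
}

// Convert the whole braced group at once: either every character maps to a
// Unicode subscript, or the entire group is wrapped in <sub> markup.
function renderSubscript (text) {
  const chars = Array.from(text)
  if (chars.length > 0 && chars.every(ch => SUBSCRIPTS.has(ch))) {
    return chars.map(ch => SUBSCRIPTS.get(ch)).join('')
  }
  return `<sub>${text}</sub>`
}

console.log(renderSubscript('eee'))
console.log(renderSubscript('eef'))
```

Treating the group as a unit avoids the mixed result `ₑee`, at the cost of falling back to markup as soon as one character has no Unicode subscript.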
• Regarding TEL: that's the point (well, not the title casing) https://github.com/zotero/translators/blob/bae2057067e2fde076252a3b897a7e689a173c71/BibTeX.js#L1707
That table is a lossy mapping from unicode to ASCII TeX, you can't always revert this table for TeX to unicode mapping -- `TEL` being one such instance that should not be reversed. If the unicode char maps to a string that does not contain TeX-reserved characters, you generally do not want to use it as a reverse mapping.
That table is a lossy mapping from unicode to ASCII TeX, you can't always revert this table for TeX to unicode mapping
Case in point: the reverse table is held separately here, and I would argue that the reverse mapping of `{TEL}` is a poor choice -- `{TEL}` means "the phrase TEL, not to be messed with in sentence casing". It does not mean "Telephone Sign" (which is the name of `\u2121` in the unicode table).
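One way to read this rule is sketched below: invert a unicode-to-TeX table only for entries whose TeX side contains actual TeX markup. The sample table is illustrative and not copied from the Zotero translator, `buildReverseMap` and `hasTexMarkup` are hypothetical names, and treating a bare pair of braces as "not markup" is an assumption made here so that `{TEL}` is left alone.

```javascript
// Illustrative sample only; these entries are not copied from BibTeX.js.
const UNICODE_TO_TEX = {
  '\u2121': '{TEL}',
  '\u00E9': "{\\'e}",
  '\u03B1': '$\\alpha$'
}

// Assumption: a backslash, math-mode marker, or sub/superscript marker signals
// real TeX markup. Braces alone only group or protect text, so they do not count.
function hasTexMarkup (tex) {
  return /[\\$^_]/.test(tex)
}

// Build a TeX-to-unicode map, skipping entries that are plain text on the TeX side.
function buildReverseMap (unicodeToTex) {
  const reverse = new Map()
  for (const [char, tex] of Object.entries(unicodeToTex)) {
    if (hasTexMarkup(tex)) reverse.set(tex, char)
  }
  return reverse
}

const texToUnicode = buildReverseMap(UNICODE_TO_TEX)
console.log(texToUnicode.has('{TEL}'))
```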
Interesting conversation you guys are having here.
but I think CSL supports `<sup>` and `<sub>` markup.
Does that mean this parser does not support the other html tags either? biblatex-csl-exporter currently supports these in CSL export:
const TAGS = {
'strong': {open:'<b>', close: '</b>'},
'em': {open:'<i>', close: '</i>'},
'sub': {open:'<sub>', close: '</sub>'},
'sup': {open:'<sup>', close: '</sup>'},
'smallcaps': {open:'<span style="font-variant:small-caps;">', close: '</span>'},
'nocase': {open:'<span class="nocase">', close: '</span>'},
'enquote': {open:'“', close: '”'},
'url': {open:'', close: ''},
'undefined': {open:'[', close: ']'}
}
citeproc supports these; `enquote` and later in your table isn't markup so CSL won't mind. I can't find what CSL formally supports, but everything that uses citeproc in its various incarnations will support the markup listed under that link.
`enquote` and later in your table isn't markup so CSL
Right, because as far as I know, citeproc-js doesn't have any corresponding tag for these. All the other ones are in that list you are linking to.
Correct.
@retorquere Ah, now I understand your reply. My first comment on this here was not formulated very well. I updated it now. I wasn't asking whether citeproc supports it (I know it does), I was wondering about this parser.
Does that mean this parser does not support the other html tags either?
It does, but not all the commands it seems (code):
const richTextMappings = {
textit: 'i',
textbf: 'b',
textsc: 'sc',
textsuperscript: 'sup',
textsubscript: 'sub'
}
That misses at least `mkbibbold`, `bf` and `bfseries` for bold, `sl`, `em`, `it`, `itshape`, `mkbibitalic`, `mkbibemph`, `emph` for italics, `sc` and `scshape` for smallcaps, and citeproc doesn't support `<sc>`, just `<span style="font-variant: small-caps;">`.
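A fuller lookup covering the commands listed above might be laid out as in this sketch. The names `COMMAND_STYLE`, `STYLE_TAGS` and `wrapCommand` are hypothetical, the tags are the ones from the `TAGS` table quoted earlier in the thread, and passing unknown commands through unchanged is an assumption made for illustration.

```javascript
// Map each TeX command name (without the backslash) to a style name.
const COMMAND_STYLE = new Map([
  ...['textbf', 'mkbibbold', 'bf', 'bfseries'].map(cmd => [cmd, 'bold']),
  ...['textit', 'sl', 'em', 'it', 'itshape', 'mkbibitalic', 'mkbibemph', 'emph']
    .map(cmd => [cmd, 'italic']),
  ...['textsc', 'sc', 'scshape'].map(cmd => [cmd, 'smallcaps']),
  ['textsuperscript', 'sup'],
  ['textsubscript', 'sub']
])

// Markup per style; small caps use a span rather than <sc>.
const STYLE_TAGS = {
  bold: { open: '<b>', close: '</b>' },
  italic: { open: '<i>', close: '</i>' },
  smallcaps: { open: '<span style="font-variant:small-caps;">', close: '</span>' },
  sup: { open: '<sup>', close: '</sup>' },
  sub: { open: '<sub>', close: '</sub>' }
}

// Wrap text in the markup for a command; unknown commands leave the text as-is.
function wrapCommand (command, text) {
  const style = COMMAND_STYLE.get(command)
  if (style === undefined) return text
  const { open, close } = STYLE_TAGS[style]
  return open + text + close
}

console.log(wrapCommand('mkbibemph', 'emphasised'))
```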
Parsing stuff like `{partially \bf bold} but not this` is interesting (in the apocryphal Chinese sense) in that `\bf` affects everything after it until the end of the current block, so here, only the word `bold` should be bold. That sample is synthetic, just for illustration; in practice you'd see the much more sensible `partially {\bf bold} but not this`, but the interesting aspect there is that the braces do not mean `nocase`. If a block has a command at the start, it is ignored for case protection by bib(la)tex.
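The case-protection rule can be expressed as a small predicate. This is a minimal sketch, assuming that "a command at the start" means a backslash followed by letters immediately after the opening brace; `isCaseProtected` is a hypothetical name, and how bib(la)tex treats whitespace before the command is not covered here.

```javascript
// Returns true when a brace block should be treated as case-protected (nocase).
// Assumption: the block is passed in with its outer braces, e.g. '{\\bf Bold}'.
function isCaseProtected (block) {
  if (!block.startsWith('{') || !block.endsWith('}')) return false
  const inner = block.slice(1, -1)
  // A command right at the start of the block switches off case protection.
  return !/^\\[a-zA-Z]+/.test(inner)
}

console.log(isCaseProtected('{\\bf Next: Bold}'))
console.log(isCaseProtected('{Next: \\bf Bold}'))
console.log(isCaseProtected('{Next: Bold}'))
```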
Okay, that's some more things to add to the list. This does make me lean towards moving more parts of the parsing to earlier in the process.
For `{partially \bf bold} but not this`, are the braces still a `nocase`, since the `\bf` is not at the start of the block?
`<sc>` was mentioned in (although not part of) the old specification, I think that is where I got it. It seems to still be included in some test cases.
Okay, that's some more things to add to the list. This does make me lean towards moving more parts of the parsing to earlier in the process.
I don't see any other way this can be done. In a one-pass parser, it must be done during the parse, since you need the context to make these decisions. In a two-pass parser like mine, the decision can be postponed until the 2nd pass.
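To illustrate why the context is needed, here is a minimal one-pass sketch that scopes `\bf` to the enclosing brace block. It handles only braces and `\bf`, drops the braces from the output, and is not how either parser in this thread is implemented; `renderBold` is a hypothetical name.

```javascript
// Render \bf as <b>...</b>, scoped from the command to the end of its block.
function renderBold (input) {
  let pos = 0

  function parseBlock (isTopLevel) {
    let out = ''
    let bold = false // tracks whether \bf has been seen in this block
    while (pos < input.length) {
      const ch = input[pos]
      if (ch === '{') {
        pos++
        out += parseBlock(false) // nested block gets its own bold state
      } else if (ch === '}') {
        pos++
        if (!isTopLevel) break // end of the current block
        // A stray closing brace at top level is ignored.
      } else if (input.startsWith('\\bf', pos) && !/[a-zA-Z]/.test(input[pos + 3] || '')) {
        pos += 3
        while (input[pos] === ' ') pos++ // skip spaces after the command
        if (!bold) {
          out += '<b>'
          bold = true
        }
      } else {
        out += ch
        pos++
      }
    }
    if (bold) out += '</b>' // bold ends where the block ends
    return out
  }

  return parseBlock(true)
}

console.log(renderBold('{partially \\bf bold} but not this'))
console.log(renderBold('partially {\\bf bold} but not this'))
```

Both sample inputs produce the same markup here; what differs between them is the case protection of the braces, which this sketch does not model.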
For `{partially \bf bold} but not this`, are the braces still a `nocase`, since the `\bf` is not at the start of the block?
Yes:
\documentclass{article}
\usepackage[american]{babel}
\usepackage[backend=biber, style=apa]{biblatex}
\DeclareLanguageMapping{american}{american-apa}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@article{03, author = "03",
title = "{\bf Next: Bold}",
}
@article{04, author = "04",
title = "{Next: \bf Bold}",
}
@article{05, author = "05",
title = "{Next: Bold}",
}
\end{filecontents}
\addbibresource{\jobname.bib}
\begin{document}
\nocite{*}
\printbibliography
\end{document}
gives
`<sc>` was mentioned in (although not part of) the old specification, I think that is where I got it. It seems to still be included in some test cases
I think most will actually still support it, but it's out of spec (even if I think it looks better)
I haven't used B-C-C in a while, but it always used to be noticeably faster than the BBT parser. I don't know why the latest tests don't bear this out.
Hey, I just discovered this chart. I have been participating in the maintenance of `biblatex-csl-converter` over the past few years. Based on your chart it looks like `Idea (reworked)` gives the same output quality as `biblatex-csl-converter`. Does that mean that it can be used as a drop in replacement and that it covers all the same features? If that is the case, is there any reason why I would continue to maintain `biblatex-csl-converter`?