larsgw opened 4 years ago
That's more a difference in whether a command parses its arguments in verbatim mode:
- \url expects one argument and parses it in verbatim mode.
- \href expects two arguments; it parses the first verbatim and the second normally.
- \begin{verbatim} ... \end{verbatim} parses everything inside the environment verbatim.
- \verb parses its argument verbatim up to a closing delimiter, which is whatever character immediately follows \verb (as in \verb|a$b|).

There's simply no math in verbatim mode, because the $ is just a character there.
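A small sketch of per-argument modes may make the distinction concrete. It assumes a hypothetical ARGUMENT_MODES table and a stand-in `interpret()` that only recognises `$...$` math; real normal-mode parsing does much more, and escaped braces (`\{`, `\}`) are not handled. It shows why `\url{a$b$c}` keeps its dollar signs while `\emph{$x$}` does not.

```javascript
// Hypothetical table: how each argument of a command is read.
// 'verbatim' keeps the raw text (so `$` is just a character);
// 'normal' interprets it (here only `$...$` math, for illustration).
const ARGUMENT_MODES = {
  url: ['verbatim'],
  href: ['verbatim', 'normal'],
  emph: ['normal'],
}

// Read a brace group starting at input[pos] === '{' and return its raw
// inner text plus the position after the closing brace. Nested braces are
// balanced; escaped braces are not recognised.
function readGroup (input, pos) {
  if (input[pos] !== '{') throw new SyntaxError(`expected "{" at ${pos}`)
  let depth = 0
  for (let i = pos; i < input.length; i++) {
    if (input[i] === '{') depth++
    else if (input[i] === '}' && --depth === 0) {
      return { raw: input.slice(pos + 1, i), end: i + 1 }
    }
  }
  throw new SyntaxError('unbalanced braces')
}

// Stand-in for normal-mode processing: mark `$...$` spans as math.
function interpret (raw) {
  return raw.replace(/\$([^$]*)\$/g, (_, math) => `<math>${math}</math>`)
}

// Parse `\name{...}{...}`, reading each argument in the mode from the table.
function parseCommand (input) {
  const match = /^\\([a-zA-Z]+)/.exec(input)
  if (!match) throw new SyntaxError('expected a command')
  const modes = ARGUMENT_MODES[match[1]]
  if (!modes) throw new Error(`unknown command \\${match[1]}`)
  let pos = match[0].length
  const args = modes.map(mode => {
    const { raw, end } = readGroup(input, pos)
    pos = end
    return mode === 'verbatim' ? raw : interpret(raw)
  })
  return { command: match[1], args }
}
```

For example, `parseCommand('\\href{http://x/$y$}{see $z$}')` keeps the URL untouched but turns the second argument into `see <math>z</math>`.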
That's a bit annoying; I was planning to do something like the following:
```js
// constants.js
export const argumentCommands = {
  href (url, text) { return text === url ? text : `${text} (${url})` }
}

// value.js (grammar)
import * as constants from './constants.js'

const grammar = new Grammar({
  // ...
  Command () {
    const command = this.consumeToken('command').value
    // Own keys only: `in` would also match inherited names such as `toString`
    if (Object.prototype.hasOwnProperty.call(constants.argumentCommands, command)) {
      const func = constants.argumentCommands[command]
      const args = []
      // Function.prototype.length: the number of declared parameters (fun thing)
      let arity = func.length
      while (arity-- > 0) {
        this.consumeToken('whitespace', /* optional: */ true)
        args.push(this.consumeRule('Argument'))
      }
      return func(...args)
    } // else...
  },
  // ...
})
```
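The `func.length` trick relies on `Function.prototype.length`. Here is a minimal, runnable sketch of the same loop over a plain string; `consumeArgument`, `applyCommand`, and the `{ input, pos }` state object are hypothetical stand-ins for the grammar's `consumeToken`/`consumeRule`, not citation-js APIs. It also illustrates a caveat: `length` stops counting at the first parameter with a default value and ignores rest parameters, so such a command would silently consume fewer arguments.

```javascript
// Same shape as `argumentCommands` in constants.js above.
const argumentCommands = {
  href (url, text) { return text === url ? text : `${text} (${url})` },
}

// Hypothetical stand-in for consumeToken('whitespace', true) followed by
// consumeRule('Argument'): skip whitespace, then read a brace group
// (without nesting, to keep the sketch short) or a single character.
function consumeArgument (state) {
  while (/\s/.test(state.input[state.pos] ?? '')) state.pos++
  if (state.pos >= state.input.length) throw new SyntaxError('missing argument')
  if (state.input[state.pos] !== '{') return state.input[state.pos++]
  const close = state.input.indexOf('}', state.pos)
  if (close === -1) throw new SyntaxError('unclosed argument')
  const value = state.input.slice(state.pos + 1, close)
  state.pos = close + 1
  return value
}

function applyCommand (name, state) {
  const func = argumentCommands[name]
  // Function.prototype.length counts the declared parameters before the
  // first one with a default value; rest parameters are not counted.
  let arity = func.length
  const args = []
  while (arity-- > 0) args.push(consumeArgument(state))
  return func(...args)
}
```

With this, `applyCommand('href', { input: '{https://example.org}{example}', pos: 0 })` yields `example (https://example.org)`, while a handler declared as `(url, text = url) => ...` would report an arity of 1.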
If you keep the original input text attached to each token while tokenizing, you can still decide, when processing the tokens, how to handle the input. Basically, in normal mode you process the tokens according to their semantic meaning, and in verbatim mode you take the original text attached to the tokens and concatenate it.
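A minimal sketch of that idea, assuming a hypothetical token shape `{ type, text }` where `text` is the exact source slice the token came from (illustrative only, not citation-js's actual token format). Normal mode acts on the token types; verbatim mode just concatenates `text`, so the original input comes back unchanged, `$` included.

```javascript
// Hypothetical tokenizer: every token keeps the exact source text it matched.
// The alternatives together cover every character, so no input is lost.
function tokenize (input) {
  const re = /\\(?:[a-zA-Z]+|[^a-zA-Z]?)|\$|[{}]|\s+|[^\\${}\s]+/g
  const tokens = []
  for (const [text] of input.matchAll(re)) {
    let type = 'text'
    if (text.startsWith('\\')) type = 'command'
    else if (text === '$') type = 'math'
    else if (text === '{' || text === '}') type = 'brace'
    else if (/^\s+$/.test(text)) type = 'whitespace'
    tokens.push({ type, text })
  }
  return tokens
}

// Verbatim mode: ignore the types and string the original text together.
function renderVerbatim (tokens) {
  return tokens.map(token => token.text).join('')
}

// Normal mode: act on the semantics. Here: drop grouping braces, collapse
// whitespace, and mark math on/off; commands are passed through unchanged.
function renderNormal (tokens) {
  let out = ''
  let inMath = false
  for (const token of tokens) {
    switch (token.type) {
      case 'brace': break
      case 'whitespace': out += ' '; break
      case 'math': out += inMath ? '</math>' : '<math>'; inMath = !inMath; break
      default: out += token.text
    }
  }
  return out
}
```

The same token list can then be rendered either way, e.g. `renderVerbatim(tokenize('a  {b} $x$'))` returns the input as-is.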
Don't forget that commands can have optional arguments in square brackets. I simply ignore them, but to do that I still have to parse them.
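Parsing optional arguments only to discard them could look like the sketch below. `skipOptionalArguments` is a hypothetical helper working on a raw string position; it assumes any number of `[...]` groups may follow a command, separated by whitespace, and that a `]` inside braces (as in `[{a]b}]`) does not close the argument.

```javascript
// Skip any optional `[...]` arguments starting at `pos` and return the
// position of the next non-whitespace character (the first mandatory
// argument, if there is one).
function skipOptionalArguments (input, pos) {
  for (;;) {
    // Skip whitespace between the command and its arguments.
    while (/\s/.test(input[pos] ?? '')) pos++
    if (input[pos] !== '[') return pos
    // Scan to the matching `]`, ignoring any `]` nested inside braces.
    let braceDepth = 0
    let i = pos + 1
    while (i < input.length && !(input[i] === ']' && braceDepth === 0)) {
      if (input[i] === '{') braceDepth++
      else if (input[i] === '}') braceDepth--
      i++
    }
    if (i >= input.length) throw new SyntaxError('unclosed optional argument')
    pos = i + 1 // continue after the closing `]`
  }
}
```

For instance, with the input `[short] [x]{Long}` the returned position points at `{Long}`.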
I think I might just let the command functions be called as if they were rules in the grammar, i.e. each command decides for itself how to parse its arguments. That's perhaps a bit similar to what you're doing, based on what I saw. It feels a bit odd to make it that customisable, but I don't think it can lead to code injection or the like.
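One way commands could act like grammar rules, sketched under assumptions: `Parser`, `rawGroup`, `parseNormal`, and the `handlers` table are hypothetical names, not citation-js's `Grammar` API. Each handler receives the parser and reads its own arguments, so \url takes its argument verbatim while \href parses only its second argument normally. Looking handlers up with an own-property check means input like \toString or \constructor cannot reach functions inherited from `Object.prototype`, which is the main injection-style pitfall of a plain-object table.

```javascript
// Hypothetical minimal parser: every `\name` is dispatched to a handler that
// receives the parser itself and decides how to read its own arguments.
class Parser {
  constructor (input, handlers) {
    this.input = input
    this.pos = 0
    this.handlers = handlers
  }

  // Read a brace group at the current position and return its raw inner
  // text; nested braces are balanced, escaped braces are not recognised.
  rawGroup () {
    if (this.input[this.pos] !== '{') {
      throw new SyntaxError(`expected "{" at position ${this.pos}`)
    }
    let depth = 0
    for (let i = this.pos; i < this.input.length; i++) {
      if (this.input[i] === '{') depth++
      else if (this.input[i] === '}' && --depth === 0) {
        const raw = this.input.slice(this.pos + 1, i)
        this.pos = i + 1
        return raw
      }
    }
    throw new SyntaxError('unbalanced braces')
  }

  // Copy plain text through and hand each command to its handler.
  parse () {
    let out = ''
    while (this.pos < this.input.length) {
      const match = /^\\([a-zA-Z]+)/.exec(this.input.slice(this.pos))
      if (!match) {
        out += this.input[this.pos++]
        continue
      }
      const name = match[1]
      // Own properties only, so inherited Object.prototype functions are unreachable.
      if (!Object.prototype.hasOwnProperty.call(this.handlers, name)) {
        throw new Error(`unknown command \\${name}`)
      }
      this.pos += match[0].length
      out += this.handlers[name](this)
    }
    return out
  }
}

// Parse a raw string in normal mode with the same handlers.
const parseNormal = (parser, raw) => new Parser(raw, parser.handlers).parse()

const handlers = {
  url: parser => parser.rawGroup(), // one argument, verbatim
  href: parser => {
    const url = parser.rawGroup() // first argument verbatim
    const text = parseNormal(parser, parser.rawGroup()) // second one normal
    return text === url ? text : `${text} (${url})`
  },
  emph: parser => parseNormal(parser, parser.rawGroup()),
}
```

Usage: `new Parser('see \\href{https://example.org}{\\emph{here}}', handlers).parse()` gives `see here (https://example.org)`.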
By the way, I am working on a prototype plugin for @citation-js/plugin-bibtex that extends Unicode support with your unicode2latex tables. I don't really want to add another 400 KB to the default browser bundle, so I think an optional plugin to the plugin could work well. I am still working out how to add things like {\\'{}I}, but that might be helped by the changes mentioned above.
From my pov you're making astounding progress.
The citationjs parser needs to support more kinds of commands, mostly commands that take arguments. Arguments currently seem to always be treated the same way: the parser takes either a braced block or the first character of text. The exception is math: \url takes in the dollar sign verbatim, while \emph does not.
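The "braced block or first character" rule can be sketched as a single reader. `readArgument` is a hypothetical helper; it assumes spaces before the argument are skipped (as TeX does) and, following the description above, takes one character rather than one TeX token, so something like \emph\foo is not handled.

```javascript
// Read one mandatory argument starting at `pos`: a brace group if the next
// non-space character is `{`, otherwise that single character.
// Nested braces are balanced; escaped braces are not recognised.
function readArgument (input, pos) {
  while (input[pos] === ' ') pos++
  if (pos >= input.length) throw new SyntaxError('missing argument')
  if (input[pos] !== '{') return { value: input[pos], end: pos + 1 }
  let depth = 0
  for (let i = pos; i < input.length; i++) {
    if (input[i] === '{') depth++
    else if (input[i] === '}' && --depth === 0) {
      return { value: input.slice(pos + 1, i), end: i + 1 }
    }
  }
  throw new SyntaxError('unbalanced braces')
}
```

So after \emph, the text ` xy` yields the argument `x`, while `{a{b}c}d` yields `a{b}c` and leaves `d` for the caller.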