Closed retorquere closed 7 years ago
Also, replacing that terminator with a {}:
@InProceedings{test_citation2,
Title = {{T}est {T}itle with space\quad{}quad, space\;semi-colon, space\:colon, space\,comma, space~nbsp},
}
everything after the {} is marked nocase.
I think the second issue should be addressed by the commit.
As for the first issue: This is interesting. Should we attempt to first replace all command names followed by a space and then without a space? It seems like we still need to be able to deal with "\quadHello" by replacing the "\quad", right?
Nope, \quadHello is simply an unknown command; most LaTeX interpreters I've seen simply complain and fail, although I think you can tell them to ignore errors. But none will recognize \quad in this. I think the general rule stated informally is "a command is terminated by the first non-alphanum character. If that character is a space (but only a simple space, not ~ or a newline for example), eat it". So
\textcopyright\textcopyright
\textcopyright \textcopyright
produce ©© (no space) in both cases, and \textcopyright \textcopyrightx will produce an error.
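The informal rule above can be sketched as a tiny tokenizer. This is only an illustration, not the parser discussed in this thread: the function name `tokenize` and the token shapes are made up here. It assumes TeX's default category codes, under which a command name consists of letters only (so a digit also ends the name), and it treats a run of plain spaces after the name as a terminator that is dropped. Control symbols such as \; are passed through as text.

```python
import string

LETTERS = set(string.ascii_letters)


def tokenize(src):
    """Split src into ("cmd", name) and ("text", chars) tokens (hypothetical helper)."""
    tokens = []
    i = 0
    while i < len(src):
        if src[i] == "\\" and i + 1 < len(src) and src[i + 1] in LETTERS:
            # Read the command name: the longest run of letters after the backslash.
            j = i + 1
            while j < len(src) and src[j] in LETTERS:
                j += 1
            tokens.append(("cmd", src[i + 1:j]))
            # Plain spaces right after the name only terminate it, so drop them.
            # "~" and other characters are left in place.
            while j < len(src) and src[j] == " ":
                j += 1
            i = j
        else:
            # Anything else is text; merge it into the previous text token.
            if tokens and tokens[-1][0] == "text":
                tokens[-1] = ("text", tokens[-1][1] + src[i])
            else:
                tokens.append(("text", src[i]))
            i += 1
    return tokens


print(tokenize(r"\textcopyright \textcopyright"))
print(tokenize(r"\quadHello"))
```

With this sketch both \textcopyright variants yield the same two command tokens, while \quadHello and \textcopyrightx come out as single commands with those full names, which a later stage can report as unknown.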
I thought we made it recognize the command anyway in the \quadHello case due to your test cases last year. But my memory is not so clear about this.
In which case, my bad. This should have been ignored. My own parser spits out unknown commands as plain text (so \quadHello would yield quadHello) -- not great, but just as a safety fallback. Ignoring it would have been equally valid.
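The two fallbacks mentioned here (emit the unknown command name as plain text, or drop it) could look roughly like this. It is a sketch under assumptions: the `KNOWN` table, the `convert` function and its `on_unknown` parameter are hypothetical names, and mapping \quad to an em space is one possible choice rather than what either parser actually does.

```python
import re

# Hypothetical replacement table; a real converter would carry many more entries.
KNOWN = {
    "quad": "\u2003",           # em space, one possible rendering of \quad
    "textcopyright": "\u00a9",  # copyright sign
}

# A backslash, a run of letters, then any plain spaces that terminate the name.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+) *")


def convert(src, on_unknown="text"):
    """Replace known control words; unknown ones become text or are dropped."""

    def repl(match):
        name = match.group(1)
        if name in KNOWN:
            return KNOWN[name]
        # Safety fallback: keep the bare name, or ignore the command entirely.
        return name if on_unknown == "text" else ""

    return CONTROL_WORD.sub(repl, src)


print(convert(r"\quadHello"))                       # fallback as plain text
print(convert(r"\quadHello", on_unknown="ignore"))  # fallback by ignoring
```

Either way, \quadHello is never split into \quad plus Hello, because the whole letter run is read as the command name before the lookup happens.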
ok, let's try this. Just to be clear:
"P\"ackchen" still turns into "Päckchen" even though there is no space or other word boundary after the "a". Should we maybe first take all those cases where replacement takes place even without a word boundary and then the ones that do require a word boundary?
Non-alphanum commands are special cases: I think they're always one character long and apply to the entity (a single character or a braced block) right after them. @njbart is my go-to guy for the finer points. I'll go check, but what you propose sounds good to me.
I can see we have for example these: \'{}{I} and \'{}O that are longer than one character.
Those render to ́I and ́O, which is what I'd expect: a floating acute accent over nothing, followed by I/O. The \' is special(er) in that it doesn't advance the rendering position (it's a compositing mark); it's commonly used in combination with \i, as in \'\i, to produce an accented i. For our purposes it is best seen as a combined \' with a following character; I don't know a sensible Unicode counterpart to \'{}{I} except perhaps outputting a non-composited compositing mark.
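One way to picture "a combined \' with a following character" is to attach a Unicode combining mark to whatever follows the accent command and let NFC normalization compose it. The sketch below is an assumption-laden illustration, not the code from the commit: `COMBINING`, `ACCENT` and `apply_accents` are hypothetical names, only a handful of accents are covered, nested braces are not handled, and \i is mapped to a plain i so that \'\i composes to í. An empty group yields the bare combining mark, as suggested above.

```python
import re
import unicodedata

# Combining marks for a few one-character accent commands (illustrative subset).
COMBINING = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    '"': "\u0308",  # diaeresis
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
}

# The accent command, then either a braced group without nested braces,
# or a bare \i, or a single ordinary character.
ACCENT = re.compile(r"""\\(['`"^~])(?:\{([^{}]*)\}|(\\i(?![A-Za-z])|[^\\{}\s]))""")


def apply_accents(src):
    def repl(match):
        mark = COMBINING[match.group(1)]
        base = match.group(2) if match.group(2) is not None else match.group(3)
        if base == "\\i":
            base = "i"  # choice made for this sketch: dotless i becomes plain i
        if base == "":
            return mark  # \'{} has nothing to sit on: emit the bare mark
        # Put the mark after the first character and compose where possible.
        return unicodedata.normalize("NFC", base[0] + mark + base[1:])

    return ACCENT.sub(repl, src)


print(apply_accents(r'P\"ackchen'))
```

Braces that are not the accent's own argument, such as the {I} in \'{}{I}, are left untouched here so a later stage can still interpret them.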
Currently, when this is parsed
@InProceedings{test_citation2,
Title = {{T}est {T}itle with space\quad quad, space\;semi-colon, space\:colon, space\,comma, space~nbsp},
}
the space after \quad is not consumed (I think!). Is this a regression? I thought the parser handles this now?
Yes, this worked a few days ago.
The space is still consumed here
No idea what happened here, but removing my node_modules dir and installing all packages afresh just resolved the matter.
And my tests ran green on circle, it was just locally that one failed. Odd, but hey, green! That concludes my import port!
In the case of a title containing space\quad quad, I think the space between space\quad and quad should have been consumed (LaTeX interprets a space after a command not as a space but as an otherwise inert command terminator).