fiduswriter / biblatex-csl-converter

A set of JavaScript converters: bib(la)tex => json, json => csl, and json => biblatex
GNU Lesser General Public License v3.0

Consuming the space after a command #53

Closed retorquere closed 7 years ago

retorquere commented 7 years ago

In case of

@InProceedings{test_citation2,
  Title                    = {{T}est {T}itle with space\quad quad, space\;semi-colon, space\:colon, space\,comma, space~nbsp},
}

I think the space between space\quad and quad should have been consumed (LaTeX interprets a space after a command not as a space but as an otherwise inert command terminator).

retorquere commented 7 years ago

Also, if that terminator is replaced with a {}:

@InProceedings{test_citation2,
  Title                    = {{T}est {T}itle with space\quad{}quad, space\;semi-colon, space\:colon, space\,comma, space~nbsp},
}

everything after the {} is marked nocase.

johanneswilm commented 7 years ago

I think the second issue should be addressed by the commit.

As for the first issue: This is interesting. Should we attempt to first replace all command names followed by a space and then without a space? It seems like we still need to be able to deal with "\quadHello" by replacing the "\quad", right?

retorquere commented 7 years ago

Nope, \quadHello is simply an unknown command; most LaTeX interpreters I've seen simply complain and fail, although I think you can tell them to ignore errors. But none will recognize \quad in this. I think the general rule, stated informally, is "a command name is terminated by the first non-letter character. If that character is a space (but only a simple space, not ~ or a newline for example), eat it". So

\textcopyright\textcopyright

\textcopyright \textcopyright

produce ©© (no space) in both cases.

retorquere commented 7 years ago

and \textcopyright \textcopyrightx will produce an error.
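The rule above can be sketched roughly as follows. This is only an illustration, not the converter's actual code: the function names and the two-entry lookup table are hypothetical, and it assumes command names consist of ASCII letters only and that unknown commands raise an error, as a strict LaTeX run would.

```javascript
// Hypothetical lookup table; a real converter would carry many more entries.
const KNOWN_COMMANDS = new Map([
  ['quad', '\u2003'],          // em space
  ['textcopyright', '\u00A9'], // copyright sign
])

// Read a control word whose backslash sits at `pos`.
// The name is the longest run of ASCII letters (an assumption of this sketch).
// One plain space directly after the name acts as a terminator and is skipped;
// any other character (~, {, a second backslash, ...) is left in place.
function readControlWord(input, pos) {
  let end = pos + 1
  while (end < input.length && /[A-Za-z]/.test(input[end])) end++
  const name = input.slice(pos + 1, end)
  if (input[end] === ' ') end++
  return { name, next: end }
}

// Replace known control words and throw on unknown ones.
function convertControlWords(input) {
  let out = ''
  let pos = 0
  while (pos < input.length) {
    const startsWord = input[pos] === '\\' && /[A-Za-z]/.test(input[pos + 1] || '')
    if (!startsWord) {
      out += input[pos++]
      continue
    }
    const { name, next } = readControlWord(input, pos)
    if (!KNOWN_COMMANDS.has(name)) {
      throw new Error('Unknown command: \\' + name)
    }
    out += KNOWN_COMMANDS.get(name)
    pos = next
  }
  return out
}
```

With this sketch, `\textcopyright\textcopyright` and `\textcopyright \textcopyright` both come out as ©©, `space\quad quad` loses the space after `\quad`, and `\quadHello` or `\textcopyrightx` throw because the whole letter run is taken as the command name.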

johanneswilm commented 7 years ago

I thought we made it recognize the command anyway in the \quadHello case due to your test cases last year. But my memory is not so clear about this.

retorquere commented 7 years ago

In which case, my bad. This should have been ignored. My own parser spits out unknown commands as plain text (so \quadHello would yield quadHello) -- not great, but just as a safety fallback. Ignoring it would have been equally valid.
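The two fallbacks mentioned here (emit the unknown name as plain text, or ignore it) could look something like this. It is a minimal sketch under stated assumptions: the function name, the `known` map, and the policy names `'text'` and `'ignore'` are all hypothetical, and it assumes the terminating space is still consumed even when the command is unknown.

```javascript
// Replace control words (backslash + ASCII letters, plus one optional
// terminating space). Known commands are looked up in `known`; unknown ones
// either fall back to their bare name ('text') or are dropped ('ignore').
function handleUnknownCommands(input, known, policy = 'text') {
  return input.replace(/\\([A-Za-z]+) ?/g, (match, name) => {
    if (known.has(name)) return known.get(name)
    return policy === 'text' ? name : ''
  })
}

// Hypothetical one-entry table for illustration.
const known = new Map([['quad', '\u2003']])

const asText = handleUnknownCommands('\\quadHello', known)            // 'quadHello'
const ignored = handleUnknownCommands('\\quadHello', known, 'ignore') // ''
```

Either policy avoids aborting the whole import on a single unrecognized command, at the cost of silently producing odd text.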

johanneswilm commented 7 years ago

ok, let's try this. Just to be clear:

"P\"ackchen" still turns into "Päckchen" even though there is no space or other word boundary after the "a". Should we maybe first take all those cases where replacement takes place even without a word boundary and then the ones that do require a word boundary?

retorquere commented 7 years ago

Non-alphanum commands are special cases - I think they're always one char long and apply to the entity (a single character or a braced block) right after them. @njbart is my go-to guy for the finer points - I'll go check, but what you propose sounds good to me.

johanneswilm commented 7 years ago

I can see we have for example these: \'{}{I} and \'{}O that are longer than one character.

retorquere commented 7 years ago

Those render to ́I and ́O, which is what I'd expect: a floating acute accent over nothing, followed by I/O. The \' is special(er) in that it doesn't advance the rendering position (it's a combining mark); it's commonly used in combinations such as \'\i to produce an accented i. For our purposes it is best seen as a combined \' with a following character; I don't know a sensible Unicode counterpart to \'{}{I} except perhaps outputting the combining mark without a base character.
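One way to picture the accent handling discussed above: treat each accent command as a one-character control symbol that attaches a Unicode combining mark to the character or braced group right after it, with no word boundary required. The sketch below is hypothetical throughout (function name, three-entry table), handles only `\"`, `\'` and `` \` ``, and makes one arbitrary choice for the empty-group case: it hangs the mark on a no-break space so it does not attach to the following letter.

```javascript
// Hypothetical mapping from accent control symbols to combining marks.
const COMBINING = {
  '"': '\u0308', // combining diaeresis
  "'": '\u0301', // combining acute accent
  '`': '\u0300', // combining grave accent
}

// Apply accent commands to the single character or braced group after them.
// An empty group ({}) has no base character, so the mark is placed on a
// no-break space (an assumption of this sketch, not established behaviour).
function convertAccents(input) {
  const pattern = /\\(["'`])(?:\{([^{}]*)\}|([^\\{}\s]))/g
  const replaced = input.replace(pattern, (match, accent, group, single) => {
    const base = group !== undefined ? group : single
    if (base === '') return '\u00A0' + COMBINING[accent]
    const [first, ...rest] = Array.from(base)
    return first + COMBINING[accent] + rest.join('')
  })
  // NFC folds base + combining mark into a precomposed character where one exists.
  return replaced.normalize('NFC')
}
```

Here `P\"ackchen` becomes Päckchen, while `\'{}O` yields a free-standing acute followed by a plain O. Running a pass like this before the control-word pass would match the "symbols first, then word-boundary commands" ordering proposed earlier in the thread.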

retorquere commented 7 years ago

Currently, when this is parsed

@InProceedings{test_citation2,
  Title                    = {{T}est {T}itle with space\quad quad, space\;semi-colon, space\:colon, space\,comma, space~nbsp},
}

the space after \quad is not consumed (I think!). Is this a regression? I thought the parser handled this now.

johanneswilm commented 7 years ago

Yes, this worked a few days ago.

johanneswilm commented 7 years ago

The space is still consumed here.

retorquere commented 7 years ago

No idea what happened here, but removing my node_modules dir and installing all packages afresh just resolved the matter.

retorquere commented 7 years ago

And my tests ran green on Circle; it was just locally that one failed. Odd, but hey, green! That concludes my import port!