fiduswriter / biblatex-csl-converter

A set of JavaScript converters: bib(la)tex => json, json => csl, and json => biblatex
GNU Lesser General Public License v3.0

Subscript not recognized #66

Closed: retorquere closed this issue 7 years ago

retorquere commented 7 years ago

This one may be tricky:

@Article{Isobe_1996,
  Title                    = {$T_\mathrm{c}$},
}

It's safe enough to ignore the \mathrm, but the _ applies to its argument, which in this case is \mathrm{c}, so I'd have hoped to see T<sub>c</sub>.

Also, much to my surprise when I tried, $T$ looks visually much more like <i>t</i> than it does like T. I tried $Tt$ and I can't tell them apart.

retorquere commented 7 years ago

Ha, wow. $T\mathrm{c}$ comes out looking like <i>t</i>c.

retorquere commented 7 years ago

Don't know if we would want to take it all the way there though.

johanneswilm commented 7 years ago

We will not be able to fully recreate math mode. That user simply needs to put a small t outside of math mode.

As for the first problem: currently we tell the parser to ignore the underscore if a command follows it.

What we could do is replace "\mathrm" and "\text" with "". That should put the underscore directly before the braces, so that everything within the braces gets subscripted. Would that work?

retorquere commented 7 years ago

Math mode becomes interesting should MathML ever make it into CSL. Other than that it would be rather insane to try.

Yeah, removing those should do the job if they can be removed cleanly. This is a scanning parser without backtracking, right? So if the input at the cursor is \mathrm($|\s|[^a-zA-Z]) or some such, it should be removable cleanly?
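To make the "removable cleanly" condition concrete, here is a minimal sketch of that boundary check as a plain regex replace. The function name `stripFontCommands` is hypothetical and this is not the converter's actual code; it only illustrates why the character after the command name matters (a naive replace of "\text" would also eat the start of "\textbf").

```javascript
// Hypothetical helper for illustration, not part of biblatex-csl-converter.
// Removes \mathrm and \text only when the command name really ends there,
// i.e. the next character is not a letter (end of input also qualifies).
function stripFontCommands(input) {
    return input.replace(/\\(?:mathrm|text)(?![a-zA-Z])/g, "")
}

console.log(stripFontCommands("$T_\\mathrm{c}$"))  // $T_{c}$
console.log(stripFontCommands("\\textbf{bold}"))   // \textbf{bold} (left alone)
```

The negative lookahead plays the role of the `($|\s|[^a-zA-Z])` test without consuming the following character, so the brace or space after the command stays in the input.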

johanneswilm commented 7 years ago

It would just remove the command, not the argument. So $_\mathrm{hello}$ becomes $_{hello}$ which then parses cleanly. See the commit. Does that look like what you would expect?
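As a rough illustration of why the stripped form then parses cleanly, the sketch below turns an underscore followed by a braced group or a single character into a `<sub>` element, and leaves an underscore followed by a command untouched. The name `subscriptsToHtml` is an assumption made for this example, nested braces are not handled, and the real parser works differently (it scans rather than using a regex).

```javascript
// Illustration only: hypothetical function, no nested-brace support.
// Converts _{...} or _x to <sub>...</sub>; an underscore followed by a
// backslash command (e.g. _\mathrm{c}) is deliberately not matched.
function subscriptsToHtml(math) {
    return math.replace(
        /_(?:\{([^{}]*)\}|([^\s{}\\]))/g,
        (match, braced, single) =>
            "<sub>" + (braced !== undefined ? braced : single) + "</sub>"
    )
}

console.log(subscriptsToHtml("_{hello}"))        // <sub>hello</sub>
console.log(subscriptsToHtml("T_{c}"))           // T<sub>c</sub>
console.log(subscriptsToHtml("Cu_2O"))           // Cu<sub>2</sub>O
console.log(subscriptsToHtml("T_\\mathrm{c}"))   // T_\mathrm{c} (unchanged)
```

The last call shows the original problem: with the command still in place the underscore has nothing to attach to, which is why removing the command first gives the expected T<sub>c</sub>.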

retorquere commented 7 years ago

Yep, looks great. And for fun, {{Cu$_2$O(1\,1\,1)-Cu$CU_\mathrm{CUS}$}} doesn't make the CU outside the mathrm look like lowercase.

johanneswilm commented 7 years ago

I noticed that there were some "nocase" marks added with this change. Is that correct?

retorquere commented 7 years ago

They do no harm here, and are probably semantically closer to the mark than not having them, because math mode (which is where you will find \mathrm) doesn't apply case protection... but

Title = {Some text and $\mathrm{SOME TEXT WITH SPACES}$}

renders to

“Some text andSOMETEXTWITHSPACES.”

Nice. Although honestly, LaTeX interpretation is hard enough -- math mode interpretation, other than where used to produce subscript/superscript and another few niceties, is probably not a sane scope.

johanneswilm commented 7 years ago

Ok, I think we are done with this issue then.