fiduswriter / biblatex-csl-converter

A set of JavaScript converters: bib(la)tex => json, json => csl, and json => biblatex
GNU Lesser General Public License v3.0
34 stars 10 forks source link

Unescaped percentage signs #60

Closed retorquere closed 7 years ago

retorquere commented 7 years ago

When I have this reference:

@Article{Akhavan_2009,
Title = {Synthesis and electrochromic study of sol-gel cuprous oxide nanoparticles accumulated on silica thin film},
Author = {Akhavan, O. and Tohidi, H. and Moshfegh, A. Z.},
Journal = {Thin Solid Films},
Year = {2009},
Number = {24},
Pages = {6700--6706},
Volume = {517},

Abstract = {In this study, electrochromic properties of cuprous oxide nanoparticles, self-accumulated on the surface of
a sol-gel silica thin film, have been investigated by using UV-visible spectrophotometry in a lithium-based electrolyte
cell. The cuprous oxide nanoparticles showed a reversible electrochromic process with a thin film transmission reduction
of about 50% in a narrow wavelength range of 400-500Ânm, as compared to the bleached state of the film. Using optical
transmission measurement, we have found that the band gap energy of the films reduced from 2.7ÂeV for Cu2O to 1.3ÂeV for
CuO by increasing the annealing temperature from 220 to 300°C in an N2 environment for 1Âh. Study of the band gaps of
the as-deposited, colored and bleached states of the nanoparticles showed that the electrochromic process corresponded
to a reversible red-ox conversion of Cu2O to CuO on the film surface, in addition to the reversible red-ox reaction of
the Cu2O film. X-ray photoelectron spectroscopy indicated that the copper oxide nanoparticles accumulated on the film
surface, after annealing the samples at 200°C. Surface morphology of the films and particle size of the surface copper
oxides have also been studied by atomic force microscopy analysis. The copper oxide nanoparticles with average size of
about 100Ânm increased the surface area ratio and surface roughness of the silica films from 2.2% and 0.8Ânm to 51% and
21Ânm, respectively.},
Doi = {10.1016/j.tsf.2009.05.016},
File = {Akhavan_2009.pdf:CopperOxides\Akhavan_2009.pdf:PDF},
ISSN = {0040-6090},
Keywords = {Copper oxides, Nanoparticles, Sol-gel deposition, Optical properties, Band-gap energy, Electrochromic
properties },
Owner = {Francesco},
Timestamp = {2010.03.08}
}

which has an unescaped % in the abstract field, the % is ignored and the rest of the line is passed through. I don't mind this one bit (it preserves more of the text the author has clearly intended), but technically, the % means ignore input up to and including newline.

My own parser didn't support comments and read these as equivalent to \% -- not saying you should do that necessarily.

retorquere commented 7 years ago

Wait -- something else is happening. It looks like things between bare % signs are interpreted as variables?

johanneswilm commented 7 years ago

Yes, the source code contains this line:

// Using % as a delimiter for variables as they cannot be used in regular latex code.

So maybe we should be using another sign?

johanneswilm commented 7 years ago

Any character that should be safe?

johanneswilm commented 7 years ago

Or we could do this:

  1. Make sure that everything beyond a %-character is removed from the input in the 1st parse, including newline.

  2. Keep % for variable-handling.

retorquere commented 7 years ago

I don't know about safe chars, but it seems unlikely -- as far as I know, any character that's not a command is interpreted as text input. How about a command, something like \var{....}? What do you use it for BTW? Variable interpolation? For that I usually see constructs such as these.

I don't actually mind the current behaviour much though. If people want a % they should be using \%.

johanneswilm commented 7 years ago

In the first parse I need to output simple text. In the second run it is then transformed into the JSON structure you see in the output. In the first round it tries to resolve any variables. It is only those variables that are not defined anywhere that are being left in the text.

So the %-character should be safe, if we were following the LaTeX-rules of really removing everything after bare %-characters in the first round. That seems to be the real bug here.

retorquere commented 7 years ago

If you really want to follow the LaTeX intent, the % parsing would also be cut short by newlines.

In any case, this is mostly a case of "what is helpful to you". I'm perfectly OK with the current behaviour.

As an aside, #59 (for which I have a workaround) and #60 (for which I'm happy to adjust my test cases) are AFAICT the last two issues that touch BBT -- with the workaround (#59) and the adjusted test cases (#60), all my import tests pass!

johanneswilm commented 7 years ago

Ok, yeah let's try to get this right.

I was trying to use this entry in latex with biblatex and the %-part made it not compile. The newline character was however ignored. I wonder if the latex-rules for % maybe work differently in bibtex?

@article{anand2014technicalreportformally,
  author = {Anand, Ab\mkbibbold{hishe}k and Rahli, %ignore this
  Vincent},
  date = {2014},
  note = {http://www.nuprl.org/html/verification/},
  origdate = {1856},
  langid={english},
  language ={smålandsk and german},
  origtitle = {An {{Old}} Title},
  timestamp = {2011.03.22},
  title = {Technical {{Report}} : {{Towards}} A {{Formally Verified Proof Assistant}}},
  url = {http://www.nuprl.org/html/verification/}
}
retorquere commented 7 years ago

You are right, % interpretation is not what I thought inside references. This renders without error but omits the lastname of Rahli when the % is present:

@article{ck1,
  author = {Anand, Abhishek and % what?
  Rahli, Vincent},
  date = {2014},
  note = {http://www.nuprl.org/html/verification/},
  origdate = {1856},
  langid={english},
  language ={smålandsk and german},
  origtitle = {An {{Old}} Title},
  timestamp = {2011.03.22},
  title = {Technical {{Report}} : {{Towards}} A {{Formally Verified Proof Assistant}}},
  url = {http://www.nuprl.org/html/verification/}
}

I'd have to go ask @njbart about this.

johanneswilm commented 7 years ago

I tried putting the reference from https://github.com/fiduswriter/biblatex-csl-converter/issues/60#issue-253245156 in a bib-file and then run it through latex/biblatex, and it also crashed on me.

johanneswilm commented 7 years ago

When using biber, I cannot get your example in https://github.com/fiduswriter/biblatex-csl-converter/issues/60#issuecomment-325286111 to work either. Any type of bare-% usage inside of a a field or even just inside a citation makes the entire thing crash with biber. Did you use the bibtex executable instead?

retorquere commented 7 years ago

I use sharelatex.

johanneswilm commented 7 years ago

Ok, so maybe they have some error handler for this that automatically escapes percentage signs or removes them before handing it to biber/biblatex?

retorquere commented 7 years ago

Which would be odd -- I thought they would just pass the input to latex. The MWE is here, it's public so you should be able to access it.

johanneswilm commented 7 years ago

That changes the situation. If any kind of bare-%-usage makes bibtex crash, then we should probably always assume that they meant to escape the %-character.

retorquere commented 7 years ago

Agreed.

I can't reach Nick Bart (my go-to guy for deep biblatex knowledge) -- the email address I had of him bounces.

johanneswilm commented 7 years ago

The comments here seem to confirm that using %-ing for comments in bibtex isn't really working: https://tex.stackexchange.com/questions/93972/comment-out-sections-of-text-in-bib-file/93982#93982

retorquere commented 7 years ago

Then again -- it's bibtex. That program should have been superseded by biblatex aeons ago 😄 .

johanneswilm commented 7 years ago

Right, but I also get those errors with biblatex (biber).

retorquere commented 7 years ago

Which is surprising then given that stack exchange comment. Anyhow, I think we can agree that people should not be using bare % in references.

johanneswilm commented 7 years ago

@retorquere Could you check if the output is the way it should be? I am a bit unsure about the file-field and others where the percentage sign seems to be double-escaped. But most of the abstract fields seem to be correctly handled now.

johanneswilm commented 7 years ago

To be honest, I don't know what we are supposed to do with an unescaped % in a file field. This is incorrect latex/bibtex, correct? So should we escape it (as we do now)? Or what should we do?

retorquere commented 7 years ago

The file field is a bit of an odd duck. I think it's usually parsed as a verbatim field, but I don't know what the rules are for % signs in verbatim fields.

I've tested against current master and this looks perfect to me -- AAMOF, I'd be very happy to see this as a new NPM release.

johanneswilm commented 7 years ago

Ok, this is release 0.23. I suggest we create a 1.0 once you are happy with the final result and remove all the % TODO: verify output notes in the files.

retorquere commented 7 years ago

I found one test case I had commented out, sorry, I'll open a new issue. Other than that it looks perfect, I'm very happy with the results (and the cooperation for that matter).

retorquere commented 7 years ago

The same goes for unescaped ampersands BTW -- the parser currently passes them through, but people shouldn't be doing this anyway.

johanneswilm commented 7 years ago

ok. so it's ok as it is? I think I also saw some cases of &. Should we be dealing with those?

retorquere commented 7 years ago

It's OK as-is AFAIC -- it was just a heads-up. I'm conflicted about things like   (which I just found in one of my tests) and & (which I will probably find later) -- they really ought not to be there, but it's easy enough to remedy post-parse should I change my mind about handling them.