ali-ramadhan / DocumenterCitations.jl

DocumenterCitations.jl uses Bibliography.jl to add support for BibTeX citations and references in documentation pages generated by Documenter.jl.
https://ali-ramadhan.github.io/DocumenterCitations.jl/dev
MIT License

Convert common TeX to unicode #11

Open simonbyrne opened 4 years ago

simonbyrne commented 4 years ago

I set it up here, and noticed a few TeX artifacts.

I would suggest at least converting two dashes (--) to an en-dash (–), as those are very common and hard to type.

ali-ramadhan commented 4 years ago

PR #17 starts working on this (see below). I'll tag v0.1.1 or v0.2.0 shortly after it's merged.

@simonbyrne Do you know of a list of these kinds of replacements or should we just add to it as we go along?

I was able to find lists for math TeX to unicode (https://github.com/svenkreiss/unicodeit/blob/master/unicodeit/data.py) but not so much for text replacements.

image

simonbyrne commented 4 years ago

I agree we probably don't want the full unicodeit list, as it seems to include both math and text commands.

charleskawczynski commented 3 years ago

Reopening this. We're still seeing issues, e.g. Jo { \~a } o Teixeira, over at the ClimateMachine.jl refs.

simonbyrne commented 3 years ago

This is due to spurious spaces being inserted by the BibTeX parser. Upstream issue is https://github.com/Azzaare/BibParser.jl/issues/5.

charleskawczynski commented 3 years ago

Should we leave this open until the upstream is closed?

simonbyrne commented 3 years ago

Yes, probably a good idea.

Azzaare commented 3 years ago

Hi there! Sorry for the long wait, spurious braces should not be a problem anymore.

It might only be a crude parser that I handcrafted, but BibParser.jl got updated today (v0.1.11). The new parser should handle any valid BibTeX entry, but it does not replace LaTeX commands from a @preamble, nor does it convert LaTeX to Unicode.

Azzaare commented 3 years ago

I've created a GitHub repo to convert LaTeX ⇋ Unicode: https://github.com/Humans-of-Julia/LaTeXUniCode.jl

It is almost empty at the moment, but I will work on it over the summer (as I will be in between two jobs, I can have some fun!)

Anyway, if some of you want to join, you're all welcome aboard.

fingolfin commented 3 years ago

The function tex2unicode is there, but it does not seem to be applied to the pages field, which is where I see TeX artifacts most often:

pages = {1 -- 45},

Could this be done?
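
As a rough illustration of the request above, here is a minimal sketch of normalizing a BibTeX page range. It is written in Python for brevity and is not the package's implementation; the name format_pages is hypothetical. It assumes that only a double hyphen marks a range, so single hyphens (which can be part of a page identifier) and triple hyphens are left alone.

```python
import re

EN_DASH = "\u2013"

# Exactly two hyphens, optionally padded with whitespace.
# The lookarounds keep "---" (a TeX em-dash) from being partially matched.
PAGE_RANGE = re.compile(r"\s*(?<!-)--(?!-)\s*")


def format_pages(pages: str) -> str:
    """Replace a BibTeX range separator ("--") with a Unicode en-dash."""
    return PAGE_RANGE.sub(EN_DASH, pages.strip())


if __name__ == "__main__":
    print(format_pages("1 -- 45"))
```

The same substitution could in principle be applied to any field that is rendered, not only the title.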

LazyScholar commented 3 years ago

Is tex2unicode currently only applied to the title? https://github.com/ali-ramadhan/DocumenterCitations.jl/blob/886bbb740ea2f814ec67e321e61ae16149e58fc2/src/bibliography.jl#L50-L54

Or am I misunderstanding it?

fingolfin commented 3 years ago

No, you are right. Hmm, I thought I'd made a PR also applying it to the output of xin... guess I forgot :/

LazyScholar commented 3 years ago

I converted all my .bib files to Unicode, so I did not realize that applying it to the authors and maybe published_in might fix it for others.

@fingolfin do you want to make the PR (you can delete line 51 as with your last change the year is not needed any more)?

LazyScholar commented 3 years ago

> Reopening this. We're still seeing issues, e.g. Jo { \~a } o Teixeira, over at the ClimateMachine.jl refs.

@charleskawczynski Is { \~a } valid TeX? As far as I know, in order to get ã one has to use \~{a} or even \tilde{a} (not sure if the latter is supported by BibTeX). Source: https://en.wikibooks.org/wiki/LaTeX/Special_Characters#Escaped_codes
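
For illustration only, a converter could be lenient and accept several brace placements for accents, including the spaced form produced by the parser. The sketch below is in Python and is not taken from DocumenterCitations.jl or BibParser.jl; the names tex_accents_to_unicode, COMBINING and PATTERNS are hypothetical, only five text-mode accents are covered, and \tilde{a} is deliberately not handled.

```python
import re
import unicodedata

# TeX text-mode accent commands mapped to Unicode combining characters.
COMBINING = {
    "~": "\u0303",  # tilde
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
}

ACCENT = "([~'`^\"])"

# Tried in order: {\~a} (spaces tolerated), then \~{a}, then bare \~a.
PATTERNS = [
    re.compile(r"\{\s*\\" + ACCENT + r"\s*([A-Za-z])\s*\}"),
    re.compile(r"\\" + ACCENT + r"\s*\{\s*([A-Za-z])\s*\}"),
    re.compile(r"\\" + ACCENT + r"([A-Za-z])"),
]


def _compose(match: re.Match) -> str:
    """Combine the base letter with its accent into one precomposed character."""
    accent, letter = match.group(1), match.group(2)
    return unicodedata.normalize("NFC", letter + COMBINING[accent])


def tex_accents_to_unicode(text: str) -> str:
    """Replace simple TeX accent commands with precomposed Unicode letters."""
    for pattern in PATTERNS:
        text = pattern.sub(_compose, text)
    return text


if __name__ == "__main__":
    print(tex_accents_to_unicode(r"Jo{\~a}o Teixeira"))
```

Note that a substitution like this cannot remove spaces that the parser inserted outside the braces, so that part would still need to be fixed upstream.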