cpitclaudel / biblio.el

Browse and import bibliographic references from CrossRef, DBLP, HAL, arXiv, Dissemin, and doi.org from Emacs
GNU General Public License v3.0
180 stars, 14 forks

non-latin1 characters when importing from CrossRef #19

Closed: dahtah closed this issue 6 years ago

dahtah commented 6 years ago

Thanks for the great package. I've noticed a problem when importing BibTeX references from CrossRef: the page range (e.g. 467–499) is encoded with an en dash character (U+2013) rather than a plain hyphen, which, depending on the input encoding, makes pdflatex complain. Would it be possible to replace the en dash with a regular hyphen?

cpitclaudel commented 6 years ago (20 February 2018)

I can think of a few options:

One easy solution could be to add \DeclareUnicodeCharacter{2013}{--} to your preamble (with older LaTeX releases this also requires \usepackage[utf8]{inputenc}), or to move to xelatex or lualatex, which have good Unicode support.

Another solution would be to request this feature in Emacs itself: bibtex-clean-entry already supports a page-dashes option (one of the values of bibtex-entry-format) that converts -- to -, so it could be extended to handle en dashes as well.

If only en dashes are the problem, you could fix them on your side with advice:

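(defun ~/remove-en-dashes (str)
  "Replace each en dash (U+2013) in STR with the LaTeX ligature \"--\"."
  ;; FIXEDCASE and LITERAL are both t, so "--" is inserted verbatim.
  (replace-regexp-in-string "–" "--" str t t))

;; Post-process every BibTeX entry that biblio formats.
(with-eval-after-load 'biblio-core
  (advice-add #'biblio-format-bibtex :filter-return #'~/remove-en-dashes))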
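The advice only affects entries fetched after it is installed. For entries already saved in a .bib file, a minimal sketch could apply the same substitution to the whole current buffer. The command name ~/bibtex-replace-en-dashes is hypothetical; it is not part of biblio or bibtex.el:

```elisp
(defun ~/bibtex-replace-en-dashes ()
  "Replace every en dash (U+2013) in the current buffer with \"--\".
Hypothetical helper: it rewrites all fields, not only `pages'.
Return the number of replacements made."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (let ((count 0))
      ;; Plain (non-regexp) search for the en dash character.
      (while (search-forward "\u2013" nil t)
        ;; FIXEDCASE and LITERAL are t, so "--" is inserted verbatim.
        (replace-match "--" t t)
        (setq count (1+ count)))
      (message "Replaced %d en dash(es)" count)
      count)))
```

Run it with M-x ~/bibtex-replace-en-dashes in the .bib buffer. Since it rewrites every field, en dashes in titles or notes are converted too, which is usually what pdflatex needs anyway.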

If the problem is more general (other non-Latin-1 characters), I can look into integrating the LaTeX-escaping code I wrote for esh (https://github.com/cpitclaudel/esh/blob/master/esh-latex-escape.el) into this package.
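As a rough illustration of that more general approach, assuming a small hand-written table rather than the actual esh-latex-escape.el code, a filter could map several common non-Latin-1 characters to pdflatex-safe ASCII. The names ~/latex-unicode-replacements and ~/latex-escape-unicode are hypothetical:

```elisp
(defvar ~/latex-unicode-replacements
  '(("\u2013" . "--")    ; en dash
    ("\u2014" . "---")   ; em dash
    ("\u2018" . "`")     ; left single quotation mark
    ("\u2019" . "'")     ; right single quotation mark
    ("\u201C" . "``")    ; left double quotation mark
    ("\u201D" . "''"))   ; right double quotation mark
  "Hypothetical table mapping Unicode characters to pdflatex-safe ASCII.")

(defun ~/latex-escape-unicode (str)
  "Replace each character of STR listed in `~/latex-unicode-replacements'."
  ;; One regexp matching any of the table's keys.
  (let ((regexp (regexp-opt (mapcar #'car ~/latex-unicode-replacements))))
    (replace-regexp-in-string
     regexp
     ;; REP is a function: it receives the matched text and returns
     ;; the replacement, which LITERAL = t inserts verbatim.
     (lambda (match) (cdr (assoc match ~/latex-unicode-replacements)))
     str t t)))

;; Same hook point as the en-dash advice above.
(with-eval-after-load 'biblio-core
  (advice-add #'biblio-format-bibtex :filter-return #'~/latex-escape-unicode))
```

This would replace the ~/remove-en-dashes advice rather than complement it, and the table would have to grow to cover whatever other characters CrossRef returns.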

dahtah commented 6 years ago

Thanks Clément, that's quite helpful. The first solution (\DeclareUnicodeCharacter) works, of course. For others who might run into the problem, though, maybe it would be worth calling bibtex-clean-entry before exporting from CrossRef?


cpitclaudel commented 6 years ago

> maybe it's worth calling bibtex-clean-entry before exporting from CrossRef?

I think this is already done, actually. Am I misunderstanding?

dahtah commented 6 years ago

No, I'm the one who misunderstood. Thanks, I'm marking this as fixed (but it may be worth documenting somewhere).