cabo / kramdown-rfc

An XML2RFC (RFC799x) backend for Thomas Leitner's kramdown markdown parser
MIT License

DOI workarounds for Usenix citations at ACM? #199

Closed dkg closed 1 year ago

dkg commented 1 year ago

Usenix papers are published by the ACM, and associated with a DOI reference. For example, E-Fail is described in https://dl.acm.org/doi/10.5555/3277203.3277245. However, a kramdown reference to this DOI fails, apparently because https://doi.org/10.5555/3277203.3277245 doesn't know about it.

I don't know enough about what's going on between DOI and ACM here, but both webservices should be able to produce bibtex.

I think DOI 10.5555 might be a "cross-reference" aggregator; maybe the 3277203 segment refers to ACM? I don't really know.

It would be great if kramdown-rfc could more easily cite usenix papers.

cabo commented 1 year ago

It doesn't seem to me that dl.acm.org has an interface like dx.doi.org, where I can get application/citeproc+json by asking for it in the Accept header. If you know how to get application/citeproc+json out of dl.acm.org, I could simply switch sites, keying on the apparently unofficial 10.5555 prefix that ACM uses.
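The content negotiation mentioned here can be sketched roughly as follows. This is a minimal Python illustration, not kramdown-rfc's own Ruby code; the function names are hypothetical, the media type is taken verbatim from this comment, and doi.org is assumed to be the resolver.

```python
import json
import urllib.request

DOI_RESOLVER = "https://doi.org/"  # assumption: the thread also mentions dx.doi.org
CITEPROC_JSON = "application/citeproc+json"  # media type as named in this thread


def build_citeproc_request(doi: str) -> urllib.request.Request:
    """Build a GET request that asks the DOI resolver for citeproc JSON."""
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": CITEPROC_JSON})


def fetch_citeproc(doi: str) -> dict:
    """Send the request (needs network access) and decode the JSON body."""
    with urllib.request.urlopen(build_citeproc_request(doi), timeout=30) as resp:
        return json.load(resp)
```

Keeping request construction separate from the network call makes the header logic easy to inspect without contacting the resolver.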

dkg commented 1 year ago

I think you'd want to special-case ACM on the entire 10.5555/3277203 prefix, not just on 10.5555/ itself, as 10.5555/ seems to be a more generic cross-reference. For example, this article from Science claims to be doi:10.5555/article.2480580 (though https://doi.org/10.5555/article.2480580 doesn't know about it either).
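The difference between the two special-casing options can be illustrated with a small dispatch function. This is only a sketch: `choose_source` and the labels it returns are hypothetical names, and whether either prefix reliably identifies ACM is exactly the open question here.

```python
BROAD_PREFIX = "10.5555/"          # the generic prefix
NARROW_PREFIX = "10.5555/3277203"  # the longer prefix suggested above


def choose_source(doi: str, acm_prefix: str = BROAD_PREFIX) -> str:
    """Return "acm" for DOIs starting with acm_prefix, otherwise "doi.org"."""
    return "acm" if doi.startswith(acm_prefix) else "doi.org"
```

With the broad prefix, `choose_source("10.5555/article.2480580")` would route the Science article to ACM; with `acm_prefix=NARROW_PREFIX` it would fall through to doi.org.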

The ACM interface i found for exporting bibtex is this one:

https://dl.acm.org/action/exportCiteProcCitation?targetFile=custom-bibtex&format=bibTex&dois=10.5555%2F3277203.3277245

Note that the DOI itself is percent-encoded, the / is replaced with %2F. It seems to work with both GET and POST methods.
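A minimal Python sketch of building that URL, with the percent-encoding done by the standard library. The function name and its defaults are illustrative; the endpoint and parameter names are copied from the URL above.

```python
from urllib.parse import urlencode

ACM_EXPORT_ENDPOINT = "https://dl.acm.org/action/exportCiteProcCitation"


def acm_export_url(doi: str, target_file: str = "custom-bibtex", fmt: str = "bibTex") -> str:
    """Build the export URL; urlencode percent-encodes the "/" in the DOI as %2F."""
    query = urlencode({"targetFile": target_file, "format": fmt, "dois": doi})
    return f"{ACM_EXPORT_ENDPOINT}?{query}"


print(acm_export_url("10.5555/3277203.3277245"))
```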

The resulting json dict contains an items member that is a list of dicts, where each dict is a bibtex entry, i think. I haven't tried to figure out how to specify multiple DOIs in a single query, but the fact that the parameter is named dois and that the response is structured in a way that it could return more than one item makes it look like that's possible.
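Pulling the entries out of such a response might look like the sketch below. It assumes only what is described above (a top-level dict with an items list); the function name is hypothetical, and the sample string in the usage line is a made-up placeholder, not real ACM output.

```python
import json


def extract_items(body: str) -> list:
    """Return the list stored under the top-level "items" member."""
    payload = json.loads(body)  # raises ValueError if the body is not JSON
    items = payload.get("items") if isinstance(payload, dict) else None
    if not isinstance(items, list):
        raise ValueError('response has no "items" list')
    return items


# Placeholder input, only to show the call shape:
entries = extract_items('{"items": [{"title": "placeholder"}]}')
```

Since the result is a list, the same code would cope with a response carrying more than one entry, if the dois parameter does accept several DOIs.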

cabo commented 1 year ago

Hmm, I don't see how to get that JSON object from the URL you gave, except through a browser. Do you have a more API-like interface?

Besides, BibTeX is a terrible input format for a conversion. I'd rather use actual citeproc (hey, that's actually in the name!), or refer ("endnote"). Further hints appreciated.

dkg commented 1 year ago

hm, i just fetched it with wget and had no problem. what problem did you see?

It looks like you can vary the targetFile and format parameters to replace bibTex with endNote, but I'm not sure how much that changes the data. It seems to modify only the style and suffix members of the top-level dict, without changing the JSON in items at all. So maybe what's in items isn't bibtex, and it's some custom dl.acm.org format?

cabo commented 1 year ago

Hmm, wget gives me an empty file. There is some weird cookie processing going on.

cabo commented 1 year ago

This is what my citeproc processor gets out of a copy/paste from a browser showing https://dl.acm.org/action/exportCiteProcCitation?targetFile=custom-bibtex&format=bibTex&dois=10.5555%2F3277203.3277245 (it indeed seems to give me valid citeproc when asking for bibtex):

seriesinfo:
  Proceedings of the 27th USENIX Conference on Security Symposium: pp. 549–566
  DOI:
title: 'Efail: breaking S/MIME and OpenPGP email encryption using exfiltration channels'
author:
- name: Damian Poddebniak
  ins: D. Poddebniak
- name: Christian Dresen
  ins: C. Dresen
- name: Jens Müller
  ins: J. Müller
- name: Fabian Ising
  ins: F. Ising
- name: Sebastian Schinzel
  ins: S. Schinzel
- name: Simon Friedberger
  ins: S. Friedberger
- name: Juraj Somorovsky
  ins: J. Somorovsky
- name: Jörg Schwenk
  ins: J. Schwenk
date: '2018-08-15'

Not bad. What is missing? (The publisher is suppressed as there is a container-title.)

cabo commented 1 year ago

I sometimes, but not always, get the JSON with curl -iL (same command line, different tries). This is really weird. Totally Heisenberg.

cabo commented 1 year ago

It also works for 10.5555/206746 - if it works. (I need to add the ISBN.) Part of the error message when it doesn't work is:

We use cookies to ensure that we give you the best experience on our website.

[Learn more](https://www.acm.org/privacy-policy)

It seems your browser doesn't support them and this affects the site functionality.

dkg commented 1 year ago

Weird that it's not working reliably. Maybe someone who is a member of ACM knows who the right person to talk to is? https://www.acm.org/about-acm/contact-us suggests that we could mail dl-feedback@acm.org to ask questions about the digital library. I'll try sending them mail with you in Cc.

cabo commented 1 year ago

Try 1.6.37.

I already contacted dl-support@acm.org and am waiting for an answer. I now simply make a superfluous request to get a cookie :-)
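The "superfluous request" idea can be sketched in Python as below. This is not the code shipped in 1.6.37; the function names are hypothetical, and which URL serves as the warm-up request is left to the caller as an assumption.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener() -> urllib.request.OpenerDirector:
    """Create an opener that stores cookies and sends them on later requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch_after_warmup(opener, warmup_url: str, target_url: str) -> bytes:
    """Make a throwaway request to collect cookies, then fetch the real URL."""
    with opener.open(warmup_url) as warmup:
        warmup.read()  # body is discarded; only the stored cookies matter
    with opener.open(target_url) as resp:
        return resp.read()
```

Both requests go through the same opener, so any cookie set by the first response is presented with the second.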

cabo commented 1 year ago

If you see any weirdness with reference generation, for either ACM or doi.org DOIs, please let me know.

dkg commented 1 year ago

hm, is 1.6.37 published yet? the last commit i see on github is af4d61be1082442808c294171d1009e483ab6c0f, which still thinks it's 1.6.36

cabo commented 1 year ago

It is now, and I have since remembered to push 😊

cabo commented 1 year ago

No feedback, works for me, shipped -- closing now.

Documentation: Archived-At: https://mailarchive.ietf.org/arch/msg/rfc-markdown/C7imTpyUpSu7Jtnbj-pNCAR_b1Y