emacs-citar / citar

Emacs package to quickly find and act on bibliographic references, and edit org, markdown, and latex academic documents.
GNU General Public License v3.0

Open PDF in Zotero #685

Open dlukes opened 2 years ago

dlukes commented 2 years ago

Zotero has recently acquired a very nice PDF reader and annotator, which can be invoked via the command line. So I started investigating whether I can make citar use it.

Turns out -- yes, see below. I'm still posting this here because this was the first place I looked to see whether anyone had already figured it out, so I'm hoping it might help other people who want to achieve something similar. Additionally, it might be a source of inspiration to rework some of citar's internals to make this type of setup a bit easier (basically, all that's needed is some special-casing for URIs starting with zotero://...). But if that's something none of the maintainers consider worth their while at this point, that's perfectly fine too, feel free to close the issue right away :)

Big picture

The Zotero PDF reader is activated via URLs of the general form zotero://open-pdf/library/items/<ZOTERO_KEY>, which you should be able to just (xdg-)open on the command line. So we need to:

  1. generate such URLs during export from Zotero
  2. shepherd them safely through citar's attachment files machinery

Zotero side

I'm using Better BibTeX for the exports, because it allows you to customize the export process. You'll need to add a custom script to the export process via the postscript setting.

Here's the full script for copy-paste, with explanatory comments:

// I'm using this with BetterCSLJSON, but BetterBib(La)TeX is also possible, see
// https://retorque.re/zotero-better-bibtex/exporting/scripting/.
if (Translator.BetterCSLJSON) {
  entry.file = item.attachments.map(
    a => (
      // If this is a PDF attachment...
      /\.pdf$/i.test(a.localPath) ?
        // ... generate a link to open the PDF in Zotero, abusing the query string to
        // provide a human readable file name like so:
        // zotero://open-pdf/library/items/RANDOM_ZOTERO_KEY?human-readable-name.pdf
        `zotero://open-pdf/library/items/${a.key}?${a.localPath.split(a.key)[1].substring(1)}`
        // ... otherwise, just return the normal file path.
        : a.localPath
    // Escape \'s and ;'s to make the individual items play nice with
    // citar-file--parser-default and citar-file--split-escaped-string.
    ).replace(/([\\;])/g, "\\$1")
  ).join(";")
}
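
To illustrate why the postscript escapes backslashes and semicolons, here is a rough Python sketch of the round trip: escape each item, join with ;, then split on unescaped semicolons again. The function names and the key ABCD1234 are made up for illustration, and the splitter only approximates what citar-file--split-escaped-string is described as doing above; it has not been checked against citar's actual implementation.

```python
import re


def escape_item(path: str) -> str:
    """Backslash-escape backslashes and semicolons, like the postscript's replace()."""
    return re.sub(r"([\\;])", r"\\\1", path)


def split_escaped(field: str) -> list:
    """Split on unescaped semicolons and drop the escaping backslashes."""
    items, current = [], []
    chars = iter(field)
    for ch in chars:
        if ch == "\\":
            # The character after a backslash is taken literally
            # (a trailing lone backslash is kept as-is).
            current.append(next(chars, "\\"))
        elif ch == ";":
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return items


# Hypothetical attachment list: one PDF link with a semicolon in its name,
# one plain Windows-style path with backslashes.
paths = [
    "zotero://open-pdf/library/items/ABCD1234?a;b.pdf",
    "C:\\docs\\note.html",
]
field = ";".join(escape_item(p) for p in paths)
assert split_escaped(field) == paths
```

Without the escaping step, the semicolon inside the first file name would be read as an item separator and the entry would be split into three pieces instead of two.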

Citar side

Citar performs various checks and processing on the file paths associated with bibliography items, which lead to the zotero:// URLs being discarded or mangled. So these modifications need to be disabled (note that I'm using Doom Emacs; in vanilla Emacs, translate to two calls to advice-add, I think):

(defadvice! dlukes/citar-file-trust-zotero (oldfun &rest r)
  "Leave Zotero-generated file paths alone, especially zotero://..."
  :around '(citar-file-open citar-file--find-files-in-dirs)
  (cl-letf (((symbol-function 'file-exists-p) #'always)
            ((symbol-function 'expand-file-name) (lambda (first &rest _) first)))
    (apply oldfun r)))

And obviously, we'll need to tell citar to open PDFs in an external app (again, Doom Emacs):

(after! citar
  (add-to-list 'citar-file-open-functions '("pdf" . citar-file-open-external)))

(As an aside: thank you very much for citar!)

bdarcus commented 2 years ago

So I've not yet looked at this closely, but some quick thoughts.

  1. a number of us, including me, use Zotero, so definitely this is interesting.
  2. I think the technical question on our end is just how best, most generally, to handle this. Your suggestion may indeed be the answer, but any thoughts @aikrahguzar @roshanshariff?
  3. Do you think that BBT script might be general enough to add to BBT itself?
  4. We do have a wiki for ideas like this; feel free to add it there when it makes sense.

dlukes commented 2 years ago

Do you think that BBT script might be general enough to add to BBT itself?

That would be nice, but for CSL JSON specifically, the trouble is there's no standard field for putting the attachments AFAIK. And coming up with a non-standard field has backwards compatibility implications. In the course of researching this, I think I saw @retorquere make this argument against including non-standard fields in Better CSL JSON exports, but I can't find the reference at the moment. So that's a potential blocker.

(If/when a standard CSL JSON solution emerges -- an attachments field? -- I'm guessing it won't be a single string with multiple values separated by ;, but an array?)

For Better Bib(La)TeX, all that's needed is an option in the existing exporters to generate zotero://open-pdf/... URLs for PDFs instead of regular paths when exporting files, I think.

We do have a wiki for ideas like this; feel free to add it there when it makes sense.

Thank you! I'll wait a bit to see where the discussion goes first, but I'll keep it in mind.

bdarcus commented 2 years ago

I'm not necessarily advocating for this (it was just a question), but note that CSL-JSON now has a "custom" property that could be used to dump non-standard data like this. Am not sure what Zotero does with that on import though.

retorquere commented 2 years ago

Do you have a sample with these custom fields?

bdarcus commented 2 years ago

Yes, there are examples embedded in the schema.

{
  "short_id": "xyz",
  "other-ids": ["alternative-id"]
}

retorquere commented 2 years ago

Zotero doesn't do anything with those.

bdarcus commented 1 year ago

Zotero doesn't do anything with those.

I should have asked this earlier, but now that I'm looking at it again:

"Nothing" in this context means "throws out these data, but doesn't cause an error"?

Maybe that's sufficient?

retorquere commented 1 year ago

Yeah, no action, no error.

bdarcus commented 1 year ago

Might be cool, then, for BBT and/or Zotero itself (cc @dstillman) to use that new custom property for exporting this kind of thing as a first step.

retorquere commented 1 year ago

We'd still have to establish what the custom properties would look like.

Is CSL really the best vehicle for this?

bdarcus commented 1 year ago

IDK; just seems the primary export format for Zotero (edit: and more generally has picked up traction beyond it).

But certainly it need not stay that way.

retorquere commented 1 year ago

If the custom fields appear in Zotero, BBT CSL formats will inherit them automatically, so that would kill two birds with one stone.

bdarcus commented 1 year ago

@dlukes - FYI, a few weeks ago I merged #736 with an option to open a Zotero entry via citar-open-entry.

Here's that function:

https://github.com/emacs-citar/citar/blob/ed53e67bee517ae37198a10b40515201460b87f4/citar.el#L1514-L1520

The basic idea could be extended for files. But the current ability to open the entry should still be useful for now.

And yeah, the current file opening functions would need some adjustment.

@retorquere - is it yet possible to open PDFs using the "select" links?

retorquere commented 1 year ago

No, the relevant change hasn't (to my knowledge) been made to Zotero yet.

bdarcus commented 1 year ago

Is the issue using the citekeys to open the PDFs, or opening the PDFs at all?

Like, should this work?

zotero://open-pdf/library/items/TPUJP37W

It doesn't for me, but that doesn't necessarily mean anything.

dlukes commented 1 year ago

FYI, a few weeks ago I merged https://github.com/emacs-citar/citar/issues/736 with an option to open a Zotero entry via citar-open-entry.

Thanks, I happened to notice through a lucky coincidence quite soon after the merge and configured it as my citar-open-entry-function :)

It doesn't for me

It does for me -- running xdg-open zotero://open-pdf/library/items/TPUJP37W opens the associated PDF in Zotero.

retorquere commented 1 year ago

The problem is that Zotero has two functions (Zotero.API.getResultsFromParams and Zotero.DataObjects.prototype.parseLibraryKey(Hash)) that translate the TPUJP37W part to an item ID without being given any context. I don't know what in the URL precedes it (technically, I don't even know where I'm called from, whether these url-handling calls or entirely different places where these functions are called), so I don't know whether you are opening an item or an attachment, and when you send zotero://open-pdf/library/items/@citekey, I don't know whether I should translate that to the item itself or to one of the attachments. Zotero said they would at some point add this context, but I don't see in the code that this has happened yet.

dlukes commented 1 year ago

running xdg-open zotero://open-pdf/library/items/TPUJP37W opens the associated PDF in Zotero.

Oh right, sorry, I forgot part of the context here -- it doesn't work for opening the associated PDF of TPUJP37W as a parent item; it works if TPUJP37W is already the ID of a PDF item. Sorry for the confusion.

bdarcus commented 1 year ago

Oh right, sorry, I forgot part of the context here -- it doesn't work for opening the associated PDF of TPUJP37W as a parent item; it works if TPUJP37W is already the ID of a PDF item. Sorry for the confusion.

That explains why it didn't work for me. It makes sense though.

... so I don't know whether you are opening an item or an attachment, and when you send zotero://open-pdf/library/items/@citekey, I don't know whether I should translate that to the item itself or to one of the attachments.

So really we'd need to find whatever PDFs are associated with the citekey-identified Zotero parent, and list those as discrete (Zotero) "file" resources in citar-open-*?

In that case, BBT wouldn't do the interpretation of what to open; the user would choose which one here.

Perhaps, then, best to include the Zotero item ID in the bibtex etc file?

Or, per the OP, include the actual select links to them in files?

EDIT: maybe we refactor a bit so default files use file://, and the checks only apply to those, but allow other schemes?

I guess, then, we might retitle the issue something like "Allow non-file URL schemes for library files"?
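
The refactoring idea in the EDIT above (existence checks only for local files, other schemes passed through untouched) could look roughly like this. It is a Python sketch rather than Elisp, the function name is hypothetical, and it is not citar code; it only shows the decision logic.

```python
import os
from urllib.parse import urlsplit
from urllib.request import url2pathname


def is_usable_resource(resource: str) -> bool:
    """Check local files on disk; accept any other URI scheme without checking."""
    parts = urlsplit(resource)
    if parts.scheme == "file":
        # file:// URI: convert back to a local path and check it exists.
        return os.path.exists(url2pathname(parts.path))
    if len(parts.scheme) > 1:
        # e.g. zotero:// or https://: nothing on disk to check, so pass it through.
        return True
    # No scheme (or a single letter, which is probably a Windows drive): plain path.
    return os.path.exists(resource)


print(is_usable_resource("zotero://open-pdf/library/items/TPUJP37W"))
print(is_usable_resource("/no/such/dir/missing.pdf"))
```

With a rule like this, the zotero:// link is accepted as-is, while a missing local path is still rejected, so the file-existence check would no longer need to be overridden by advice.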

retorquere commented 1 year ago

BBT could offer a json-rpc call that translates a citekey to the item + attachments. The caller can then open the attachment using the native zotero IDs in the url.

bdarcus commented 1 year ago

That sounds perfect @retorquere.

retorquere commented 1 year ago

Does this do what you need if you give it the betterbibtex json translator? https://retorque.re/zotero-better-bibtex/exporting/json-rpc/#itemexportcitekeys-translator-libraryid

bdarcus commented 1 year ago

What does the relevant returned JSON look like?

A list of items, each of which has a list of attachment Zotero IDs (and type and such)?

E.g. we could generate a list of zotero select URIs from it directly, for each attached PDF?

If yes, yes!

bdarcus commented 1 year ago

@dlukes - it might be that we could do a little citar-zotero thing here, relying on the json-rpc, and bundle that functionality there? Sort of an analog to citar-file.

There is a built-in jsonrpc package, though I haven't yet got it working.

If you feel like giving a go at a PR, let me know.

Otherwise, I'll take a look when I get a chance.

EDIT: there are other emacs zotero packages, but they seem to focus on the Zotero web API?

bdarcus commented 1 year ago

With this (two attachments, one HTML and one PDF), how do I know which are the PDFs, other than looking at the file extensions?

❯ curl http://localhost:23119/better-bibtex/json-rpc -X POST -H "Content-Type: application/json" -H "Accept: application/json" --data-binary '{"jsonrpc": "2.0", "method": "item.attachments", "params": {"citekey": "toly2017"} }'
{"jsonrpc":"2.0","result":[{"open":"zotero://open-pdf/library/items/PKZ88BS7","path":"/home/bruce/Zotero/storage/PKZ88BS7/14747731.2016.html"},{"open":"zotero://open-pdf/library/items/4XRNSQEB","path":"/home/bruce/Zotero/storage/4XRNSQEB/Toly_2017_Brexit, global cities, and the future of world order.pdf","annotations":[]}],"id":null}
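
For reference, filtering on the file extension (the fallback mentioned in the question) might look like the following Python sketch. It uses the result list from the response above; the function name is hypothetical, and since nothing else in that output distinguishes PDFs, the extension check is an assumption rather than a documented BBT contract.

```python
# The "result" list from the item.attachments response shown above.
sample_result = [
    {
        "open": "zotero://open-pdf/library/items/PKZ88BS7",
        "path": "/home/bruce/Zotero/storage/PKZ88BS7/14747731.2016.html",
    },
    {
        "open": "zotero://open-pdf/library/items/4XRNSQEB",
        "path": "/home/bruce/Zotero/storage/4XRNSQEB/"
                "Toly_2017_Brexit, global cities, and the future of world order.pdf",
        "annotations": [],
    },
]


def pdf_open_links(result):
    """Return the "open" URIs of attachments whose path ends in .pdf."""
    return [
        att["open"]
        for att in result
        if att.get("path", "").lower().endswith(".pdf")
    ]


print(pdf_open_links(sample_result))
# -> ['zotero://open-pdf/library/items/4XRNSQEB']
```

Note that the HTML attachment also gets an open-pdf link in the response, so some filtering on the caller's side seems necessary either way.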
bdarcus commented 1 year ago

Does this do what you need if you give it the betterbibtex json translator?

What's the right value for the translator property? jzon?

retorquere commented 1 year ago

jzon will do it, as will BetterBibTeX JSON.

retorquere commented 1 year ago

What does the relevant returned JSON look like?

A list of items, each of which has a list of attachment Zotero IDs (and type and such)?

I thought it did, but they're stripped out. I'll see what I can do about that.

dlukes commented 1 year ago

Sorry, evening routine with the kids, then fell asleep trying to get the older one to hit the hay :)

maybe we refactor a bit so default files use file://, and the checks only apply to those, but allow other schemes?

Sounds good to me!

it might be that we could do a little citar-zotero thing here, relying on the json-rpc, and bundle that functionality there?

I think that's a good option too, although it feels somewhat more complicated than just having the URLs listed in the exported bibliography file, especially since Citar will still need to have that file anyway. But I understand the reticence to add nonstandard fields to the CSL export willy-nilly.

From a performance perspective, just to make sure I'm understanding this correctly -- would this mean that whenever I search my bibliography, Citar would initiate either one JSON-RPC call exporting all the citekeys in the bibliography with item.export, or multiple JSON-RPC calls getting item.attachments for each citekey?

How fast/slow is either of these expected to be? It also seems slightly wasteful to redo this each time, so some sort of caching should probably be involved, which is of course notoriously tricky to get right. Whereas if this information was part of the exported bibliography file, then all of this would be implicitly handled by just keeping track of whether the file needs to be reloaded, which Citar already does.
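
As a rough illustration of the "reload only when the file changes" approach described above, here is a minimal Python sketch keyed on the export file's modification time. The class and attribute names are hypothetical, the "citation-key" and "file" field names are taken from the export discussed in this thread, and this is not a description of how citar's own cache is implemented.

```python
import json
import os


class BibCache:
    """Reparse the exported bibliography only when its modification time changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._files = {}
        self.loads = 0  # counts reparses, for illustration only

    def files_for(self, citekey):
        """Return the raw `file` field for a citekey, or "" if there is none."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            # The export changed on disk (or this is the first call): reparse it.
            with open(self.path, encoding="utf-8") as fh:
                entries = json.load(fh)
            self._files = {e["citation-key"]: e.get("file", "") for e in entries}
            self._mtime = mtime
            self.loads += 1
        return self._files.get(citekey, "")
```

Each lookup then costs one stat call instead of a JSON-RPC round trip, and invalidation falls out of the export being rewritten by Zotero.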

If you feel like giving a go at a PR, let me know.

I've got a lot on my plate right now, and I'm not particularly comfortable in Elisp. Plus the performance worries I detailed above. But if it turns out they're unfounded, I might try and cobble something together at some point. If I do, I'll post here to save on duplicate work.

dlukes commented 1 year ago

would this mean that whenever I search my bibliography, Citar would initiate either one JSON-RPC call exporting all the citekeys in the bibliography with item.export, or multiple JSON-RPC calls getting item.attachments for each citekey?

I ran a simple test of both these options:

```python
import time
import json
from pathlib import Path

import httpx

with Path("~/.cache/zotero/My Library.json").expanduser().open("rb") as file:
    bib = json.load(file)
citekeys = [item["citation-key"] for item in bib]


def item_attachments():
    for citekey in citekeys:
        httpx.post(
            "http://localhost:23119/better-bibtex/json-rpc",
            json={
                "jsonrpc": "2.0",
                "method": "item.attachments",
                "params": {"citekey": citekey},
            },
            timeout=None,
        )


def item_export():
    httpx.post(
        "http://localhost:23119/better-bibtex/json-rpc",
        json={
            "jsonrpc": "2.0",
            "method": "item.export",
            "params": {"citekeys": citekeys, "translator": "jzon"},
        },
        timeout=None,
    )


print(f"My Library currently has {len(citekeys)} items.")
for test in (item_attachments, item_export):
    print(f"Timing {test.__name__}...")
    start = time.perf_counter()
    test()
    elapsed = time.perf_counter() - start
    print(f"  -> Ran in {elapsed:.2f} seconds.")
```

And unless I misunderstood or I'm doing something wrong, I'm afraid it looks unworkable from a performance standpoint:

My Library currently has 773 items.
Timing item_attachments...
  -> Ran in 75.50 seconds.
Timing item_export...
  -> Ran in 44.38 seconds.

bdarcus commented 1 year ago

Oh yeah, performance could be an issue; forgot about that.

The indicators are indeed generated dynamically, to ensure they're accurate.

If the select links are stored in the exported file, then they're also cached in citar.

retorquere commented 1 year ago

I thought it did, but they're stripped out. I'll see what I can do about that.

v6.7.68 will drop in 10 minutes or so, and that has a straight dump of the Zotero objects. All keys are in there, but it also has all the URIs ready to go.

bdarcus commented 1 year ago

Hmm ... when I use the jzon translator value, it works as expected.

But if I do this, it appears to hang; like will not complete after a minute or so, at which point I cancel.

❯ curl http://localhost:23119/better-bibtex/json-rpc -X POST -H "Content-Type: application/json" -H "Accept: application/json" --data-binary '{"jsonrpc": "2.0", "method": "item.export", "params": {"citekeys": ["toly2017"], "translator": "BetterBibTeX JSON" }}'

retorquere commented 1 year ago

A new release is building that fixes that.

tbdcit commented 4 months ago

Is there any progress on this issue or workaround?

I have added

if (Translator.BetterCSLJSON) {
  entry.file = item.attachments.map(
    a => (
      /\.pdf$/i.test(a.localPath) ?
        `zotero://open-pdf/library/items/${a.key}?${a.localPath.split('/').pop()}`
        : a.localPath
    ).replace(/([\\;])/g, "\\$1")
  ).join(';')
}

to the BBT export, which successfully adds the Zotero links to the exported JSON. I am struggling to get citar to open these links, though. I am not using Doom, and I am not sure how to write an equivalent of the advice suggested in the first post.

Any suggestions? If I can get something working I would be happy to add this to the Wiki.

bdarcus commented 4 months ago

@tbdcit - No progress. I haven't really looked at this myself since my last comment.

~But that advice in the OP is not specific to Doom.~

Edit: actually, it is specific to Doom; sorry.