Add internal zotero api endpoint for import / export commands

cboulanger commented 2 years ago

In my research, I am text-mining a large body of scholarly literature to extract citation data, which I would like to store in Zotero via the Cita plugin. As far as I can see (correct me if I'm wrong), importing and exporting Cita data is only possible manually via the GUI. This is impractical in largely automated workflows.

Describe the solution you'd like

It would be great if Cita registered endpoints for each of its commands in the Zotero Connector Server as described here: https://www.zotero.org/support/dev/client_coding/connector_http_server . This would allow POSTing to these endpoints to automatically import data from other workflows, and GETting exported data from the plugin in standardized formats.
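To make the request concrete, here is a minimal client-side sketch in Python of what POSTing to such an endpoint could look like. The path `/cita/import` and the JSON field names are hypothetical (no such endpoint exists in Cita at this point); the base URL assumes the local connector server on its default port 23119.

```python
import json
import urllib.request

# Local Zotero connector server (assumed default port).
BASE_URL = "http://127.0.0.1:23119"
# Hypothetical path: Cita does not register this endpoint.
CITA_IMPORT_PATH = "/cita/import"


def build_import_request(citing_key: str, bibtex: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying citation data.

    The payload layout (citingItemKey / format / data) is an assumption
    made for illustration only.
    """
    body = json.dumps(
        {"citingItemKey": citing_key, "format": "bibtex", "data": bibtex}
    ).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + CITA_IMPORT_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a running Zotero client and an endpoint like this in place, `urllib.request.urlopen(build_import_request(...))` would send the request; the sketch stops short of that so it can be run without Zotero.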

Describe alternatives you've considered

The available alternative is to go via the Zotero server API and manipulate the notes storing Cita data directly. This isn't ideal because it is slow, error-prone, and breaks whenever Cita changes its internal data format.

Expected workload

I expect the implementation of this feature to be relatively straightforward, since in theory at least it simply requires registering the endpoints and passing HTTP data to the already implemented methods and vice versa.

Dominic-DallOsto commented 2 years ago

Just to check - what kind of data would you need to store for each citation? Are the citations in Cita already sufficient to store everything you want (type, title, year, authors, DOI, ...)?

Would it work for you to import/export data to BibTeX (or another Zotero-supported format) like with the functionality that was added in this PR #118? This currently only works for a single item at a time, so you would need to send a request for each citing item.

This should be fairly straightforward to implement though. The only problem I foresee is that the export functionality currently uses Zotero's GUI to handle the selection of file format and options. So we'd have to check whether these options can be specified and we can export without the GUI.

cboulanger commented 2 years ago

@Dominic-DallOsto Thanks for looking into this!

Yes, the current metadata would be sufficient, since I store each cited reference in Zotero in any case (even if it is incomplete/not identifiable - I deal with a lot of older scholarly literature that does not have DOIs/ISBNs etc.), so I can refer to its Zotero ID.

BibTeX would be totally fine - in fact, the text mining software I use (https://github.com/exciteproject/Exparser) exports the found citations in BibTeX, so a seamless workflow can be established.

I care most for importing at the moment, so if you could implement that first and deal with export later, that would be totally fine with me.

Dominic-DallOsto commented 2 years ago

> Yes, the current metadata would be sufficient, since I store each cited reference in Zotero in any case (even if it is incomplete/not identifiable - I deal with a lot of older scholarly literature that does not have DOIs/ISBNs etc.), so I can refer to its Zotero ID.

Ok. I think if you have all the cited items in Zotero, the auto linking will find them and add the Zotero ID of these items to the citation (the Z icon will go red showing it's linked). I don't think these survive the export process yet, but if that'd help I can look into fixing this later.

cboulanger commented 2 years ago

> Ok. I think if you have all the cited items in Zotero, the auto linking will find them and add the Zotero ID of these items to the citation (the Z icon will go red showing it's linked). I don't think these survive the export process yet, but if that'd help I can look into fixing this later.

The way I envision the workflow is that, instead of talking to the Zotero server as I do now, my Exparser-to-Cita bridge will first import the cited items as BibTeX via the existing connector import endpoint (if they don't exist yet), and then use the to-be-implemented Cita endpoint to add the cited items to the citing item via their Zotero IDs.
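The two steps could be sketched roughly as follows in Python. This is only an illustration under several assumptions: `/connector/import` is taken to be the path of the existing connector import endpoint, `/cita/add-citations` and its JSON payload are hypothetical, and the import call is assumed to hand back the keys of the newly created items. The HTTP call is injected as a `post` function so the logic can be tried out without a running Zotero client.

```python
import json
from typing import Callable, Dict, List, Optional

ZOTERO_LOCAL = "http://127.0.0.1:23119"  # assumed default connector port
# Assumed path of the existing connector import endpoint.
IMPORT_URL = ZOTERO_LOCAL + "/connector/import"
# Hypothetical path for the to-be-implemented Cita endpoint.
CITA_URL = ZOTERO_LOCAL + "/cita/add-citations"

# post(url, body, content_type) -> list of item keys (assumed return shape)
Post = Callable[[str, bytes, str], List[str]]


def link_citations(
    citing_key: str, cited: List[Dict[str, Optional[str]]], post: Post
) -> List[str]:
    """Import missing cited items, then attach all of them to the citing item.

    Each entry in `cited` has a 'key' (Zotero key, or None if the item is
    not in Zotero yet) and a 'bibtex' string.
    """
    keys: List[str] = []
    for ref in cited:
        if ref.get("key") is None:
            # Step 1: item not in Zotero yet, so import its BibTeX first.
            body = (ref["bibtex"] or "").encode("utf-8")
            keys.extend(post(IMPORT_URL, body, "application/x-bibtex"))
        else:
            keys.append(ref["key"])
    # Step 2: add every cited item to the citing item by Zotero key.
    payload = json.dumps({"citing": citing_key, "cited": keys}).encode("utf-8")
    post(CITA_URL, payload, "application/json")
    return keys
```

Passing a stub for `post` that records its arguments is enough to check the order of the calls; a real bridge would replace it with an actual HTTP client.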

It would definitely help if the Zotero items could be fully exported (not just the Cita Metadata). Export is another very important topic that probably deserves its own issue, since ideally, export formats should include a) a format that can be processed in bibliometric/scientometric tools (such as the (bad) WoS Export Format) or at least can be easily converted into formats they understand and/or b) a format that has some potential to be a future standard (such as https://sparontologies.github.io/cito/current/cito.html).

Dominic-DallOsto commented 2 years ago

Ok, gotcha. So given you already import all the items via BibTeX into Zotero, an automated version of #39 would work for you - adding cited items to a citing item by a list of Zotero IDs? That should be easy enough.

A BibTeX export of a list of cited items seems to contain all the data you would get by exporting the same Zotero items to BibTeX (i.e. everything you see in the note), even though this info doesn't appear in the citation editor. It's also possible to export all the items to Citation graph format, which contains the links between all the Zotero IDs, but I guess you'd already have this in the first place.

Maybe I'm not exactly understanding, but given you have:

- BibTeX data for a list of items
- A graph of which items are cited by which others

Are you just using Zotero for storage / ease of data navigation? Or do you want to sync/compare data with Wikidata/CrossRef? Or to export in a specific format for some other software? I'm not sure otherwise what you're gaining by importing this into Zotero?

cboulanger commented 2 years ago

> Are you just using Zotero for storage / ease of data navigation? Or do you want to sync/compare data with Wikidata/CrossRef? Or to export in a specific format for some other software? I'm not sure otherwise what you're gaining by importing this into Zotero?

That's a good question! I could easily go directly from reference extraction to storing this data in a graph database. However, I need to store the full metadata of citing and cited items in a way that makes it possible to edit the data in a domain-specific way (as opposed to a generic relational or graph database) and to connect it to other services that do bibliographic-y things. Zotero is perfect for this, given its extensibility (Cita is a great example). So what I do is import everything (including the PDFs) into Zotero first, make a copy of the Zotero data in a NoSQL database (Couchbase) to compensate for Zotero's abysmal query features, and then update a network graph in a graph database that works with a minimal set of metadata referring to the full entry in Zotero. A bit convoluted, but the best I could come up with. I'd rather have the "original" data in Zotero and then create ephemeral copies of it for analysis.

retorquere commented 2 years ago

Errr... while I'd have preferred a nosql db myself, there were very good reasons for choosing sqlite, and you have all the power of sql in Zotero. You don't need to use the search api.
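As a rough illustration of querying by SQL instead of the search API, the Python sketch below runs a title search against an in-memory database. The four tables are a heavily simplified stand-in for the entity-attribute-value layout Zotero uses (`items`, `itemData`, `fields`, `itemDataValues`); the real tables have more columns, and the sample rows are invented. Against a real library you would likely want to query a copy of `zotero.sqlite`, since the running client keeps the file locked.

```python
import sqlite3

# Simplified stand-in for Zotero's schema (assumption: real tables have more columns).
SCHEMA = """
CREATE TABLE items (itemID INTEGER PRIMARY KEY, key TEXT);
CREATE TABLE fields (fieldID INTEGER PRIMARY KEY, fieldName TEXT);
CREATE TABLE itemDataValues (valueID INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE itemData (itemID INTEGER, fieldID INTEGER, valueID INTEGER);
"""


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with two invented items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     [(1, "AAAA1111"), (2, "BBBB2222")])
    conn.executemany("INSERT INTO fields VALUES (?, ?)",
                     [(1, "title"), (2, "date")])
    conn.executemany("INSERT INTO itemDataValues VALUES (?, ?)",
                     [(1, "Law and Society"), (2, "Economic Sociology"), (3, "1975")])
    conn.executemany("INSERT INTO itemData VALUES (?, ?, ?)",
                     [(1, 1, 1), (2, 1, 2), (1, 2, 3)])
    return conn


def titles_matching(conn: sqlite3.Connection, pattern: str):
    """Return (item key, title) pairs whose title matches a LIKE pattern."""
    return conn.execute(
        """
        SELECT items.key, itemDataValues.value
        FROM items
        JOIN itemData USING (itemID)
        JOIN fields USING (fieldID)
        JOIN itemDataValues USING (valueID)
        WHERE fields.fieldName = 'title' AND itemDataValues.value LIKE ?
        ORDER BY items.key
        """,
        (pattern,),
    ).fetchall()
```

For example, `titles_matching(demo_connection(), "%law%")` returns only the first invented item, because SQLite's `LIKE` is case-insensitive for ASCII characters by default.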

cboulanger commented 2 years ago

> Errr... while I'd have preferred a nosql db myself, there were very good reasons for choosing sqlite, and you have all the power of sql in Zotero. You don't need to use the search api.

That's true, but only if you have access to localhost, and it requires running a local Zotero client. You cannot create automated workflows that run across different machines, for example (which was my original goal, given that I use all kinds of cloud services). But given that I will put Cita into the workflow, this changes my focus, and I will look into SQLite - much faster than querying zotero.org directly in any case.

retorquere commented 2 years ago

Oh you mean search via API. I stand corrected. That's abysmal, and you do need a mirror to do anything of interest.

Dominic-DallOsto commented 2 years ago

I just made #148 to test out a solution for importing citations via api - here's a debug build if you want to try.

cboulanger commented 2 years ago

@Dominic-DallOsto Cool, thanks! I'll try that out ASAP!

cboulanger commented 2 years ago

This has been implemented as a separate plugin now (see https://github.com/Dominic-DallOsto/zotero-api-endpoint). Closing.

cboulanger commented 2 years ago

Oh wait, sorry, I closed this in error. This issue is about Cita import/export, whereas the plugin I referred to is a more general Zotero API endpoint solution. Re-opening.

diegodlh / zotero-cita

Add internal zotero api endpoint for import / export commands #144