citation-file-format / ruby-cff

A Ruby library for manipulating CITATION.cff files.
Apache License 2.0

Support more citation styles #64

Open mfenner opened 3 years ago

mfenner commented 3 years ago

I suggest supporting more citation styles. I think Crossref and DataCite (search.crossref.org and search.datacite.org) have a reasonable list of common citation styles, plus BibTeX (and RIS), that can be displayed in the UI without too much extra effort:

For this work I would use the citeproc-ruby gem and the citation style files from citationstyles.org. I am happy to open a pull request if that is the direction ruby-cff wants to go.

mfenner commented 3 years ago

For a more in-depth discussion of popular citation styles that make sense here I would ping @adam3smith, @zuphilip, or @AbeJellinek.

adam3smith commented 3 years ago

I think the list above is good in general and captures the most important styles and types of styles, though I don't much care for "Harvard", which is just a label for any author-date style. Maybe go with "Harvard - Cite them Right", the most commonly used such style in the UK (and one of the most downloaded styles at Zotero when they last provided some data on this). Since this already gives you a number of author-date styles, I'd use the Chicago (fullnote) version, not the author-date one.

hainesr commented 3 years ago

Hi @mfenner and all. I think it would be great to add more citation styles - and the more we can do with existing gems the better.

I wonder if we should add these sorts of things in such a way that they can be optional. The reason for this is that (in my conversations with GitHub folks so far) we'd like to keep the number of extra dependencies down as low as possible. GitHub is already fairly complex I would imagine! (That said, I have just managed to reduce the current number of dependencies of ruby-cff by one, so maybe we can add one in return.) I note that adding citeproc-ruby would add four new dependencies in total.

So I think this would be a good conversation to have with @arfon when he's back from leave.

Any help with the development of this tool would be appreciated!

mfenner commented 3 years ago

For me citationstyles.org and the various processors such as citeproc-ruby are the community default (e.g. used by many reference managers) and I wouldn't want to try to reproduce them with custom code. It is also stable and well-maintained code.

Happy to hear from @arfon on this, and I can write a pull request by next Monday to show more clearly how this would change the code.

mfenner commented 3 years ago

Thanks @adam3smith.

hainesr commented 3 years ago

> For me citationstyles.org and the various processors such as citeproc-ruby are the community default (e.g. used by many reference managers) and I wouldn't want to try to reproduce them with custom code. It is also stable and well-maintained code.

Yes, I absolutely agree. I have an idea of how to reduce the current gem dependencies of ruby-cff further, so hopefully that will also help.

mfenner commented 3 years ago

If we support a fixed list of citation styles, we can import those specifically instead of importing all of the thousands of styles as a submodule.

hainesr commented 3 years ago

Yes. I also had a thought about generalizing the supported styles within ruby-cff and making them pluggable as well. The current two (BibTeX and APA-like) were added rather quickly to demonstrate the concept and get it into GitHub rapidly, rather than in the finished way that I would usually go for.

arfon commented 3 years ago

> I suggest supporting more citation styles. I think Crossref and DataCite (search.crossref.org and search.datacite.org) have a reasonable list of common citation styles, plus BibTeX (and RIS), that can be displayed in the UI without too much extra effort:

I think the current design will probably support 2-3 more without some kind of reworking of the UI, so I'd encourage us to keep this list shorter initially rather than introducing a new dependency on the GitHub design team.

That said, I agree ultimately we should be trying to use the existing libraries out there for CSL logic (e.g., citeproc-ruby).

> If we support a fixed list of citation styles, we can import those specifically instead of importing all of the thousands of styles as a submodule.

I'm assuming this would still introduce a new gem dependency here? As @hainesr alluded to, adding new dependencies to GitHub core is taken pretty seriously, and takes time for things such as security reviews.

mfenner commented 3 years ago

Thanks @arfon. Two additional styles should work with the current UI (three with small adjustments to the tab width), and I would suggest adding these two:

They are both popular, cover different style classes (numeric and author-date, respectively), and are either used mainly in engineering or generic (APA comes from the psychology field).

I have made good progress with my PR. I can add the two styles directly, so there is no need for the csl-styles gem (which packages all styles into a Ruby gem), but citeproc-ruby would still be added as a dependency. I don't think you can do formatted citations without CSL or Citeproc, as you take advantage of many years of work, including painful things such as how to display author names (a surprisingly complex topic) or rich text such as italics or superscript in titles. Supporting IEEE and Harvard in the same way as the current APA implementation is certainly more work than using citeproc-ruby. We can of course ask the citeproc-ruby author @inukshuk what he thinks regarding dependencies and potential security issues.

citeproc-ruby is the default implementation in Ruby, and I think extracting the core functionality into ruby-cff would create other issues, including long-term maintainability. But the ultimate decision, including the timing, is of course up to you. This page lists the open source and commercial applications using Citation Style Language (most of them not using the Ruby CSL processor), including popular reference managers Zotero, Mendeley, Papers and ReadCube.

inukshuk commented 3 years ago

I'd be more than happy to help land support for CSL via citeproc-ruby. Currently the implementation is spread across four Gems: citeproc, citeproc-ruby, csl, and namae. The latter is used for name parsing and could be made optional if the names in CFF are already tokenized sufficiently.

mfenner commented 3 years ago

Thank you @inukshuk. I am working on a pull request for a first citeproc-ruby implementation, using the "standard" approach. To show where I am going, I can post a WIP version no later than tomorrow morning. CFF has nice name tokenization.

mfenner commented 3 years ago

Now that I have Citeproc/CSL working locally, I noticed a few issues with the built-in APA formatting, thanks to the nice test coverage. I opened a separate issue at https://github.com/citation-file-format/ruby-cff/issues/66.

mfenner commented 3 years ago

I have a pull request that addresses what is discussed in this issue. More cleanup and testing is needed, but the basic functionality of supporting three popular citation styles via citeproc-ruby is working.

@hainesr @arfon if this goes in the right direction, I can polish this in the next few days. Let me know whether this should be an optional dependency or become the new default, e.g. to address #66.

@inukshuk almost everything interesting regarding citeproc-ruby happens at https://github.com/citation-file-format/ruby-cff/pull/67/files#diff-8f8e86f9c0d66b48d62cc552013cba786cff8ff8bca22183a4044cfed316066c

sdruskat commented 3 years ago

> I'd be more than happy to help land support for CSL via citeproc-ruby. Currently the implementation is spread across four Gems: citeproc, citeproc-ruby, csl, and namae. The latter is used for name parsing and could be made optional if the names in CFF are already tokenized sufficiently.

CFF supports person names with family-names, given-names, name-particle and name-suffix. Entities just have a name. Guess this would be sufficient for name parsing?

mfenner commented 3 years ago

This sounds good, and there is more work to do on my side in mapping CFF to Citeproc. Currently my mapping in ruby-cff only supports family-names, given-names and name (for organizations/entities).
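To make the name mapping discussed here concrete, the following is a minimal sketch rather than the code from the pull request. It assumes authors arrive as plain hashes with string keys (as a YAML parser would produce) and maps them to the name fields used by CSL-JSON (`family`, `given`, `non-dropping-particle`, `suffix`, `literal`). The helper name `cff_author_to_csl` is hypothetical, and mapping `name-particle` to `non-dropping-particle` rather than `dropping-particle` is an assumption, since CFF does not distinguish the two.

```ruby
# Hypothetical lookup table from CFF person keys to CSL-JSON name keys.
CFF_TO_CSL_NAME_KEYS = {
  'family-names'  => 'family',
  'given-names'   => 'given',
  'name-particle' => 'non-dropping-particle', # assumption, see the note above
  'name-suffix'   => 'suffix'
}.freeze

# Hypothetical helper: converts one CFF author hash into a CSL-style name hash.
def cff_author_to_csl(author)
  # Entities carry a single "name"; CSL expresses that as a literal name.
  return { 'literal' => author['name'] } if author.key?('name')

  CFF_TO_CSL_NAME_KEYS.each_with_object({}) do |(cff_key, csl_key), csl|
    value = author[cff_key]
    # Skip fields that are missing or empty in the CFF entry.
    csl[csl_key] = value unless value.nil? || value.to_s.empty?
  end
end

person = {
  'family-names'  => 'Beethoven',
  'given-names'   => 'Ludwig',
  'name-particle' => 'van'
}
entity = { 'name' => 'The Research Software Project' }

p cff_author_to_csl(person)
p cff_author_to_csl(entity)
```

Because the CFF fields are already tokenized, nothing in this sketch needs a name parser.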

inukshuk commented 3 years ago

@sdruskat yes, at this granularity there will be no need for name parsing and making namae optional would have no adverse impact.

@mfenner the 'processor' interface is intended mainly for managing citations (e.g., creating cites in specific orders, tracking stuff like 'ibid' and similar details) and later generating references for all cited works. If I understand this correctly, we're only going to be interested in generating one-off reference strings for a given piece of citation data. In this case it will be best to use the 'renderer' interface directly, similar to how it's done in jekyll-scholar for example.

For performance reasons it will almost certainly be desirable to parse the styles only once, especially if there are only a handful of vetted styles which are going to be used. It also might be useful to reuse the renderer instance, though that should have less of an impact than parsing the CSL styles.
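The "parse once" idea can be sketched without touching any citeproc-ruby API. This is only an illustration under stated assumptions: `StyleCache` is a hypothetical class, and the loader block stands in for whatever actually parses a CSL style file. The sketch shows only the caching behaviour, guarded by a mutex so that concurrent callers do not parse the same style twice.

```ruby
# Hypothetical cache: parses each style at most once and hands back the
# same parsed object afterwards. The loader block is supplied by the caller.
class StyleCache
  def initialize(&loader)
    raise ArgumentError, 'a loader block is required' unless loader

    @loader = loader
    @styles = {}
    @lock = Mutex.new
  end

  # Returns the cached style, invoking the loader only on the first request.
  def fetch(name)
    @lock.synchronize do
      @styles[name] ||= @loader.call(name)
    end
  end
end

parse_count = 0
cache = StyleCache.new do |name|
  parse_count += 1
  "parsed style: #{name}" # placeholder for an expensive XML parse
end

cache.fetch('apa')
cache.fetch('apa')  # served from the cache, the loader is not called again
cache.fetch('ieee')
puts parse_count
```

With only a handful of vetted styles, such a cache could also be filled eagerly at start-up instead of on first use.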

mfenner commented 3 years ago

Thanks @inukshuk. What general direction do you want to take: optimize the use of citeproc-ruby in ruby-cff, and/or make changes to the citeproc-ruby code?

inukshuk commented 3 years ago

I'm happy to make changes to citeproc-ruby in order to make it easier to adopt into ruby-cff. However, the issues I've raised above would have to be addressed either in ruby-cff or even further out, by apps using ruby-cff. As a library, I believe that the best solution for ruby-cff would be something like this:

This way, ruby-cff has minimal dependencies and users have maximum flexibility. For example, I could install ruby-cff, citeproc-ruby, and csl-styles and then just format references using any of the official CSL styles by name, without worrying too much about fetching or updating individual styles.
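One way to picture an optional dependency combined with pluggable formatters is a small registry plus a "soft" require. This is a sketch under assumptions, not ruby-cff's actual design: the `Formatters` module, `optional_require`, and the `'plain'` and `'csl'` formatter names are all hypothetical, and the CSL-backed formatter is deliberately left as a placeholder rather than guessing at rendering calls. The only real-library detail used is that citeproc-ruby is loaded with `require 'citeproc/ruby'`.

```ruby
# Hypothetical registry of formatters keyed by style name.
module Formatters
  @registry = {}

  class << self
    def register(name, &formatter)
      @registry[name] = formatter
    end

    def available
      @registry.keys
    end

    def render(name, data)
      formatter = @registry.fetch(name) do
        raise ArgumentError, "no formatter registered for #{name.inspect}"
      end
      formatter.call(data)
    end
  end
end

# Returns true when the library could be loaded, false when it is not installed.
def optional_require(library)
  require library
  true
rescue LoadError
  false
end

# Built-in formatter with no extra dependencies.
Formatters.register('plain') { |data| "#{data['title']} (#{data['version']})" }

# Register a CSL-backed formatter only when the optional gem is present.
if optional_require('citeproc/ruby')
  Formatters.register('csl') do |_data|
    raise NotImplementedError, 'placeholder: CSL rendering is not sketched here'
  end
end

puts Formatters.render('plain', 'title' => 'ruby-cff', 'version' => '0.9.0')
```

An application that never installs the extra gems would then see only the built-in formatters in `Formatters.available`.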

Applications like GitHub, using ruby-cff, would likely want to make their own decisions for security and performance reasons. For example, you will probably want to use a limited set of styles and locales. You would not want to parse styles and locales for every reference you generate but just once (both styles and locales should be thread-safe; in any case you'd likely want to cache them somehow instead of parsing XML every time).

We could also attempt to make these decisions in ruby-cff but I don't think a library is generally the right place to do this. I mean, the citeproc formatter could, by default, cache styles and locales and re-use a single renderer instance (or a thread-local one; some features of the renderer are stateful, although I believe it should be possible to make it thread-safe if you're only rendering single references and don't need to keep track of sort order, suppressed successive author names and the like). But I believe an app such as GitHub will want full control, e.g., when exactly to parse a style or locale, or even load marshaled instances instead of parsing them again, and I think a library such as ruby-cff should be able to facilitate that instead of making its own assumptions.
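The thread-local renderer mentioned above could look roughly like the following sketch. `FakeRenderer` is a hypothetical stand-in for a stateful renderer and `thread_local_renderer` is an illustrative helper, not part of ruby-cff or citeproc-ruby. It uses Ruby's `thread_variable_get` and `thread_variable_set`, which are scoped to the thread (unlike `Thread.current[]`, which is scoped to the current fiber).

```ruby
# Hypothetical stand-in for a stateful renderer.
class FakeRenderer
  def render(title)
    "<i>#{title}</i>"
  end
end

# Returns the renderer belonging to the calling thread, building it on first use.
def thread_local_renderer
  thread = Thread.current
  renderer = thread.thread_variable_get(:cff_renderer)
  return renderer if renderer

  renderer = FakeRenderer.new
  thread.thread_variable_set(:cff_renderer, renderer)
  renderer
end

main_renderer = thread_local_renderer
other_renderer = Thread.new { thread_local_renderer }.value

puts main_renderer.equal?(thread_local_renderer) # same thread, same instance
puts main_renderer.equal?(other_renderer)        # different thread, different instance
```

Whether a library should do this by default, or leave it to the application, is exactly the question raised in this comment.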