cboettig / knitcitations

📦 Generate citations for knitr markdown and HTML files
http://carlboettiger.info

Support for CSL-styles #38

Open kovla opened 11 years ago

kovla commented 11 years ago

Hi,

This is not an issue, rather a suggestion. It would be great if it were possible to specify the reference/bibliography style using CSL, e.g. by supplying a .csl file. This would probably be easiest to do by integrating one of the existing external CSL processors.

How does this fit into the future plans for knitcitations, and is it technically feasible?

Thanks, Maxim

cboettig commented 11 years ago

It's a good idea, but not a high priority. Like you say, using an external parser would make the most sense. I know there are a few CSL parsers available in Ruby, JavaScript, and Haskell, but introducing a dependency on any of these would probably be cumbersome.

Poking around, it looks like I can get citations in the requested CSL style from CrossRef's crosscite API (http://www.crosscite.org/cn/), at least for citation data retrieved by DOI. Citations formatted by Greycite or pulled from a BibTeX file would need a different strategy. I'll look into adding that at some point.
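A minimal sketch of such a lookup, assuming the DOI content-negotiation interface quoted from crosscite later in this thread: an `Accept: text/x-bibliography` header carrying a CSL `style` and an optional `locale`. The helper names `csl_accept_header` and `fetch_csl_citation`, the `apa` default, and the use of the https://doi.org/ resolver are assumptions for illustration, not part of knitcitations.

```shell
# Hypothetical helpers: request a CSL-formatted reference for a DOI.

# Build the Accept header from a CSL style name and an optional CSL locale.
csl_accept_header() {
  style="$1"
  locale="$2"
  header="Accept: text/x-bibliography; style=${style}"
  if [ -n "$locale" ]; then
    header="${header}; locale=${locale}"
  fi
  printf '%s\n' "$header"
}

# Resolve the DOI with that header; -L follows the resolver's redirect
# to the registration agency (e.g. CrossRef or DataCite).
fetch_csl_citation() {
  doi="$1"
  style="${2:-apa}"   # assumed default style
  locale="$3"
  curl -sL -H "$(csl_accept_header "$style" "$locale")" "https://doi.org/${doi}"
}
```

For example, `fetch_csl_citation 10.1126/science.169.3946.635 harvard3 fr-FR` should request the same harvard3/fr-FR rendering as the second crosscite example. This only covers DOIs; Greycite or BibTeX sources would still need another route.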

It's not a high priority for me for a few reasons. I think citation formatting for different journals is a bit archaic; if it matters at all, it should be the reader's choice to render citations in their preferred format, not the author's. If journals want a specific ordering, they should apply it themselves (as new journals like eLife and PeerJ do), rather than wasting the author's time with it. Meanwhile, Pandoc already provides a good tool for adding CSL-styled citations to formal publications written in Markdown.

Knitcitations is currently aimed at an online environment, where such tools are scarcer. I'm happy to generate a bib file for my publications and use Pandoc markdown, but didn't want to do that for everything I cite in my blog -- it's much easier to cite a link or DOI without first extracting the citation info into a reference manager. The other goal was to add value to web-based publishing that doesn't exist in traditional journal PDFs, like the CiTO semantics and the tooltips.

In the latest version, the user can control the citation format manually a bit more, i.e. which elements to include and in what order, using the "ordering" option. Beyond that, I don't think anyone cares whether their blog follows the citation style of SIAM or Ecology Letters or whatnot.

I do appreciate that it would be nice not to have to switch to Pandoc when turning content into formal publications where CSL matters. Unfortunately, metadata extracted automatically from CrossRef, Greycite, etc., isn't really robust enough for formal publication anyway (CrossRef records occasionally miss some information, use strange capitalization, etc.).

Thanks for the suggestion and sorry for the long answer about priorities.

cboettig commented 11 years ago

From crosscite:

Both CrossRef and DataCite support formatted citations via the text/x-bibliography content type. These are the output of the Citation Style Language processor, citeproc-js. The content type can take two additional parameters to customise its response format. A "style" can be chosen from the list of style names found in the CSL style repository. Many styles are supported, including common styles such as apa and harvard3:

    $ curl -LH "Accept: text/x-bibliography; style=apa" http://dx.doi.org/10.1126/science.169.3946.635
    Frank, H. S. (1970). The Structure of Ordinary Water: New data and interpretations are yielding
      new insights into this fascinating substance. Science, 169(3946), 635-641. American Association
      for the Advancement of Science AAAS (Science). doi:10.1126/science.169.3946.635

A locale can also be specified. Use one of the locale names from the CSL locales repository:

    $ curl -LH "Accept: text/x-bibliography; style=harvard3; locale=fr-FR" http://dx.doi.org/10.1126/science.169.3946.635
    Frank, HS 1970, « The Structure of Ordinary Water: New data and interpretations are yielding new
      insights into this fascinating substance ». Science, vol. 169, no. 3946, p. 635-641. Consulté
      de http://dx.doi.org/10.1126/science.169.3946.635

kovla commented 11 years ago

Thank you for the prompt response. I entirely understand the focus of knitcitations being on web applications. My own application, however, involves heavy-duty academic writing, and believe you me, citation format is not a minor issue if you want to get published in a high-ranking journal. On the other hand, I love the idea of reproducible research and would like to follow that paradigm in practice. The absence of a proper (read: convenient) citation mechanism is the biggest barrier at this point. Sure, pandoc can be a solution, but there are some usability issues. Like many other people, I do not use BibTeX as my main library format, and converting the library to BibTeX each time you add a source, plus remembering the keys for your 50-100 citations per article, is a pain in the arse, frankly speaking.

Instead, I use Zotero (a very popular and functional open-source citation manager), and my original question is related to the idea of writing an interface between R and Zotero: something along the lines of what knitcitations does, but pulling references directly from Zotero instead of an online repository. The most appealing approach is to handle citations entirely from within R, producing the formatted references before any eventual conversion with pandoc. CSL has to be implemented for this to be of practical value, though (at least to those who don't write for the web). If this is solved, the rest is pretty much straightforward.

johnstantongeddes commented 11 years ago

To chime in... I'd argue that anyone using knitcitations is likely doing heavy-duty academic writing at some point. Carl's point is simply that some tools are better at some tasks. knitcitations is developed for web documents or tech reports; Markdown, LaTeX, or doc are better tools for writing papers.

If you're using Zotero, try AutoZotBib, which makes maintaining a BibTeX file automatic (it does slow Zotero down, but it works). Then you can take advantage of all the great *.bib citation software that already exists; JabRef, for instance, works quite well.

cboettig commented 11 years ago

Thanks for sharing these perspectives. It's interesting to hear that managing BibTeX keys and generating the BibTeX file from Zotero is suboptimal. My intuition would have been to export BibTeX from Zotero and just use pandoc.

Taking a closer look at the Zotero API, we may have all the tools we need to do much better. I'm not a Zotero user, so there are a few steps I don't understand.

We can get the bibliographic data for an item or collection from the Zotero API http://www.zotero.org/support/dev/server_api/v2/read_requests. Cool.

All items in a collection (note that we need the user ID and the collection ID; as this is a public collection, no authentication is necessary, but it could be added for private collections): https://api.zotero.org/users/475425/collections/9KH9TNSJ/items

Even better, we can ask for the bibliographic formatting of an item in a known CSL style by adding a "style=" parameter, e.g.

https://api.zotero.org/users/475425/items/X42A7DEE?format=bib&style=apa
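A small sketch that assembles the two kinds of URL shown above from a user ID, a collection or item key, and a CSL style name. The helper names are hypothetical, only the public-library case is covered (private libraries would also need an API key, not handled here), and the Zotero Web API has since moved past v2, so parameters may differ in current versions.

```shell
# Hypothetical helpers for building Zotero Web API read URLs.
ZOTERO_API="https://api.zotero.org"

# All items in a public collection: needs the user ID and the collection key.
zotero_collection_items_url() {
  printf '%s/users/%s/collections/%s/items\n' "$ZOTERO_API" "$1" "$2"
}

# A single item rendered as a bibliography entry in the given CSL style.
zotero_item_bib_url() {
  user_id="$1"
  item_key="$2"
  style="${3:-apa}"   # assumed default style
  printf '%s/users/%s/items/%s?format=bib&style=%s\n' \
    "$ZOTERO_API" "$user_id" "$item_key" "$style"
}
```

Usage: `curl -s "$(zotero_item_bib_url 475425 X42A7DEE apa)"` should fetch the formatted entry for the example item above.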

This raises a few issues. How do we choose to cite the item? I assume citet("zotero:X42A7DEE") would not be a popular choice (though it's obviously the easiest to implement from a technical standpoint). Or is it easy for you to look up the Zotero key for an item you want to cite? (That sounds worse than dealing with BibTeX keys.)

I note that we can get a list of all citations in a user's specific collection (https://api.zotero.org/users/475425/collections/9KH9TNSJ/items) or overall in a user's library (https://api.zotero.org/users/475425/items, or just the item keys with https://api.zotero.org/users/475425/items?format=keys). If we downloaded all that data, we could then attempt to match the DOI (not every item might have one), the title, or something else against that database, but that would be pretty taxing to repeat for each citation. I'm open to suggestions on this front.

Carl Boettiger UC Santa Cruz http://www.carlboettiger.info/

kovla commented 11 years ago

Suboptimal means that (1) you have to constantly update the exported BibTeX. AutoZotBib does that, but it is somewhat slow, and (2) you still have to obtain the BibTeX key somehow and remember it. That is a lot of fuss if you are actively updating your library while writing, which is often the case. It is not a technical issue but a usability problem: having to combine too many bits and pieces to achieve the workflow. Non-automated complexity means errors. My idea was to write an R function that would extract the proper citation from Zotero (in any form, be it a Zotero key for another internal function or a BibTeX key for pandoc) based on a search string, e.g. author and year. That search could be fuzzy, using agrep for example. The function would issue a warning if multiple sources match the search string.

Instead of the API, I would use the Zotero database directly. It is an SQLite database, and it can be accessed from R without any credentials other than the path to it. In terms of search parameters, this approach is much more flexible than the online API. Add a CSL processor to that (there is one in Python, by the way), and you can generate citations entirely within R.
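The search-then-warn lookup described in this comment can be sketched in shell against a tab-separated export of the library, one item per line as `key<TAB>author<TAB>year<TAB>title`. That export format and the name `find_citation_key` are hypothetical; the proposal itself would read Zotero's SQLite file rather than an export. This sketch also swaps the fuzzy matching of R's `agrep` for plain case-insensitive substring matching (`grep -iF`), requiring every query word to appear on the line.

```shell
# Hypothetical helper: resolve a citation key from query words such as "Frank 1970".
# Input (assumed format): key<TAB>author<TAB>year<TAB>title, one item per line.
# Matching is literal, case-insensitive substring matching, not fuzzy matching.
find_citation_key() {
  library="$1"
  shift
  matches=$(cat "$library")
  # Keep only the lines that contain every query word.
  for word in "$@"; do
    matches=$(printf '%s\n' "$matches" | grep -iF -- "$word")
  done
  # Count non-empty matching lines.
  count=$(printf '%s\n' "$matches" | grep -c .)
  if [ "$count" -eq 0 ]; then
    echo "find_citation_key: no item matches '$*'" >&2
    return 1
  fi
  if [ "$count" -gt 1 ]; then
    echo "find_citation_key: warning: $count items match '$*'; using the first" >&2
  fi
  # Print the key (first column) of the first match.
  printf '%s\n' "$matches" | head -n 1 | cut -f 1
}
```

Usage: `find_citation_key library.tsv frank 1970` prints the key of the single matching line; with several matches it warns on stderr and prints the first, and with none it returns a non-zero status.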

cboettig commented 11 years ago

I'm guessing that regex-ing against Zotero's SQLite database doesn't address the CSL formatting issue at all. Yes, I realize that Python, Ruby, Haskell, and JavaScript have CSL processors, but I don't intend to implement one in R from scratch, nor to introduce a dependency on one of these packages in knitcitations. Querying a local database also doesn't have the cut-and-paste reproducibility that we get by accessing a public Zotero collection through the API, or by querying the CrossRef API.

Your approach is an interesting one, but it sounds out of scope for knitcitations, for the two reasons I mention above and given the web focus I mentioned earlier. I'll leave this issue open to remind me to think about a generic CSL solution.

kovla commented 11 years ago

Carl, thanks for the discussion. Both are valid reasons of course.

rpietro commented 11 years ago

Carl, I noticed that you have a bibstyle option in the bibliography function that is set to JSS by default. Is that option related to this discussion?

cboettig commented 11 years ago

Sadly, no. That's based on base R's citation function, and that's the only style it implements. See ?bibstyle. There's no mapping between that and CSL. A solution will probably rely on the CrossRef API's CSL support instead...

cboettig commented 10 years ago

@kovla @rpietro Curious whether you would be happy having this addressed by having knitcitations simply create inline citation keys in pandoc's format as an option (see https://github.com/cboettig/knitcitations/issues/57).

Pandoc could then compile the markdown using the generated bib file, and citations would be styled appropriately using the given CSL. Since it looks like pandoc will be shipped with and integrated into newer versions of RStudio, I suspect using pandoc in place of other markdown parsers may be less of an issue?
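For reference, pandoc's inline citation syntax is `[@key]` for a parenthetical citation (several keys separated by semicolons, `[@key1; @key2]`) and a bare `@key` for an in-text citation. A sketch of how citep-style and citet-style calls could map onto that output; `pandoc_citep` and `pandoc_citet` are hypothetical shell helpers named to mirror knitcitations' functions, not knitcitations code.

```shell
# Hypothetical helpers: emit pandoc-style citation keys.

# Parenthetical citation: [@a; @b; ...]
pandoc_citep() {
  out=""
  for key in "$@"; do
    if [ -z "$out" ]; then
      out="@${key}"
    else
      out="${out}; @${key}"
    fi
  done
  printf '[%s]\n' "$out"
}

# In-text (narrative) citation: @key
pandoc_citet() {
  printf '@%s\n' "$1"
}
```

The resulting markdown would then be compiled with something like `pandoc paper.md --citeproc --bibliography=refs.bib --csl=style.csl -o paper.html` (file names are placeholders; `--citeproc` needs pandoc 2.11 or later, while older versions used `--filter pandoc-citeproc`).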

rpietro commented 10 years ago

Carl, yes, that would work. converting with pandoc and an external bib file is my current approach when i need to generate citations (meaning almost always). in our day and age, traditional (non-semantic) citations are plain stupid, but we're literally stuck with them until organizations change, and that will unfortunately happen slowly

would love to test it once you have something and even help create some tutorials (i've done a few internal tutorials for the pandoc bibtex strategy)

cboettig commented 10 years ago

@rpietro Yeah, I'm still figuring out what the best way to handle semantic citations would be. My intuition at present is that this is partly a matter of styling an inline citation, which is something CSL already does.

For HTML publishing at least, I'm thinking about writing a custom CSL file that defines an inline citation to be a link, potentially with a title attribute (for pop-up citation) and potentially with the semantic information as well. Just have to see how I can get the semantic information in.

I imagine a user starting with the same command, like citep("10.1186/2041-1480-1-S1-S6", cito = "usesMethodIn"), which would generate the text output [@Shotton2010] while also writing the CiTO relation to a custom field in the BibTeX file (or possibly YAML, since pandoc now supports that). This looks doable (see the CSL primer: https://github.com/citation-style-language/documentation/blob/master/primer.txt). Do you think it's sensible?
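A rough sketch of the two outputs described here: print the inline key for the text and append an entry carrying the CiTO relation to the bib file. The helper name `cite_with_cito`, the `@misc` entry type, and the `cito` field name are all assumptions, and the key is passed in by hand rather than derived from the DOI metadata as knitcitations would. BibTeX styles ignore fields they don't recognise, so getting this value through pandoc to a custom CSL style remains the open question raised above.

```shell
# Hypothetical helper: cite a DOI with a CiTO relation.
# Appends a BibTeX entry with a custom "cito" field, then prints the pandoc key.
cite_with_cito() {
  bibfile="$1"
  key="$2"
  doi="$3"
  cito="$4"
  {
    printf '@misc{%s,\n' "$key"
    printf '  doi = {%s},\n' "$doi"
    printf '  cito = {%s}\n' "$cito"
    printf '}\n'
  } >> "$bibfile"
  printf '[@%s]\n' "$key"
}
```

For example, `cite_with_cito refs.bib Shotton2010 10.1186/2041-1480-1-S1-S6 usesMethodIn` prints `[@Shotton2010]` and adds the entry to `refs.bib`.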

rpietro commented 10 years ago

sounds very reasonable to me, would be good to test it in practice with the roughest possible version to see how it goes. again, would love to help test it, as i use markdown with citations multiple times/day, for things like slides in slidify (which is the only thing i'm using these days), articles and grant proposals
