force11 / force11-sciwg

FORCE11 Software Citation Implementation Working Group
https://www.force11.org/group/software-citation-implementation-working-group
BSD 3-Clause "New" or "Revised" License

Minimal working example for largely _automated_ import, use & render of software citation? #48

Closed: katrinleinweber closed this issue 5 years ago

katrinleinweber commented 6 years ago

Hello!

I read through https://research-software.org/citation/researchers/ and was left wondering whether there are already technical solutions for the three workflow steps of importing, using in a manuscript, and rendering a software citation. A beginner-friendly MVP, so to speak, that does not rely on formatting a reference list manually, fiddling with BibTeX item types, or writing one's own CSL?

Importing already works well if a BibTeX snippet or a Zotero translator is offered. Example: R-Packages.js for CRAN.

Using is also fine, since many text editors and word processors integrate well with most reference managers. Inserting a citation into the document therefore works as well.

However, when one actually wants to render the document, which BibTeX or CSL styles and processors are available that treat software in a minimally useful way, such as rendering the version number from its own field instead of appending v1.2.3 to the title or description?

sdruskat commented 6 years ago

whether there are already technical solutions to the three workflow steps of importing, using-in-a-manuscript and rendering a software citation?

From the CFF perspective, not yet. There's ongoing work on CITATION.cff file readers in different languages (Java, Ruby, Python), which will help interface with other workflow components (generators, converters, etc.).

Social gaps here include getting vendors to use the tools we create.

Technical gaps include that BibTeX is broken for software citations. This can be circumvented, e.g., by reading the version field from the metadata into the title field as Title (Version {version}). On the publishers' side, too, the relevant metadata formats (JATS, I think?) need to be adapted to accommodate software metadata.
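A minimal sketch of that title workaround, assuming a classic BibTeX style (plain.bst here) whose @misc type has no version field. The entry (author, tool name, URL, version) is a hypothetical placeholder, and `filecontents*` with the `[overwrite]` option assumes a LaTeX kernel from 2019-10 or later.

```latex
\documentclass{article}
\usepackage{url}

% Hypothetical entry: the version is folded into the title because
% plain.bst's @misc does not read a version field.
% The double braces stop plain.bst from lowercasing "MyTool" and "Version".
\begin{filecontents*}[overwrite]{\jobname.bib}
@misc{mytool,
  author       = {Jane Doe},
  title        = {{MyTool (Version 1.2.3)}},
  howpublished = {\url{https://example.org/mytool}},
  year         = {2018}
}
\end{filecontents*}

\begin{document}
The analysis used MyTool~\cite{mytool}.

\bibliographystyle{plain}
\bibliography{\jobname}
\end{document}
```

Run latex, bibtex, latex, latex. The cost of this workaround is that the version is no longer a separate field, so downstream tools cannot treat it as one.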

Working towards that MVE one step at a time.

katrinleinweber commented 6 years ago

One step forward on the import side: https://github.com/zotero/translators/pull/1578 :-)

katrinleinweber commented 6 years ago

Technical gaps include that BibTeX is broken for software citations [...]

True (as explained), but what is preventing these "unsupported types" from becoming supported by Bib(La)TeX/biber? There seem to be no feature requests there. However, thanks to doi:10.5446/35357, we seem to be close to putting together the necessary files.

Dear @moewew, do you know whether there is a reason why @software is not yet supported in Bib(La)TeX/biber? Not simply due to a lack of PRs, I hope ;-)

moewew commented 6 years ago

@software is among the known entry types, but it ~is not supported by~ has no dedicated driver set up in the standard styles at the moment. That means that @software falls back to using @misc's driver, which seems good enough for me (it even supports a version field, see https://tex.stackexchange.com/q/254610/). There are some custom styles that implement support for a dedicated @software driver, though. The wonderful https://github.com/alex-ball/biblatex-oxref comes to mind, the no less wonderful https://github.com/plk/biblatex-apa also supports @software.

I can't tell you why exactly this particular entry type is not amongst the ~supported types~ predefined drivers. But I assume it was not thought of as important enough when the package was written by its original author and that the soft alias for @misc seemed to work fine. The current stance toward support for unsupported entry types seems to be that there needs to be an interesting challenge to meet to warrant inclusion into the core. See this response to a feature request for @standard https://github.com/plk/biblatex/issues/388.

I tend to be (too) conservative when it comes to these kinds of requests since I worry about feature creep.

If you are not satisfied with the status quo that the soft mapping to @misc gives you in the standard styles, please head over to https://github.com/plk/biblatex/issues to discuss this. There is quite a lot to ~supporting a new entry type~ writing a type-specific driver and quite some things would have to be discussed and decided upon if the entry type is deemed worthy of ~inclusion~ having its own driver.

edit: Clarified language. @software is and always has been supported in the standard data model. There is just no dedicated bibliography driver for @software; the standard styles fall back to using @misc's driver. For the end user that difference does not matter at all (except when they want to modify a driver, and the only difference there is that they need to know that @software uses the code from @misc).

danielskatz commented 6 years ago

I think this would be great to do, and fits this group's goals quite well, if someone wants to take it on. It's been a potential action for a long time.

katrinleinweber commented 6 years ago

Ah-ha, thank you! If several PRs with support for @software were to appear against important styles, would that increase the chances of getting the type itself supported officially?

The "encountered often" argument looks well covered by research software engineers. I'm not sure about the "and require changes to internals" part, though.

If support can be increased with "existing biber-only features", the question arises of how to get such patches onto people's machines and into production workflows. I hope that will be less relevant after official support.
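One example of a biber-only feature that lives in a user's own preamble, so nothing has to be patched on the machine, is a source map. The sketch below assumes a hypothetical exporter that writes @misc entries with type = {software} and retypes them as @software at biber time; the matching rule (case-insensitive "software" anywhere in the type field) is an illustrative choice, not an established convention.

```latex
\documentclass{article}
\usepackage[style=authoryear, backend=biber]{biblatex}

% Hypothetical exporter output: @misc with type = {software}
\begin{filecontents*}[overwrite]{\jobname.bib}
@misc{mytool,
  author  = {Jane Doe},
  title   = {MyTool},
  type    = {software},
  version = {1.2.3},
  date    = {2018},
}
\end{filecontents*}
\addbibresource{\jobname.bib}

% Biber source map: if the type field contains "software"
% (case-insensitive), retype @misc as @software; otherwise stop this map.
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map{
      \step[fieldsource=type, matchi={software}, final]
      \step[typesource=misc, typetarget=software]
    }
  }
}

\begin{document}
\nocite{*}
% The retyped entry is now selected by an entry-type filter
\printbibliography[type=software]
\end{document}
```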

moewew commented 6 years ago

If several PRs with support for @software were to appear against important styles, would that be a way to increase chances to get the type itself supported officially?

Just so I understand you correctly: What would "official support" mean for you? @software is already a recognised entry type in biblatex's standard model and it is even acknowledged in the documentation. Its being an ~"unsupported type"~ non-standard type only means that there is no specific driver set up in standard.bbx and that it uses the one defined for @misc. Even at the moment @software is a perfectly valid entry type that gives (I think) fairly satisfying output even with the standard styles.

So maybe we have a terminology problem here. @software is officially supported in that it is a recognised and known entry type. It's just that biblatex has nothing special set up for it and treats it as @misc; that is what "unsupported" meant in the Unsupported Types section heading.

@software is as valid an entry type as @book. A properly unsupported entry type would be @flobbel.

edit Following discussions in https://github.com/plk/biblatex/issues/753, the biblatex documentation now lists what was formerly and confusingly known as 'unsupported' entry types as 'non-standard' in the section title. This should be taken to mean that the standard styles have no dedicated driver set up for these types, but that the types are perfectly fine and supported. The accompanying text will hopefully make the status of these entry types clear.

edit 2 Clarified the 'unsupported' even more and changed to 'non-standard'.

edit 3 Please note that this comment only applies to the biblatex standard styles. It does not apply to BibTeX's .bst styles (which are still more widely used than biblatex, I think, and certainly more prevalent in journal publishing - almost no publisher uses biblatex). For the bigger picture, please see my comment below https://github.com/force11/force11-sciwg/issues/48#issuecomment-436569563.

katrinleinweber commented 6 years ago

Thanks for explaining :-) My apologies! Yes, I meant a specific driver, so that a useful set of fields (and their formatting also, I presume) can be defined to implement (at least an MVP of) the Software Citation Principles.

moewew commented 6 years ago

Maybe we should head over to the biblatex bug tracker with a proper list of what you want, what you can have so far and what would need to be done on the biblatex side.

At the moment I'm sceptical that a lot would have to be done. Even without a dedicated @software driver, you can use type-specific formatting (since it is a valid entry type). And I think even without any modifications almost all columns of https://doi.org/10.7717/peerj.2394/table-2 are covered.
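A small sketch of such type-specific formatting, assuming the standard authoryear style: @software titles are printed upright while other types keep their defaults (the @book title stays italic). The entries are hypothetical; the optional entry-type argument of `\DeclareFieldFormat` is standard biblatex.

```latex
\documentclass{article}
\usepackage[style=authoryear, backend=biber]{biblatex}

% Hypothetical entries: one @software (typeset by the @misc driver)
% and one @book for comparison.
\begin{filecontents*}[overwrite]{\jobname.bib}
@software{mytool,
  author  = {Jane Doe},
  title   = {MyTool},
  version = {1.2.3},
  date    = {2018-06-01},
  url     = {https://example.org/mytool},
}
@book{somebook,
  author    = {John Roe},
  title     = {A Book about Tools},
  publisher = {Example Press},
  location  = {Somewhere},
  date      = {2017},
}
\end{filecontents*}
\addbibresource{\jobname.bib}

% Type-specific format: applies only to entries of type @software,
% even though they share the @misc driver.
\DeclareFieldFormat[software]{title}{#1}

\begin{document}
\nocite{*}
\printbibliography
\end{document}
```

Other fields such as version or url can be adjusted the same way by naming the entry type in square brackets.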

@misc also supports a type field that could be filled with the keyword software. That keyword is automatically translated by biblatex. Additionally, there are also titleaddon, howpublished, pubstate and organization that you could use.

edit: Take the following example:

\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{babel}
\usepackage{csquotes}

\usepackage[style=authoryear, backend=biber]{biblatex}

% you could also use 'titleaddon' instead of 'type', but 'type' feels more semantic
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
% modified from https://github.com/plk/biblatex-apa/blob/0636af27d53cc21f9135e2ae456949e38c65bb95/bibtex/bib/biblatex-apa-test-references.bib
@software{bioshock,
  author         = {{2K Games}},
  type           = {Computer Game},
  title          = {{BioShock}},
  date           = {2007}
}
@software{7.08:56,
  title          = {Comprehensive {Meta-Analysis}},
  type           = {Computer software},
  version        = {2},
  location       = {Englewood, NJ},
  organization   = {Biostat}
}
% https://tex.stackexchange.com/a/374194/35864
@software{hadoop,
  author  = {{Apache Software Foundation}},
  title   = {Hadoop},
  url     = {https://hadoop.apache.org},
  version = {0.20.2},
  date    = {2010-02-19},
}
% ----
@software{bootstrap,
  author     = {Rob Tibshirani and Friedrich Leisch},
  title      = {bootstrap},
  type       = {R package},
  url        = {https://cran.r-project.org/package=bootstrap},
  version    = {2017.2},
  date       = {2017-02-27},
}
\end{filecontents}

\addbibresource{\jobname.bib}

\nocite{*}
\begin{document}
\printbibliography
\end{document}

This already gives the following output:

[Screenshot of the rendered bibliography]

To me that seems OK for a 'normal' bibliography. Naturally the output could be more verbose for other purposes.

We have to keep in mind that many biblatex packages implement the requirements of certain style guides. As long as these style guides don't mention software citations, it is a bit of a grey area what should happen, and keeping close to what @misc gives is the safe option in that case. Another concern is that software citations should blend in with other types and therefore should probably not get an extravagant implementation.

danielskatz commented 6 years ago

Just to add another topic to this discussion, I think an additional step would be to work with groups that create the documents that are used by authors to create their publications, including class files and templates.

katrinleinweber commented 6 years ago

Just to clarify:

@misc also supports a type field that could be filled with the keyword software.

primarily means type = {software}, correct? This is used often, but I also saw the notation keywords={type:...} for other... Well, types.

moewew commented 6 years ago

  type = {software},

is indeed what I had in mind. This will already generate sensible output with a localised term for software in the standard (and I assume many contributed) styles.

The keywords field is mainly intended for filtering, not for output; no serious biblatex style that I know of ("serious" to exclude debug and the like) actually prints that field. Of course nothing stops you from using keywords = {type:software}, but the one that gives you sensible output is type = {software}. Filtering can also be done on the contents of the type field with a bibcheck, so keywords = {type:software} would be redundant; keywords merely offer a convenient shortcut that avoids writing a bibcheck.
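The three filtering routes can be sketched side by side, assuming the standard authoryear style and two hypothetical entries: `type=software` selects on the entry type, `keyword=` on the keywords field, and `check=` on a bibcheck that tests the type field. The check name softwaretype is made up for this example.

```latex
\documentclass{article}
\usepackage[style=authoryear, backend=biber]{biblatex}

% Hypothetical entries: a real @software entry and a @misc entry
% marked as software via the type and keywords fields.
\begin{filecontents*}[overwrite]{\jobname.bib}
@software{tool-a,
  author  = {Jane Doe},
  title   = {Tool A},
  version = {1.0},
  date    = {2018},
}
@misc{tool-b,
  author   = {John Roe},
  title    = {Tool B},
  type     = {software},
  keywords = {type:software},
  date     = {2017},
}
\end{filecontents*}
\addbibresource{\jobname.bib}

% Keep only entries whose type field is exactly "software";
% entries without a type field fail the test and are skipped.
\defbibcheck{softwaretype}{%
  \iffieldequalstr{type}{software}{}{\skipentry}}

\begin{document}
\nocite{*}
% 1. filter on the entry type: should list only Tool A
\printbibliography[type=software, title={Software by entry type}]
% 2. filter on the keywords field: should list only Tool B
\printbibliography[keyword={type:software}, title={Software by keyword}]
% 3. filter on the type field via the bibcheck: should list only Tool B
\printbibliography[check=softwaretype, title={Software by type field}]
\end{document}
```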

katrinleinweber commented 5 years ago

biblatex 3.12 has … @software and friends … absolutely fine, the standard styles just don't have a dedicated driver set up for these types … https://github.com/plk/biblatex/issues/753#issuecomment-436159332

sdruskat commented 5 years ago

I think a good example of a largely automated workflow of importing, using and rendering software citation is the Netherlands eScience Center's Research Software Directory (https://research-software.nl/, e.g. https://research-software.nl/software/xenon):

(From Spaaks, Jurriaan H.; Maassen, Jason; Klaver, Tom; Verhoeven, Stefan (2018): How the Netherlands eScience Center uses CFF to promote software citation. figshare. Presentation at CFF Hack Day. https://doi.org/10.6084/m9.figshare.7053608.v1)

  1. code is on GitHub
  2. GitHub-Zenodo integration to mint DOIs whenever a release is made.
  3. Research Software Directory has an Admin interface, provides concept DOI
  4. scraper asks Zenodo for the associated versioned DOIs
  5. scraper then visits the links for each,
    • looks for a CITATION.cff
    • checks if it's valid
    • generates BibTeX, EndNote, RIS
    • generates CodeMeta that is embedded in the landing page for the software

katrinleinweber commented 5 years ago

I think we can do better than 5+X steps.

What if @Zenodo started providing BibTeX with @software? I think it's time to revisit https://github.com/zenodo/zenodo/issues/1428 and convince the important downstream tools that hiccup on @software to behave more gracefully.

Also, is there any BibTeX style that already supports @software? If yes, we'd be at:

  1. code is on GitHub
  2. GitHub-Zenodo integration to mint DOIs whenever a release is made.
  3. copy BibTeX from Zenodo to a project- or user-specific .bib

Implicitly, both my list and probably @sdruskat's continue with:

moewew commented 5 years ago

I'm not sure if it is sufficiently clear, so apologies if I state the obvious.

The world of .bib files is extremely fragmented. There is not only the obvious dichotomy of BibTeX vs biblatex that most people and tools are aware of in one way or another. Every BibTeX style may have its own data model, so to reach 100% coverage you would have to change every .bst style available today that doesn't support @software.

For anything other than the core set of fields listed in §3 of BibTeXing by Oren Patashnik ("btxdoc"), which was finalised in the late eighties, consensus among .bst files (BibTeX styles) cannot be guaranteed. The base styles have remained largely unchanged since 1988 and don't even support popular things like a url field, DOIs, or an @online or @electronic entry type. Other styles have bridged that gap and added these fields (and others) to their repertoire, but not all in the same way, so inconsistencies and incompatibilities between styles have to be expected. BibTeX style files are completely self-contained with regard to which fields and types they support, so every .bst file would have to be changed separately.

Most BibTeX styles will not know a @software type. If @software is used regardless, BibTeX will issue a warning and use a fallback definition that will probably still give acceptable output (compared to what the style can provide overall). The overall chances of effecting large-scale changes to many .bst files are slim: many styles have been stable for years and could be seen as abandoned. In some cases it might be possible for other people to take over maintenance of styles (the LPPL allows this), but I'm not sure how well this would be received. As far as I am aware, there was no large-scale push to add URL/DOI support to all sorts of established styles, and that is something people would probably be much more excited about than @software. People found ways around it, like howpublished = {\url{http://example.com} access date 07/11/2018}, urlbst or different styles, and that was that.
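The fallback can be observed with a minimal sketch, assuming plain.bst and a hypothetical entry. plain.bst defines no software function, so BibTeX warns that the entry type isn't style-file defined and formats the entry with plain.bst's default type, which is misc. Because plain.bst does not declare a version (or url) field, such fields are ignored.

```latex
\documentclass{article}

% Hypothetical @software entry fed to a classic BibTeX style
\begin{filecontents*}[overwrite]{\jobname.bib}
@software{mytool,
  author  = {Jane Doe},
  title   = {MyTool},
  version = {1.2.3},
  year    = {2018},
}
\end{filecontents*}

\begin{document}
% plain.bst has no 'software' function: BibTeX warns and falls back
% to default.type (misc). 'version' is not declared in plain.bst,
% so 1.2.3 does not appear in the bibliography.
See \cite{mytool}.

\bibliographystyle{plain}
\bibliography{\jobname}
\end{document}
```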

biblatex entered the scene much later and could profit from the experience with BibTeX to start from a more mature and richer data model. It is an express feature of biblatex that styles and users may change the data model (i.e. add or remove fields or entry types), but the assumption has been that most well-written styles only extend the data model and do not throw out fields and types. Overall the data model is fairly homogeneous, because many things found lacking in BibTeX were included in the standard data model; but for more specialised material like legal citations, you will again find fragmentation among the styles that provide such features.

Since @software has been a valid entry type for biblatex as far back as I could trace it (v0.8e lists it; the wording is a bit confusing, but it should have been treated the way it is treated now), I would venture a guess that most biblatex styles support @software at least in so far as something useful comes out. As of biblatex 3.13 there is also no warning any more; previously there was a warning, but it was benign and the output is unchanged.


When you try to convince data providers or exporters to use @software instead of @misc, you are fighting a battle similar to the one people would fight to have those exporters export

url     = {http://example.com},
urldate = {2018-11-07},

instead of

howpublished = {\url{http://example.com}, 07/11/2018},

The latter is quite universal, but the former is clearly preferable for styles that support it. Many exporters go for the lowest common denominator because that causes the fewest complaints about incorrect or missing data. But with that approach many tools that would be able to offer more can't be used to their full potential.

I for one have resigned myself to the fact that data providers often produce .bib files that are far from perfect and far from best practice for the tools I use. See also "Software-generated bibliographic entries: common errors and other mistakes to check before use" on TeX.SX. The @misc/@software quibble is just another issue on that list.


edit as mentioned in https://github.com/plk/biblatex/issues/753, biblatex 3.13 has 'promoted' @software to the regular entry types section of the manual. The output is still the same as before, but a spurious warning is now suppressed. This does not mean that BibTeX styles now support @software any better than they previously did. This is purely about the biblatex standard data model.

katrinleinweber commented 5 years ago

Thanks to @dbouquin et al. we have a template with explanation now :-)

katrinleinweber commented 4 years ago

Zenodo now generates BibTeX with @software where appropriate. cc @moewew
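With Zenodo exporting @software, the remaining steps are pasting the entry into a .bib file, citing it, and rendering it. A minimal sketch with biblatex follows; the entry only approximates the shape of such an export, and every value, including the DOI, is a placeholder rather than a real record.

```latex
\documentclass{article}
\usepackage[style=authoryear, backend=biber]{biblatex}

% Placeholder entry in the style of a repository export; not a real record.
\begin{filecontents*}[overwrite]{\jobname.bib}
@software{doe_2018_mytool,
  author  = {Jane Doe},
  title   = {MyTool},
  version = {1.2.3},
  year    = {2018},
  doi     = {10.5281/zenodo.XXXXXXX},
}
\end{filecontents*}
\addbibresource{\jobname.bib}

\begin{document}
% @software is a valid biblatex type; the standard styles (at least as of
% biblatex 3.13) typeset it with the @misc driver, which prints version.
The data were processed with MyTool \parencite{doe_2018_mytool}.
\printbibliography
\end{document}
```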