force11 / force11-sciwg

FORCE11 Software Citation Implementation Working Group
https://www.force11.org/group/software-citation-implementation-working-group
BSD 3-Clause "New" or "Revised" License

Experience with Debian Science's Citations project? #45

Open katrinleinweber opened 6 years ago

katrinleinweber commented 6 years ago

Hi!

I learned about https://wiki.debian.org/DebianScience/Citations recently and was wondering whether anyone here knows about that initiative? The "med" & "science" blends were apparently discussing and implementing (not sure how far) a system-wide way for users to gather citation info from packages they were using.

mr-c commented 6 years ago

Hello @katrinleinweber, I'm a member of Debian Med. As you can see, we try to include citation information for every academically produced piece of bioinformatics, medical, or hospital-related software we package. Other teams within Debian do the same.

@tillea can provide more up-to-date information on the status of the effort.

katrinleinweber commented 6 years ago

Cool, thanks for the update! I'm not so much asking for details for myself, though, but whether there was a connection between efforts within Debian and within SCIWG ;-) Since you were not in the contributor list here I thought it's better to ask.

tillea commented 6 years ago

Hi, currently the status of citation handling is the following:

katrinleinweber commented 6 years ago

Thanks for explaining :-) I understand that a Debian Science user would need to:

  1. include that .bib in their own workflow and
  2. start citing the package names.

If that's it: Neat!

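Aside from that, I noticed that you use @article{…} and @misc{…} and didn't find a mailing list discussion about using @software{…} (search: https://lists.debian.org/cgi-bin/search?P=bib+%2Btype+%2Bsoftware+%2Bmisc&DEFAULTOP=or&B=Gdebian-med&B=Gdebian-science&SORT=&HITSPERPAGE=10&xP=bib%09type%09software%09misc&xFILTERS=Gdebian-med%7EGdebian-science%7E-%7E%7E4294967295). I heard that the latter item type is not yet fully supported in Bib(La)TeX, but doesn't break anything, because it gets treated as @misc{...}. Do you have experience with that? Have you considered encouraging the use of @software{...}?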

mr-c commented 6 years ago

@katrinleinweber Our initial focus was on research software with traditional publications.

A quick search of the entire Debian code base shows only two source packages that use @software in their citation instructions: octave-interval (https://sources.debian.org/src/octave-interval/3.1.0-5/CITATION/) and octave-splines (https://sources.debian.org/src/octave-splines/1.3.2-3/CITATION/). The remaining two hits are @software citations of other software, not self-citations: https://codesearch.debian.net/search?q=%40software%5B%5B%3Aspace%3A%5D%5D*%7B&perpkg=1

It looks like the two CITATION files were transformed by hand into debian/upstream/metadata files with Type: software: https://sources.debian.org/src/octave-interval/3.1.0-5/debian/upstream/metadata/ and https://sources.debian.org/src/octave-splines/1.3.2-3/debian/upstream/metadata/

However I can't find any octave in http://blends.debian.net/packages-metadata/debian.bib, so perhaps there is an issue on our side. @tillea Any ideas here?

While we could autogenerate @software citations for packages lacking them, Debian's notion of authorship is currently limited to names that appear after Copyright. From what I've seen, that is often very incomplete.
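A minimal sketch of what such autogeneration might look like, assuming the only authorship source is the `Copyright:` fields of a machine-readable debian/copyright file. The function names, the package name `examplepkg`, and the choice of BibTeX fields are hypothetical, and the name clean-up is deliberately naive:

```python
import re


def copyright_holders(copyright_text):
    """Collect holder names from the 'Copyright:' fields of a debian/copyright file."""
    holders = []
    in_field = False
    for line in copyright_text.splitlines():
        if line.startswith("Copyright:"):
            in_field = True
            value = line[len("Copyright:"):]
        elif in_field and line[:1] in (" ", "\t") and line.strip():
            value = line  # continuation line of the same field
        else:
            in_field = False
            continue
        # Naive clean-up: drop e-mail addresses, "(c)" markers and years.
        name = re.sub(r"<[^>]*>", "", value)
        name = re.sub(r"\(c\)|©", "", name, flags=re.IGNORECASE)
        name = re.sub(r"\b\d{4}\b(\s*[-,]\s*)?", "", name)
        name = name.strip(" ,-\t")
        if name and name not in holders:
            holders.append(name)
    return holders


def software_bibtex(package, version, holders, url):
    """Render a @software entry; the field selection is an assumption."""
    authors = " and ".join(holders)
    return (
        f"@software{{{package},\n"
        f"  author  = {{{authors}}},\n"
        f"  title   = {{{package}}},\n"
        f"  version = {{{version}}},\n"
        f"  url     = {{{url}}},\n"
        f"}}"
    )
```

Because copyright holders are not necessarily the authors, an entry generated this way would still need review by the package maintainer.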

Once CodeMeta and/or the Citation File Format catch on, we could obviously auto-ingest those.

sdruskat commented 6 years ago

@mr-c Interesting. What would you need for the Citation File Format (and CodeMeta) to be auto-ingestable during your workflow?

mr-c commented 6 years ago

@sdruskat A script (citation_to_debian_metadata?) that would discover the Citation File Format or CodeMeta file (starting from the current working directory) and transform it into our debian/upstream/metadata format, writing it there. Then the package maintainer would confirm the data and add it to the source package.
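A rough sketch of such a script, under several assumptions: it only handles codemeta.json (reading CITATION.cff would need a YAML parser, which is not in the Python standard library), it searches the given directory and everything below it, and the output field names (`Reference`, `Type`, `Author`, `Title`, `Year`, `URL`) are a guess that should be checked against https://wiki.debian.org/UpstreamMetadata. All function names are hypothetical.

```python
import json
from pathlib import Path

CANDIDATES = ("codemeta.json",)  # CITATION.cff is left out: it needs a YAML parser


def find_metadata(start="."):
    """Return the shallowest candidate file at or below `start`, or None."""
    matches = [p for name in CANDIDATES for p in Path(start).rglob(name)]
    return min(matches, key=lambda p: len(p.parts), default=None)


def person_name(person):
    """Turn a CodeMeta author (string or Person object) into a display name."""
    if isinstance(person, str):
        return person
    parts = [person.get("givenName", ""), person.get("familyName", "")]
    return " ".join(p for p in parts if p) or person.get("name", "")


def to_upstream_metadata(codemeta):
    """Render a dict loaded from codemeta.json as a YAML 'Reference' block."""
    authors = codemeta.get("author", [])
    if isinstance(authors, dict):
        authors = [authors]
    fields = {
        "Type": "software",
        "Author": " and ".join(person_name(a) for a in authors),
        "Title": codemeta.get("name", ""),
        "Year": str(codemeta.get("datePublished", ""))[:4],
        "URL": codemeta.get("codeRepository", ""),
    }
    lines = ["Reference:"]
    for key, value in fields.items():
        if value:
            # A JSON string literal is also a valid YAML double-quoted scalar.
            lines.append(f"  {key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


def convert(source_dir="."):
    """Write debian/upstream/metadata under `source_dir`; return its path or None."""
    found = find_metadata(source_dir)
    if found is None:
        return None
    codemeta = json.loads(found.read_text(encoding="utf-8"))
    target = Path(source_dir) / "debian" / "upstream" / "metadata"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(to_upstream_metadata(codemeta), encoding="utf-8")
    return target
```

Calling `convert()` in the root of a source package would write the file for the maintainer to review. Note that this sketch overwrites any existing debian/upstream/metadata, so a real tool would need to merge rather than replace.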

Is there any notion of presenting CodeMeta and/or CFF files in a standard location when applications are installed? In CWL we standardized how to discover them on the filesystem: http://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem

To discover CWL documents look in the following locations:

/usr/share/commonwl/

/usr/local/share/commonwl/

$XDG_DATA_HOME/commonwl/ (usually $HOME/.local/share/commonwl)

$XDG_DATA_HOME is from the XDG Base Directory Specification.

A similar approach would be useful for CodeMeta and/or CFF (/usr/share/metadata/, /usr/local/share/metadata/, etc.?)

Debian could then take advantage of such a standardized location in several ways: for example we'd also install the citation/metadata files to the appropriate directory even if the tool author forgot to instruct their build system to do so.
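To make the idea concrete, here is a small sketch of how a tool might look up installed citation files, assuming the hypothetical `metadata/` directory name floated above and the XDG rule that `$XDG_DATA_HOME` defaults to `$HOME/.local/share` when unset or empty. Neither the directory name nor the file layout below it is standardized anywhere; both are assumptions for illustration.

```python
import os
from pathlib import Path


def metadata_search_dirs(subdir="metadata", environ=None):
    """Return the system-wide, local and per-user directories to search, in order."""
    environ = os.environ if environ is None else environ
    home = environ.get("HOME") or os.path.expanduser("~")
    # XDG Base Directory Specification: fall back to ~/.local/share.
    xdg_data_home = environ.get("XDG_DATA_HOME") or os.path.join(home, ".local", "share")
    return [
        Path("/usr/share") / subdir,
        Path("/usr/local/share") / subdir,
        Path(xdg_data_home) / subdir,
    ]


def discover(filenames=("CITATION.cff", "codemeta.json"), dirs=None):
    """List all matching files found at or below the search directories."""
    dirs = metadata_search_dirs() if dirs is None else dirs
    found = []
    for directory in dirs:
        if directory.is_dir():
            for name in filenames:
                found.extend(sorted(directory.rglob(name)))
    return found
```

A distribution could then install each package's file into, say, a per-package subdirectory of the first location, and `discover()` would pick it up regardless of how the upstream build system behaves.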

rdicosmo commented 6 years ago

Hi Katrin, I think one needs to be careful not to mix different concepts here.

What I see in the BibTeX file maintained in Debian (http://blends.debian.net/packages-metadata/debian.bib) is a great compilation of "scientific articles" that "describe software" that is also packaged in Debian. As such, the entry type @article is quite appropriate most of the time (note for @tillea: conference proceedings, like aevol or astroml, should be @inproceedings, not @misc, by the way).

On the other hand, citing "software" itself is really an open issue right now.

On one side, the @software BibTeX entry type has no formal existence, as it is not supported by any of the major bibliographic styles out there; the fact that @software "works" in bibtex is just because bibtex and biblatex use @misc as fallback for all "unknown" entries: you can add a @foobar entry in your .bib file and it will "work" exactly the same :-)
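One way to see which entries in a .bib file depend on that fallback is to list the entry types that are not among the fourteen types defined by the classic BibTeX standard styles. The sketch below does only that; the function name is hypothetical, the regular expression is a simplification rather than a full BibTeX parser, and how a particular style actually renders an unknown type still has to be checked against that style.

```python
import re

# Entry types defined by the classic BibTeX standard styles (plain, alpha, ...).
STANDARD_TYPES = {
    "article", "book", "booklet", "conference", "inbook", "incollection",
    "inproceedings", "manual", "mastersthesis", "misc", "phdthesis",
    "proceedings", "techreport", "unpublished",
}
# Commands that use the same @name{...} syntax but are not entries.
NON_ENTRY_COMMANDS = {"string", "preamble", "comment"}

ENTRY_RE = re.compile(r"@\s*([A-Za-z]+)\s*[{(]")


def nonstandard_entry_types(bib_text):
    """Return the sorted entry types in `bib_text` that rely on a fallback."""
    types = {match.group(1).lower() for match in ENTRY_RE.finditer(bib_text)}
    return sorted(types - STANDARD_TYPES - NON_ENTRY_COMMANDS)
```

Run over a file that mixes @article, @software and @foobar entries, this would report `software` and `foobar` alike, which is the point being made above: to the classic styles the two are equally unknown.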

On the other side, determining what should go in a citation for software is a really complex issue: what would be the fields of a @software type? Which ones are mandatory, which are optional? And even for an apparently simple concept like "author", it is really not clear what names should be included: in principle, it should be the software project's own responsibility to come up with that list, but that's a touchy issue, as the list changes over time.

-- Roberto

Roberto Di Cosmo


Computer Science Professor (on leave at INRIA from IRIF/University Paris Diderot)

Software Heritage, INRIA, Bureau C123
2, Rue Simone Iff, CS 42112, 75589 Paris Cedex 12
E-mail: roberto@dicosmo.org
Web: http://www.dicosmo.org
Twitter: http://twitter.com/rdicosmo
Tel: +33 1 80 49 44 42

GPG fingerprint 2931 20CE 3A5A 5390 98EC 8BFC FCCA C3BE 39CB 12D3


sdruskat commented 6 years ago

@mr-c AFAIK, at the moment, there is no standard for placing CFF/CM files on a local file system. CFF suggests placing the file in the remote repo's root dir, and so does CodeMeta (right?).

I'll have to think about the standard location concept for CFF. What you describe for Debian packages seems useful, although it may not be portable to other types of software.

As for a conversion script, I'll add https://wiki.debian.org/UpstreamMetadata to the list of formats that I'd like CFF (infrastructure) to support :).

katrinleinweber commented 6 years ago

Hello @rdicosmo,

yes, of course entry type @article is quite appropriate. I meant the @misc ones only. Apologies for not writing that down.

Regarding @software{...}: it doesn't seem like there is any risk of breaking things if people start to use it. Pushing the egg to hatch the chicken, so to speak ;-)

katrinleinweber commented 6 years ago

> […] discover the Citation File Format or CodeMeta file […] and transform it into our debian/upstream/metadata format

Interesting! I understood it the other way round: CFF & CM being generated out of the community's existing metadata files/formats.

> […] install the citation/metadata files to the appropriate directory even if the tool author forgot to instruct their build system to do so.

That would be very convenient for developers! It seems to me that a) pushing the option to generate these files to the developers' local machines will find less traction than b) centralising that task, which, OTOH, should be accompanied by a suppress_{CFF|codemeta} flag somewhere.

katrinleinweber commented 6 years ago

> […] location concept for CFF. What you describe for Debian packages seems useful, although may not be portable for other types of software.

Can at least the XDG_DATA_HOME standard be applied Linux-wide?

mr-c commented 6 years ago

> I'll have to think about the standard location concept for CFF. What you describe for Debian packages seems useful, although may not be portable for other types of software.

CWL's choice of locations for discovery is not Debian-specific. It was made by consulting the Filesystem Hierarchy Standard v3.0 and the XDG Base Directory Specification v0.6, neither of which is specific to a flavor or type of Linux/Unix.

sdruskat commented 6 years ago

@mr-c Thanks. By "not portable" I meant other OSs and types of software that aren't installable; these, however, wouldn't be touched by that particular standard anyway, so it makes sense.

mr-c commented 6 years ago

> Interesting! I understood it the other way round: CFF & CM being generated out of the community's existing metadata files/formats.

I suggest:

  1. consuming all available metadata sources to produce a standardized format,
  2. putting that standardized format in as many useful places as possible, and
  3. over time, replacing non-standardized formats with the standardized format.

For example, eventually we may not need debian/upstream/metadata

sdruskat commented 6 years ago

> CFF & CM being generated out of the community's existing metadata files/formats.

> For example, eventually we may not need debian/upstream/metadata

I've thought of CFF, at least, as a source format rather than a target format (apart, maybe, from conversions from codemeta.json). Ideally, CFF, CodeMeta, both, or something else can replace every other software metadata format, yes :).

tillea commented 6 years ago

> However I can't find any octave in http://blends.debian.net/packages-metadata/debian.bib, so perhaps there is an issue on our side. @tillea Any ideas here?

My first guess is the mentioned move from alioth.debian.org to salsa.debian.org. Currently you should simply expect missing entries, since I have not yet updated the code that gathers the data from salsa.

Andreas.

mr-c commented 6 years ago

@tillea D'oh, that's right, sorry for the bother. I was curious how the Type: software entries get rendered into BibTeX.