danielskatz / software-vs-data

understanding and documenting the differences between software and data in the context of citation
Creative Commons Attribution 4.0 International

preprint #24

Closed: danielskatz closed this issue 7 years ago

danielskatz commented 7 years ago

I'm going to publish the current state of the repo as a PeerJ CS preprint, as you can see in the preprint directory.

If you (@kyleniemeyer @arfon @band @cboettig @khinsen @rwwh @mhucka @alee @knarrff @tompollard @zoidy) are willing to be associated with this document in that format as well, please add your contact info to the LaTeX doc following my example (either directly or via a PR).

If anyone wants to start on the abstract, that would be great too.

danielskatz commented 7 years ago

There's also a place for Acks in the tex file. If you want to add anything there, please do.

danielskatz commented 7 years ago

If the other contributors (@arfon @khinsen @rwwh @mhucka @tompollard @zoidy) can either add their info to indicate that they are ok with this being published as a preprint, or say that they don't want to be listed, I would appreciate it.

zoidy commented 7 years ago

Whoops, I made the change but forgot to make the pull request.

arfon commented 7 years ago

> If the other contributors (@arfon @khinsen @rwwh @mhucka @tompollard @zoidy) can either add their info to indicate that they are ok with this being published as a preprint, or say that they don't want to be listed, I would appreciate it.

I don't think I should be listed. My only contribution was a typo fix :-)

danielskatz commented 7 years ago

> I don't think I should be listed. My only contribution was a typo fix :-)

@arfon, I think you contributed back when this was part of the software citation paper, before the reviewers of that paper suggested we pull it out. If you disagree with this too, let me know and I will remove you.

arfon commented 7 years ago

> @arfon, I think you contributed back when this was part of the software citation paper, before the reviewers of that paper suggested we pull it out.

OK, understood. I've added my details here: https://github.com/danielskatz/software-vs-data/pull/31

khinsen commented 7 years ago

Mine is #32.

tompollard commented 7 years ago

Thanks @danielskatz, mine was #28.

danielskatz commented 7 years ago

I've added a draft abstract; comments and edits are welcome.

Also, @knarrff and @mhucka, please add your contact/affiliation, or tell me you don't want to be listed.

danielskatz commented 7 years ago

Thanks to all authors for their info.

Please add any changes/comments in the next 24 hours so I can submit this Friday afternoon (CST).

rwwh commented 7 years ago

Daniel,

The abstract is brief and to the point.

I'd like to raise the question of whether the lack of software citation can be stated so broadly. I originally come from crystallography, and there it is really common to cite the software used. The biggest problem is that a software package does not have a consistent way of being cited, so authors of software packages sometimes write an application note for one of the journals in the field and ask users to cite that paper instead. Some papers of that kind rack up >10k citations in a year. A great example of the result of this practice can be seen in the citations for one of the gurus of crystallographic software: https://scholar.google.nl/scholar?hl=en&q=author%3Ag.m.sheldrick

danielskatz commented 7 years ago

The fact that your field cites papers about software is great, but the fact that they don't actually cite the software itself is problematic to me. Why should a software developer have to write a paper in order to get credit for the software work that they have done, when others are using their software?

Also, I think this discussion would fit better in https://github.com/force11/force11-scwg than in this repo, which is about software vs data in the context of citation, not about the value of citation.

rwwh commented 7 years ago

OK. Should the sentence "However, software and data are similar in that they both traditionally have not been cited in publications" be in the abstract at all?

Referring to the citation list of George Sheldrick, please note that the first two entries are indeed papers describing the software, but the rest of his top 10 most-cited items are actual software citations, one from as early as 1986.

danielskatz commented 7 years ago

Good question - I could see removing it, but I also feel like the abstract needs to bring in the idea of software citation before the last sentence.

mhucka commented 7 years ago

Somewhat meta question: would it be appropriate to cite other works that ask similar questions, to make clear we know other people have grappled with this topic? For instance, there is the "NSF Workshop on Supporting Scientific Discovery through Norms and Practices for Software and Data Citation and Attribution" (http://dl.acm.org/citation.cfm?id=2795624), which brought up the same question of data vs software, though doesn't seem to have come up with answers per se.

danielskatz commented 7 years ago

That seems like a good idea, but we should do it in both README.md and the preprint. Feel free to put in a PR for this. :)

arfon commented 7 years ago

Abstract and acknowledgements look good to me 👍

danielskatz commented 7 years ago

I've now started submitting the preprint, but I need an email address for @tompollard. Having it in the document itself would be nice but is optional; I do, however, seem to need it to add you as an author.

tompollard commented 7 years ago

Thanks @danielskatz, the abstract looks good, and I have added my email in a pull request (#38)

danielskatz commented 7 years ago

This has now been published as https://doi.org/10.7287/peerj.preprints.2630v1

bboscoe commented 7 years ago

Hi all, good work on the submission. I just got to this. A few things:

  1. Glad to find better refs than Wikipedia references (especially in light of the large funding blasts on each page).
  2. There's no mention of definitions of software vs. code. That seems like a distinction worth noting; I can write a bit on that too.
  3. Are patents important to mention alongside the copyright issues?
  4. From this paper (and I agree), software is more fragile than data. Perhaps a bit more elaboration on this, given its fluid-like state? On that note, any thoughts on distinctions between scripts and compiled code (with respect to citation, of course)?

danielskatz commented 7 years ago

@brandles Since this issue was just for the production of the preprint (and I closed it after the preprint was published), can you open a new issue with your suggestions? They apply to the document itself, not just the preprint.