yarikoptic opened 4 years ago
We just need to strip away the metadata_extractors (at least for now, while there is no integration of any kind with datalad) and git-annex sections, and possibly "adopt" (copy) datalad/support/external_versions.py

... I also wonder if there is some sane way to make this whole wtf a reusable component tunable for any given project... maybe an independent Python module? WDYT @jwodder ?
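One way a reusable, per-project wtf component could be structured is sketched below. This is a minimal sketch, not dandi's or datalad's actual code: every function name is hypothetical, and it uses `importlib.metadata` (Python 3.8+) in place of datalad's `external_versions.py`. Sections are plain callables returning key/value pairs, and the output loosely imitates the `# WTF` / `## section` layout of `datalad wtf`.

```python
# Sketch of a project-agnostic "wtf" report; all names here are hypothetical.
import platform
import sys
from importlib import metadata
from typing import Callable, Dict, Iterable

# A section is a callable returning key/value pairs to report.
Section = Callable[[], Dict[str, str]]


def python_section() -> Dict[str, str]:
    return {
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
    }


def system_section() -> Dict[str, str]:
    return {
        "type": sys.platform,
        "name": platform.system(),
        "release": platform.release(),
    }


def dependencies_section(distributions: Iterable[str]) -> Section:
    """Build a section reporting installed versions of the given distributions."""
    names = tuple(distributions)  # freeze so the section can be collected repeatedly

    def collect() -> Dict[str, str]:
        versions = {}
        for name in names:
            try:
                versions[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                versions[name] = "not installed"
        return versions

    return collect


def wtf(sections: Dict[str, Section]) -> str:
    """Render sections as a markdown-ish report, loosely modeled on `datalad wtf`."""
    lines = ["# WTF"]
    for title, collect in sections.items():
        lines.append(f"## {title}")
        for key, value in collect().items():
            lines.append(f"  - {key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Each project would plug in its own sections; this dependency list is illustrative.
    print(wtf({
        "python": python_section,
        "system": system_section,
        "dependencies": dependencies_section(["numpy", "h5py", "pynwb"]),
    }))
```

Because sections are only registered by the caller, "stripping away" datalad-specific sections such as metadata_extractors or git-annex amounts to not registering them.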
oh, crazy, but now making so much sense in hindsight, it came to my mind -- we should (ab)use https://github.com/duecredit/duecredit/ !!! We just need to add duecredit support to all related projects -- that would kill ~two~ three birds at once: citations, tracking of dependencies pertinent to the specific invocation, and their versions.

ATM we only "inject" versioning for numpy, but it already works:
```
$> DANDI_CACHE=ignore python -m duecredit `which dandi` ls /tmp/bad.nwb /tmp/HardwareTests-V2-IP8.nwb
PATH SIZE SESSION_START_TIME IDENTIFIER SESSION_DESCRIPTION ND_TYPES NWB
/tmp/bad.nwb 32.0 MB 2019-11-08/18:46:09 2ae7afd1a09f78c3d7c3311d71990095010fab706d91f9048986eef429991a70 PLACEHOLDER CurrentClampSeries (73), CurrentClampStimulusSeries (73), Device (148), IntracellularElectrode (147), LabN... 2.2.4
/tmp/HardwareTests-V2-IP8.nwb 9.2 MB 2020-11-21/20:42:02 ac24acc942a5b87538bf15d140e06b4576481565b77b114877c4d26ba23fc09e PLACEHOLDER Device (7), IntracellularElectrode (6), LabNotebook, LabNotebookDevice, StimulusSets, Subject, SweepTable,... 2.2.4
Summary: 41.2 MB 2019-11-08/18:46:09>
                 2020-11-21/20:42:02<

DueCredit Report:
- Scientific tools library / numpy (v 1.19.4) [1]

1 package cited
0 modules cited
0 functions cited

References
----------

[1] Van Der Walt, S., Colbert, S.C. & Varoquaux, G., 2011. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2), pp.22–30.
```
We can add duecredit. There are two things that come to mind:

1. I think what would be useful for neuroscientists is dataset citation. This crowd would be less interested in citing software, although we should list that as well.
2. The issue I have with using duecredit to cite software via papers is that it misses a lot of contributors. The above example is a perfect one: that paper reflects neither the numpy contributors nor even the originator.

There is no good answer, but before investing too much time, we may want to be clear about what kinds of citation sections would be generated.
re datasets: yes, ultimately we should aim for that. For DataLad datasets with some older aggregated metadata we already do that BTW, see https://github.com/datalad/datalad/pull/3184
re misses: in the context of this issue, the primary interest is version information for all involved dependencies. As for "due credit" to all contributors -- someone smart could e.g. extend duecredit to provide a mode where it would list all contributors associated with a GitHub repository, or something like that. But it would not really be "citable". The best is to just use Zenodo records for each (used) version (this would also be a nice feature to add to duecredit, so it could automagically choose the correct DOI according to the version). Eh -- we even had it "planned": https://github.com/duecredit/duecredit/issues/117
re datasets: it would actually be up to us to just add `due.cite(Doi('...'))` upon operating on some dandiset ;)

edit: meanwhile, it could be some free text based on the description etc., with a URL to the dandiset if known, again with a simple `due.cite(Text())`, or `due.cite(Url())` if it is just a boring URL. I will submit a PR for a generic duecredit addition now, and we could extend on that later.
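For datasets, the fallback order described above (a DOI if there is one, otherwise free text from the description with the dandiset URL, otherwise just the URL) can be made explicit. In this sketch the metadata keys (`doi`, `name`, `description`, `url`) and the function name are hypothetical, not the actual dandiset schema, and it returns a `(kind, value)` pair rather than calling duecredit directly, so the choice stays easy to test.

```python
from typing import Mapping, Optional, Tuple


def dandiset_citation(meta: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Pick the most specific citation entry available for a dandiset.

    Returns (kind, value) with kind "doi", "text" or "url", intended to map onto
    duecredit's Doi, Text and Url entries; None if nothing citable is known.
    """
    doi = meta.get("doi")
    if doi:
        return ("doi", doi)
    description = meta.get("description") or meta.get("name")
    url = meta.get("url")
    if description:
        # Free text built from the description, with the URL appended if known.
        text = f"{description} ({url})" if url else description
        return ("text", text)
    if url:
        return ("url", url)
    return None
```

A caller would then wrap the value in duecredit's `Doi`, `Text` or `Url` entry and pass it to `due.cite(...)`, together with whatever description/path arguments the integration ends up using.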
Similar to `datalad wtf`, but with details pertinent to dandi. Here is a datalad example:

DataLad 0.12.2 WTF (configuration, datalad, dependencies, environment, extensions, git-annex, location, metadata_extractors, python, system)

# WTF
## configuration