carpentries-incubator / snakemake-novice-bioinformatics

Introduction to Snakemake for Bioinformatics
https://carpentries-incubator.github.io/snakemake-novice-bioinformatics

Improving the setup instructions - data #51

Closed tbooth closed 1 year ago

tbooth commented 1 year ago

From @tkphd:

Specify where to download (home directory? Desktop?) and how to extract this file. Tarballs are unfamiliar to most Windows users.

The linked file is also an xz-compressed tar archive, which may require extra packages on some Linux distributions.

The provided wget command results in "403: Forbidden" on a current Debian system. With the updated URL, this worked: wget https://figshare.com/ndownloader/files/35058796 -O data.tar.xz

The contents of this file are nested two directories deep: the top-level "data" folder is extraneous.

Package this slice of a dataset with a README explaining its provenance and intended usage, with citations and attribution to the original authors. (Aspire to FAIR principles.)

It is unclear whether CC BY-SA applies to a pure dataset, which is not typically eligible for copyright protection: this is not a creative work. Was the source dataset released under a license agreement?

tbooth commented 1 year ago

Specify where to download (home directory? Desktop?) and how to extract this file. Tarballs are unfamiliar to most Windows users.

I've provided the command to unpack the file. The location doesn't matter, so I don't think there's a need to specify it. I'd assumed tar files were covered in the shell novice lesson, but they are not, so I've added a link to the relevant GNU tutorial for the curious.
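For readers unfamiliar with tarballs, unpacking might look roughly like the sketch below. It is not the exact command from the lesson: `unpack_archive` and the destination directory are hypothetical names used for illustration, and it assumes a tar that detects the compression format when reading from a file, as current GNU tar and bsdtar do.

```shell
# unpack_archive ARCHIVE [DEST]
# Hypothetical helper: extract a tar archive into DEST (default: current directory).
unpack_archive() {
    archive=$1
    dest=${2:-.}
    # Create the destination if it does not exist yet
    mkdir -p "$dest" || return 1
    # -x extracts, -f names the archive file, -C switches to DEST before extracting
    tar -xf "$archive" -C "$dest"
}

# Example call, using the file name that wget saved later in this thread
# and an arbitrary destination directory:
# unpack_archive data-for-snakemake-novice-bioinformatics.tar.xz "$HOME/snakemake-lesson"
```

Since the location does not matter, the second argument can simply be left off to unpack into the current directory.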

tbooth commented 1 year ago

The linked file is also an xz-compressed tar archive, which may require extra packages on some Linux distributions.

I'm not aware of any, unless someone tries this on something truly antique. I believe xz is also supported by the tar program on Mac systems, but I'm not sure when that support was added. Users of older Macs may need to install some Homebrew packages, or get tar/xz via Conda. I'd assume Windows users will have WSL, which provides a reasonably up-to-date Ubuntu toolset.
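A rough way for a learner to check this ahead of time is sketched below. GNU tar hands xz decompression to an external `xz` program, so a missing `xz` command is one possible cause of failure on a minimal install. A tar built with xz support of its own (bsdtar may be) would work even when this check fails, so treat a negative result as a hint only. The names `have_tool` and `report_xz_support` are made up for this example.

```shell
# have_tool NAME: succeed if NAME can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_xz_support: print whether the commands usually needed
# to read a .tar.xz file appear to be installed.
report_xz_support() {
    if ! have_tool tar; then
        echo "tar not found"
        return 1
    fi
    if have_tool xz; then
        echo "tar and xz found"
    else
        echo "xz not found; GNU tar calls it to read .tar.xz files"
        return 1
    fi
}

# Example:
# report_xz_support
```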

tbooth commented 1 year ago

The provided wget command results in "403: Forbidden" on a current Debian system. With the updated URL, this worked: wget https://figshare.com/ndownloader/files/35058796 -O data.tar.xz

I can't reproduce this. I have Ubuntu, not Debian, but I tried:

$ docker run -it debian /bin/bash
...
# cd
# apt update ; apt install wget
...
# wget --content-disposition https://ndownloader.figshare.com/files/35058796
...
2023-09-22 11:47:23 (4.74 MB/s) - 'data-for-snakemake-novice-bioinformatics.tar.xz' saved [21262108/21262108]

Perhaps this was an intermittent issue on the day of testing?
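If a download is suspected of failing part-way, comparing the file size against the byte count in the wget log is a quick sanity check. The sketch below assumes the archive is unchanged since that log was captured; if the file on FigShare is later replaced, the expected size will differ. `check_size` is a hypothetical helper name.

```shell
# check_size FILE EXPECTED_BYTES
# Hypothetical helper: succeed only if FILE exists and is exactly EXPECTED_BYTES long.
check_size() {
    [ -f "$1" ] || return 1
    # wc -c counts bytes; tr strips the padding some wc implementations add
    actual=$(wc -c < "$1" | tr -d ' ')
    [ "$actual" -eq "$2" ]
}

# Example, using the byte count reported in the wget log above:
# check_size data-for-snakemake-novice-bioinformatics.tar.xz 21262108 && echo "size OK"
```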

tbooth commented 1 year ago

Package this slice of a dataset with a README explaining its provenance and intended usage, with citations and attribution to the original authors. (Aspire to FAIR principles.) It is unclear whether CC BY-SA applies to a pure dataset, which is not typically eligible for copyright protection: this is not a creative work. Was the source dataset released under a license agreement?

Will do. The information is available if you follow the link to FigShare, but it is not easy to find. The legal requirements for re-use of ArrayExpress public data are clarified at https://www.ebi.ac.uk/biostudies/help#policy-submission

tbooth commented 1 year ago

I have added a COPYING.md to the tarball. I have also reproduced that information in LICENSE.md and updated the download links to point to the new file.

I renamed the "data" folder to "snakemake_data".
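To confirm how an archive is laid out without unpacking it, the top-level entries can be listed first. This is only a sketch: `top_level_entries` is a hypothetical helper, and it assumes member names are stored without a leading "./".

```shell
# top_level_entries ARCHIVE
# Hypothetical helper: print each distinct top-level name stored in a tar archive.
top_level_entries() {
    # -t lists member names without extracting;
    # cut keeps the first path component, sort -u removes repeats
    tar -tf "$1" | cut -d/ -f1 | sort -u
}

# Example:
# top_level_entries data-for-snakemake-novice-bioinformatics.tar.xz
```

If this prints a single directory name, everything unpacks into that one folder.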

I believe all the reviewer comments in this issue are now addressed.