carpentries-incubator / snakemake-novice-bioinformatics

Introduction to Snakemake for Bioinformatics
https://carpentries-incubator.github.io/snakemake-novice-bioinformatics

Reconsider use of Figshare for the sample data #67

Open tbooth opened 3 months ago

tbooth commented 3 months ago

From @cmeesters:

The sentence "See this link for details about this dataset and the redistribution licence." links to https://figshare.com/articles/dataset/data-for-snakemake-novice-bioinformatics_tar_xz/19733338/1. That page gives a summary-level description, but it also shows a "sorry, we can't preview this file" message, which is slightly irritating. Is Figshare a good place for non-figure data?

tbooth commented 3 months ago

I'd agree that Figshare is problematic. I was following the example of Data Carpentry: https://datacarpentry.org/image-processing/instructor/index.html#data

But I think we can do a bit better. I've also noticed that if you download the file from Figshare too many times, it temporarily blocks further downloads, which could be a real problem.
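Wherever the file ends up being hosted, one way to reduce repeated downloads is to fetch it once and reuse the local copy. The following is only a minimal sketch of that idea: `fetch_once` and its `downloader` parameter are hypothetical names made up for illustration, not part of the lesson, and the behaviour of Figshare's rate limiting is not modelled here.

```python
import os
import urllib.request


def fetch_once(url, dest, downloader=urllib.request.urlretrieve):
    """Download url to dest unless a non-empty dest already exists.

    Returns True if a download happened, False if the local copy was reused.
    `downloader` is any callable taking (url, path); it is a hypothetical
    hook so the function can be exercised without network access.
    """
    if os.path.exists(dest) and os.path.getsize(dest) > 0:
        return False  # reuse the existing copy, no request is made
    tmp = dest + ".part"
    downloader(url, tmp)   # write to a temporary name first
    os.replace(tmp, dest)  # then move into place, so partial files are not reused
    return True
```

Calling `fetch_once(url, "data-for-snakemake-novice-bioinformatics.tar.xz")` a second time in the same directory would then return `False` without contacting the server.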

I'll see about hosting the data on WorkflowHub.eu or somewhere like that.

tbooth commented 3 months ago

I could put a copy of the file here on GitHub, but GitHub does not allow .tar.xz files, and the .tar.gz version is just a tad too big. I guess I could shorten the FASTQ headers to shave off a bit of space, or do something horrible like this:

data-for-snakemake-novice-bioinformatics.tar.xz.gz
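
The option of shortening the FASTQ headers could be sketched roughly as below. This is an illustration under assumptions, not what was actually done to the sample data: `shorten_fastq_headers` and the `r` prefix are hypothetical, the input is assumed to use strict four-line records, and for paired-end files the two mates only keep matching names if both files list the reads in the same order.

```python
def shorten_fastq_headers(lines, prefix="r"):
    """Yield FASTQ lines with each read header replaced by a short sequential ID.

    Assumes strict 4-line records: header, sequence, '+' line, quality.
    Any text repeated after '+' on the separator line is dropped as well.
    """
    for i, line in enumerate(lines):
        pos = i % 4
        if pos == 0:
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1} is not a FASTQ header: {line!r}")
            yield f"@{prefix}{i // 4 + 1}\n"  # e.g. @r1, @r2, ...
        elif pos == 2:
            yield "+\n"  # bare separator
        else:
            yield line  # sequence and quality lines pass through unchanged
```

For example, `out.writelines(shorten_fastq_headers(inp))` with two open text file handles would rewrite one file. How much space this saves after compression would need to be measured on the real data.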