aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

download https://resources.aertslab.org/cistarget/ #179

Open GeneVector5 opened 11 months ago

GeneVector5 commented 11 months ago

I would like to download all files here: https://resources.aertslab.org/cistarget/

Is there a fast way to download all the contents with the appropriate structures without having to use zsync for individual files?

ghuls commented 11 months ago

To download everything except the database Feather files (and their .zsync files), preserving the directory structure, you can use:

wget --recursive --timestamping --no-parent -R '*.feather,*.zsync' https://resources.aertslab.org/cistarget/

Downloading all Feather files by default is not recommended: the site also hosts old Feather v1 databases and other databases that you probably don't need. The full resources are > 600GB.

To download a specific subset of databases, first list all directories in the local copy created by the command above:

❯ find resources.aertslab.org/ -type d
resources.aertslab.org/
resources.aertslab.org/cistarget
resources.aertslab.org/cistarget/databases
resources.aertslab.org/cistarget/databases/mus_musculus
resources.aertslab.org/cistarget/databases/mus_musculus/mm9
resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r45
resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r45/mc9nr
resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r70
resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r70/mc9nr
resources.aertslab.org/cistarget/databases/mus_musculus/mm10
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc9nr
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/screen
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/screen/mc_v10_clust
resources.aertslab.org/cistarget/databases/drosophila_melanogaster
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm3
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm3/flybase_r5.37
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm3/flybase_r5.37/mc9nr
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc8nr
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/tc_v1
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc9nr
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc_v10_clust
resources.aertslab.org/cistarget/databases/old
resources.aertslab.org/cistarget/databases/old/mus_musculus
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r70
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm10
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm10/refseq_r80
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster/dm3
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster/dm3/flybase_r5.37
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster/dm6
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster/dm6/flybase_r6.02
resources.aertslab.org/cistarget/databases/old/homo_sapiens
resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg19
resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg19/refseq_r45
resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg38
resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg38/refseq_r80
resources.aertslab.org/cistarget/databases/homo_sapiens
resources.aertslab.org/cistarget/databases/homo_sapiens/hg19
resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45
resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/tc_v1
resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/tc_v1
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust
resources.aertslab.org/cistarget/tf_lists
resources.aertslab.org/cistarget/regions
resources.aertslab.org/cistarget/track2tf
resources.aertslab.org/cistarget/programs
resources.aertslab.org/cistarget/motif2tf
resources.aertslab.org/cistarget/motif_collections
resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public
resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/singletons
resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/snapshots
resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/logos
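
If the old/ tree is not of interest, the listing above can be narrowed before picking a subset. The following is a minimal sketch, not part of the original reply: the function name `list_current_db_dirs` is hypothetical, and the default path assumes the local mirror was created by the first wget command in the current working directory.

```shell
# Sketch: list database directories in the local mirror, skipping any "old" tree.
# list_current_db_dirs is a hypothetical helper name, not an aertslab tool.
list_current_db_dirs() {
    # Assumed default: the directory wget created in the current working directory.
    root="${1:-resources.aertslab.org/cistarget/databases}"
    # -prune stops find from printing or descending into directories named "old".
    find "$root" -type d -name old -prune -o -type d -print | sort
}

# Usage (run from the directory that contains resources.aertslab.org/):
#   list_current_db_dirs
#   list_current_db_dirs resources.aertslab.org/cistarget/databases/homo_sapiens
```

The output has the same form as the `find` listing above, minus everything under databases/old.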

Then construct the wget command to only download that subset:

# For e.g. resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/
wget --recursive --timestamping --no-parent -R '*.zsync' https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/
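
When more than one subset is needed, the same command can be generated once per directory. This is a hedged sketch rather than a recommended workflow: `BASE_URL` and `cistarget_wget_cmd` are names introduced here for illustration, and the two subsets are simply examples taken from the directory listing above. The script only prints the commands so they can be reviewed first.

```shell
# Sketch: build one wget command per chosen subdirectory (prints, does not download).
# BASE_URL and cistarget_wget_cmd are hypothetical names used for illustration.
BASE_URL="https://resources.aertslab.org/cistarget"

cistarget_wget_cmd() {
    # $1: path relative to BASE_URL, without leading or trailing slash.
    # The trailing slash on the URL matters: --no-parent treats it as a directory.
    printf "wget --recursive --timestamping --no-parent -R '*.zsync' %s/%s/\n" "$BASE_URL" "$1"
}

# Example subsets, taken from the directory listing above.
for subset in \
    databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust \
    databases/homo_sapiens/hg38/screen/mc_v10_clust
do
    cistarget_wget_cmd "$subset"
done
```

Once the printed commands look right, the script's output can be piped to `sh` to run them. Because `--timestamping` is kept, re-running the same commands later should skip files that have not changed on the server.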

GeneVector5 commented 11 months ago

@ghuls

Thank you for the detailed reply. Where are you getting "> 600GB" from? I saw one file that was nearly 100GB and another that was 33GB in the human folder.