Open GeneVector5 opened 11 months ago
To download everything except the database Feather files (and their .zsync files), preserving the directory structure, you can use:
wget --recursive --timestamping --no-parent -R '*.feather,*.zsync' https://resources.aertslab.org/cistarget/
Downloading all Feather files by default is not recommended, as the site also hosts old Feather v1 databases and other databases that you probably don't need. The full resources are > 600GB.
To download a specific subset of databases, first list all directories in the local mirror created by the command above:
❯ find resources.aertslab.org/ -type d
resources.aertslab.org/
resources.aertslab.org/cistarget
resources.aertslab.org/cistarget/databases
resources.aertslab.org/cistarget/databases/mus_musculus
resources.aertslab.org/cistarget/databases/mus_musculus/mm9
resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r45
resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r45/mc9nr
resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r70
resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r70/mc9nr
resources.aertslab.org/cistarget/databases/mus_musculus/mm10
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc9nr
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/screen
resources.aertslab.org/cistarget/databases/mus_musculus/mm10/screen/mc_v10_clust
resources.aertslab.org/cistarget/databases/drosophila_melanogaster
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm3
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm3/flybase_r5.37
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm3/flybase_r5.37/mc9nr
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc8nr
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/tc_v1
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc9nr
resources.aertslab.org/cistarget/databases/drosophila_melanogaster/dm6/flybase_r6.02/mc_v10_clust
resources.aertslab.org/cistarget/databases/old
resources.aertslab.org/cistarget/databases/old/mus_musculus
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r70
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm10
resources.aertslab.org/cistarget/databases/old/mus_musculus/mm10/refseq_r80
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster/dm3
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster/dm3/flybase_r5.37
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster/dm6
resources.aertslab.org/cistarget/databases/old/drosophila_melanogaster/dm6/flybase_r6.02
resources.aertslab.org/cistarget/databases/old/homo_sapiens
resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg19
resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg19/refseq_r45
resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg38
resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg38/refseq_r80
resources.aertslab.org/cistarget/databases/homo_sapiens
resources.aertslab.org/cistarget/databases/homo_sapiens/hg19
resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45
resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/tc_v1
resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/tc_v1
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen
resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust
resources.aertslab.org/cistarget/tf_lists
resources.aertslab.org/cistarget/regions
resources.aertslab.org/cistarget/track2tf
resources.aertslab.org/cistarget/programs
resources.aertslab.org/cistarget/motif2tf
resources.aertslab.org/cistarget/motif_collections
resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public
resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/singletons
resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/snapshots
resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/logos
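The listing above is long, so it can help to filter it by directory name. The following is only a sketch: `list_db_dirs` is a hypothetical helper name, not something provided by the site, and it assumes the local mirror from the first wget command exists. It prints directories with a given name and skips anything under an `old/` directory.

```shell
# Hypothetical helper (not part of the aertslab resources):
# list directories named "$2" under the local mirror "$1",
# skipping anything located below an "old" directory.
list_db_dirs() {
    root=$1
    name=$2
    find "$root" -type d -name "$name" ! -path '*/old/*' | sort
}

# Example, assuming the mirror is in the current directory:
# list_db_dirs resources.aertslab.org mc_v10_clust
```

Each path it prints maps to a URL by prefixing `https://`.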
Then construct the wget command to only download that subset:
# For e.g. resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/
wget --recursive --timestamping --no-parent -R '*.zsync' https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/
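If you need more than one subset, the same command can be wrapped in a small function and called once per directory. This is a sketch under assumptions: `download_subset`, `BASE_URL` and `DRY_RUN` are names made up for illustration. The wget flags are the ones used above. With `DRY_RUN=1` the function only prints the command, so you can check the URLs before starting a large download.

```shell
BASE_URL='https://resources.aertslab.org/cistarget'

# Hypothetical helper: download one subdirectory, given relative to BASE_URL.
# With DRY_RUN=1 the wget command is printed instead of executed.
download_subset() {
    subdir=${1%/}              # strip one trailing slash, if present
    url="$BASE_URL/$subdir/"   # the trailing slash matters for --no-parent
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "wget --recursive --timestamping --no-parent -R '*.zsync' $url"
    else
        wget --recursive --timestamping --no-parent -R '*.zsync' "$url"
    fi
}

# Example usage (commented out so that sourcing this file downloads nothing):
# for d in databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust \
#          databases/homo_sapiens/hg38/screen/mc_v10_clust; do
#     download_subset "$d"
# done
```

Because of `--timestamping`, rerunning the loop should skip files that are already up to date locally.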
@ghuls
Thank you for the detailed reply. Where are you getting "> 600GB" from? I saw one file that was nearly 100GB and another of 33GB in the human folder.
I would like to download all files here: https://resources.aertslab.org/cistarget/
Is there a fast way to download all the contents with the directory structure preserved, without having to use zsync for individual files?