cannin / ihop-reach

A web application to access biological data extracted from biomedical literature.
https://reach.nrnb-docker.ucsd.edu
GNU Lesser General Public License v3.0

pubmed/data-extraction/extraction Download Updates #73

Open lbeckman314 opened 5 years ago

lbeckman314 commented 5 years ago

This issue proposes two additions to the wget approach outlined in the pubmed/data-extraction/extraction README.


1) Add a Zsh-compatible wget command to the download instructions.

Current Version (Zsh Error).

wget --cut-dirs 3 -r ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/*.xml.gz
zsh: no matches found: ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/*.xml.gz

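Proposed Version (Zsh compatible).

Zsh aborts when an unquoted glob matches no local files (its default NOMATCH behavior), whereas Bash passes the unmatched pattern through unchanged. Quoting the URL hands the * to wget, which expands it against the FTP directory listing.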

wget --cut-dirs 3 -r 'ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/*.xml.gz'
# Success!
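For scripts that may be run from either shell, here is a minimal sketch, assuming bash or zsh. It keeps the URL in a variable and quotes the glob at expansion time, so the local shell never tries to match * against local files and wget does the FTP globbing itself. The names fetch_baseline and BASELINE_URL, and the WGET override (set WGET=echo for a dry run), are hypothetical, not part of the README. In interactive zsh, prefixing the original command with the noglob precommand modifier (noglob wget --cut-dirs 3 -r ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/*.xml.gz) should have the same effect.

```shell
#!/usr/bin/env bash
# Sketch: fetch the PubMed baseline with a quoted glob so that neither bash
# nor zsh expands the * locally; wget performs the FTP globbing itself.
# BASELINE_URL and fetch_baseline are hypothetical names for illustration.
BASELINE_URL='ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline'

fetch_baseline() {
  # WGET is a hypothetical override hook; it defaults to the real wget.
  # Setting WGET=echo prints the command instead of downloading.
  "${WGET:-wget}" --cut-dirs 3 -r "${BASELINE_URL}/*.xml.gz"
}
```

Running fetch_baseline starts the download; WGET=echo fetch_baseline only prints the arguments wget would receive, which is a quick way to confirm the * reached wget unexpanded.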

2) Add a parallel wget command (using GNU parallel) to the download instructions.

Parallel Version

parallel --bar 'wget --cut-dirs 3 -r ftp://ftp.ncbi.nih.gov/pubmed/baseline/pubmed19n$(printf "%04d" {1}).xml.gz' ::: $(seq 972)
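The nested $(printf ...) inside the single-quoted command works, but the zero-padded file names can also be generated up front and fed to GNU parallel on standard input, which removes one layer of quoting. A minimal sketch, assuming bash; baseline_urls and BASELINE_URL are hypothetical names, and the pubmed19n prefix and the count of 972 are taken from the command above (both presumably change with each annual baseline release).

```shell
#!/usr/bin/env bash
# Sketch: print one baseline file URL per line, ready to pipe into parallel.
# BASELINE_URL and baseline_urls are hypothetical names for illustration.
BASELINE_URL='ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline'

baseline_urls() {
  # $1 = number of files; defaults to 972, the count used in this issue
  for i in $(seq "${1:-972}"); do
    # %04d zero-pads the index: 1 -> 0001, 972 -> 0972
    printf '%s/pubmed19n%04d.xml.gz\n' "$BASELINE_URL" "$i"
  done
}
```

Usage: baseline_urls | parallel --bar wget --cut-dirs 3 -r {}. With no ::: input source, GNU parallel reads one argument per input line and substitutes it for {}. Because the generated URLs contain no glob characters, the same line should behave identically in bash and zsh.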

Parallel Version (Zsh compatible)

parallel --bar 'wget --cut-dirs 3 -r -A pubmed19n$(printf "%04d" {1}).xml.gz ftp://ftp.ncbi.nih.gov/pubmed/baseline/' ::: $(seq 972)
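On machines where GNU parallel is not installed, xargs -P gives a similar fan-out. A minimal sketch, assuming bash and an xargs that supports -P (GNU and BSD xargs both do); fetch_with_xargs, the JOBS default of 4, and the WGET dry-run override are hypothetical choices for illustration, not part of the README.

```shell
#!/usr/bin/env bash
# Sketch: parallel baseline download with xargs -P instead of GNU parallel.
# fetch_with_xargs, JOBS (default 4) and WGET are hypothetical names.
fetch_with_xargs() {
  # printf reuses its format for every number seq prints, producing the
  # zero-padded names pubmed19n0001 ... pubmed19n0972 (prefix/count from
  # the issue). xargs -n 1 passes one URL per wget call, and -P sets how
  # many wget processes run at once.
  printf 'ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/pubmed19n%04d.xml.gz\n' \
    $(seq "${1:-972}") |
    xargs -n 1 -P "${JOBS:-4}" "${WGET:-wget}" --cut-dirs 3 -r
}
```

For example, JOBS=8 fetch_with_xargs runs eight downloads at a time, and WGET=echo fetch_with_xargs 3 just prints the first three wget invocations. Unlike parallel --bar, xargs shows no progress bar, and the order in which files finish is not fixed.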

Cons:

Pros:


Let me know if you have any suggestions. If these aren't suited for adoption, no worries!

Thanks for making a great resource!

liam