Open pkasibhatla opened 2 years ago
Hi Prasad,
I think this is an issue with the version of curl that's installed. Could you try wget instead? The commands would be
$ bashdatacatalog-list -am -r 2019-06-30,2019-08-02 -f xargs-curl InputDataCatalogs/13.4/ChemistryInputs.csv > url_download_list.txt
$ wget -i url_download_list.txt -x -nH -nv --cut-dirs=4
The first command generates a url list, and the second command downloads the list with wget. Could you try this?
I'll take a look at fixing the -f xargs-curl format. It looks like I'm using an option that was added more recently than I thought.
Hi Liam,
Yes, I realized this may be the case after I sent my email and have been trying what you suggest (from ExtData cut-dirs=1) and so far so good.
I didn't examine the curl issue carefully, but it seemed to mess up on a lot of files, and it sometimes looked like filenames were being switched during the download step.
Best, Prasad
I also ran into the same "number" issue when downloading the chem input data, but the wget method provided by Liam did not solve my problem.
I followed the commands provided by Liam, and when I ran the wget line, I got the message "No URLs found in download_url_list.txt."
I'm also getting a lot of these curl argument errors described above using the command:
bashdatacatalog-list -am -r "2012-06-01,2013-12-01" -f xargs-curl DataCatalogs/14.1.1/*.csv | xargs -P 4 curl
and when I generate a url list as follows:
bashdatacatalog-list -am -r 2010-09-01,2012-01-01 -f xargs-curl DataCatalogs/14.1.1/*.csv > url_download_list.txt
I get this file: url_download_list.txt
and when I try to use wget as follows:
wget -i url_download_list.txt -x -nH -nv
I also get the error: No URLs found in url_download_list.txt.
Has there been any progress on this bug? I'm trying to set up my server at UUtah, so I need to download a lot of different input files... What version of curl is required to avoid these errors? Does anyone have an idea of how this curl -o error affects the downloaded files? Does it indeed mess up the names, as Prasad indicated?
Hi @jhaskinsPhD, thanks for writing. I was able to replicate your error.
Am tagging @SaptSinha who may be more knowledgeable about bashdatacatalog issues than I am.
Also tagging @LiamBindle, who has since left the GEOS-Chem community, but still may have some ideas.
@jhaskinsPhD: You might also consider using a Globus endpoint for the file transfer. I bet that U of Utah has a Globus account; you can check with your IT support staff there. Download from "GEOS-Chem data (WashU)".
Hey @jhaskinsPhD , I believe I solved it with the help of this link: https://github.com/LiamBindle/bashdatacatalog/wiki/3.-Useful-Commands
I used these commands to solve the problem:
$ bashdatacatalog-list -am -f url catalog.csv > url_download_list.txt
$ wget -i url_download_list.txt -x -nH -nv --cut-dirs=4 # you will need to modify --cut-dirs=N
Compared with the commands given earlier in this issue, the first line uses -f url instead of -f xargs-curl.
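If you are not sure which format your list was generated with, a quick check like the one below may help before running wget. This is only a sketch: check_url_list is a hypothetical helper name, and it assumes that a list usable with wget -i has one plain http(s) URL per non-empty line. I have not confirmed exactly what the xargs-curl format writes, only that wget reported "No URLs found" for it in this thread.

```shell
# Hypothetical helper: report how many non-empty lines of a list file
# are plain http(s) URLs. Succeeds only if the file is non-empty and
# every non-empty line starts with http:// or https://.
check_url_list() {
    list=$1
    # Count non-empty lines.
    total=$(grep -c . "$list")
    # Count lines that begin with an http(s) scheme.
    urls=$(grep -c -E '^https?://' "$list")
    echo "$urls of $total non-empty lines are plain URLs"
    [ "$total" -gt 0 ] && [ "$urls" -eq "$total" ]
}
```

For example, `check_url_list url_download_list.txt && wget -i url_download_list.txt -x -nH -nv --cut-dirs=4` would only start the download if the list looks like plain URLs.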
You can also use Globus, as @yantosca said, with these commands:
$ bashdatacatalog-list -am -f globus="$(pwd),/remote-data-root/" catalog.csv > globus_batch.txt
$ globus transfer --batch globus_batch.txt SOURCE_ENDPOINT_ID DEST_ENDPOINT_ID
Hope this helps!
Thanks @jiaying002 for the feedback on this issue!
For anyone who may be confused about the wget method, the right way to do this seems to be:
1. bashdatacatalog-list -am -r 2019-06-30,2019-08-02 -f url InputDataCatalogs/13.4/ChemistryInputs.csv > url_downloadlist.txt
where the argument -f url means plain URL links instead of the xargs-curl format.
2. wget -i url_downloadlist.txt -x -nH -nv --cut-dirs=1
-x creates a hierarchy of directories from the URLs. -nH removes the host-prefixed directory (geoschemdata.wustl.edu in this case). The right value for --cut-dirs depends on the location of your download txt file; it removes that many leading directory components. For example, --cut-dirs=1 removes ExtData/ from ExtData/CHEM_INPUTS/.
You will also need to repeat the steps above whenever you need new files, so that the list of download URLs is regenerated.
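To pick a value for --cut-dirs, it can help to preview where a URL would land. The function below is a rough sketch of how I understand -x -nH --cut-dirs=N to map a URL to a local path; local_path is a hypothetical helper, the file path after CHEM_INPUTS/ is made up for illustration, and it ignores corner cases such as query strings.

```shell
# Hypothetical helper: predict the local path that
# `wget -x -nH --cut-dirs=N` would use for a given URL.
local_path() {
    url=$1
    n=$2
    # Drop the scheme and the host name (the effect of -nH).
    path=${url#*://}
    path=${path#*/}
    # Drop up to N leading directory components (the effect of
    # --cut-dirs=N); the file name itself is never removed.
    i=0
    while [ "$i" -lt "$n" ]; do
        case $path in
            */*) path=${path#*/} ;;
            *) break ;;
        esac
        i=$((i + 1))
    done
    printf '%s\n' "$path"
}

# Illustrative URL only; example/file.nc is not a real file name.
local_path "https://geoschemdata.wustl.edu/ExtData/CHEM_INPUTS/example/file.nc" 1
# prints CHEM_INPUTS/example/file.nc
```

So if you run wget from inside your ExtData directory, --cut-dirs=1 keeps the files under CHEM_INPUTS/ rather than creating a nested ExtData/ExtData/.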
I am trying to download the chem input files for GCHP 13.4. The command
bashdatacatalog-fetch InputDataCatalogs/13.4/ChemistryInputs.csv
from my ExtData directory seems to work fine. Output is attached below. But when I give the command
bashdatacatalog-list -am -r 2019-06-30,2019-08-02 -f xargs-curl InputDataCatalogs/13.4/ChemistryInputs.csv | xargs curl
I get a bunch of numbers on the screen. Here are the first few lines of what I see:
The bashdatacatalog-list seems to work fine for fetching the met data and the hemco files.
chem_meta.txt