ESPRI-Mod / synda

ESGF Downloader (this is a deprecated repository, the tool has now moved to https://github.com/ESGF/esgf-download)
https://espri-mod.github.io/synda/

Dataset not found but available on ESGF web interface #178

Closed francocatalano closed 3 years ago

francocatalano commented 3 years ago

Hi,

Some datasets are not found by synda but are available on the ESGF web interface. For example, searching for this dataset (in a selection file):

project=CMIP6
variant_label=r1i1p1f1,r1i1p1f2
source_id=IPSL-CM6A-LR
experiment_id=amip-lfmip-pdLC,amip-lfmip-rmLC
table_id=Amon
variable_id=ua,va
latest=true

returns: Dataset not found

I tried changing the indexes and default_index parameters in conf/sdt.conf, but that did not solve the problem.

But if I look for the same dataset on, e.g., https://esg-dn1.nsc.liu.se/search/cmip6-liu/ I can find it and download it with wget.

Is there a reason for this strange behavior, and how can I make synda discover and download the dataset? I am using synda v3.32.

Thanks a lot.

Franco

painter1 commented 3 years ago

I can report the same phenomenon. With CMIP6, r1i1p1f1, amip-lfmip-pdLC, Amon, ua, the ESGF search page shows datasets from NCAR, MPI, IPSL, and EC-Earth. My large Synda database is supposed to include all Amon data - an incremental install is run nightly. But for this case it includes only datasets from NCAR, MPI, and EC-Earth - no matching IPSL dataset. In the search page I cannot see anything unusual about the IPSL dataset.

I tried a search url similar to those which Synda uses. It reported datasets from all four sources: https://esgf-node.llnl.gov/esg-search/search?variable_id=ua&fields=instance_id,timestamp,_timestamp,type,size&table_id=Amon&mip_era=CMIP6&experiment_id=amip-lfmip-pdLC&variant_label=r1i1p1f1&distrib=true&type=Dataset&latest=true&format=application%2Fsolr%2Bjson&limit=9000&offset=0

This time there was something unique about IPSL. For the other three institutes, the search found two results, identical in the returned fields other than _timestamp. There was only one result for IPSL. I have no idea what this could mean.
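For anyone who wants to repeat this check, here is a minimal Python sketch of the two steps described above: building a search URL with the same parameters, and counting how many results share each instance_id. The function names `build_search_url` and `count_copies` are hypothetical and not part of Synda. The sketch assumes the results have already been fetched and that the documents sit under `response` then `docs` in the returned JSON.

```python
from collections import Counter
from urllib.parse import urlencode

# Search endpoint taken from the URL quoted above.
ESGF_SEARCH = "https://esgf-node.llnl.gov/esg-search/search"


def build_search_url(facets, base=ESGF_SEARCH):
    """Build a dataset search URL with the fixed parameters used above.

    `facets` holds the search constraints, e.g. variable_id or table_id.
    """
    params = dict(facets)
    params.update({
        "fields": "instance_id,timestamp,_timestamp,type,size",
        "distrib": "true",
        "type": "Dataset",
        "latest": "true",
        "format": "application/solr+json",
        "limit": 9000,
        "offset": 0,
    })
    # urlencode percent-encodes the values ("/" -> %2F, "+" -> %2B, "," -> %2C).
    return base + "?" + urlencode(params)


def count_copies(docs):
    """Count how many result documents share each instance_id."""
    return Counter(doc["instance_id"] for doc in docs)


url = build_search_url({
    "variable_id": "ua",
    "table_id": "Amon",
    "mip_era": "CMIP6",
    "experiment_id": "amip-lfmip-pdLC",
    "variant_label": "r1i1p1f1",
})
print(url)
```

After fetching the URL and parsing the JSON (assumed here to expose the documents as `data["response"]["docs"]`), `count_copies(docs)` gives one entry per dataset, so a dataset that comes back once while the others come back twice is easy to spot.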

AtefBN commented 3 years ago

Hello @francocatalano, I have reproduced this behavior in our development environment and traced the issue to the default CMIP6 template file, which is sometimes used to add default parameters to synda queries; this time it was more harmful than helpful. To fix this, go to your synda home directory and, in conf/default/default_CMIP6.txt, remove the institution_id option as well as the data_node option. With that change, synda found the files as expected. Hope this helps.
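If you prefer to script the change, here is a minimal Python sketch that drops the two options from the text of the defaults file. It assumes the file holds one `name=value` entry per line, like a selection file; the helper name `strip_defaults` and the sample values are made up for illustration.

```python
def strip_defaults(text, keys=("institution_id", "data_node")):
    """Return `text` without the lines that set any of `keys`.

    Assumes one name=value entry per line; other lines are kept as-is.
    """
    kept = []
    for line in text.splitlines():
        # The option name is whatever precedes the first "=".
        name = line.split("=", 1)[0].strip()
        if name in keys:
            continue
        kept.append(line)
    result = "\n".join(kept)
    # Preserve a trailing newline if the input had one.
    if text.endswith("\n") and result:
        result += "\n"
    return result


# Made-up sample content, not the real defaults shipped with synda.
sample = "project=CMIP6\ninstitution_id=EXAMPLE\ndata_node=example.org\n"
print(strip_defaults(sample))
```

Keep a backup copy of conf/default/default_CMIP6.txt before writing the filtered text back to it.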

I have not tried to apply the fix to your case, @painter1, but I suspect it will yield similar results; ping me if it does not! And sorry, gentlemen, for the inconvenience.

painter1 commented 3 years ago

Thank you Atef! I cleaned out the default selection files as you suggested and the missing dataset appeared on another installation. Probably more datasets came too; I haven't checked.