ESPRI-Mod / synda

ESGF Downloader (this is a deprecated repository, the tool has now moved to https://github.com/ESGF/esgf-download)
https://espri-mod.github.io/synda/

Configuration option for gridftp DTNs #56

Closed stephank16 closed 8 years ago

stephank16 commented 8 years ago

Large data transfers for CMIP6 will be done between DTNs (data transfer nodes). Not all centers will publish their DTN gridftp URLs to the Solr index (they should only be used by replication managers, not end users). For replication data managers, a synda configuration option that replaces a known URL prefix with the DTN's prefix would be helpful.

e.g. for DKRZ: configuration option: DTN_replace: "gsiftp://esgf1.dkrz.de/data/cmip6": "gsiftp://gridftp.dkrz.de/pool/data/projects/cmip6"

Effect: for a given selection file, all gridftp links found that match the first expression are replaced by the second expression before being written to the synda db, so all transfers are done from the DTN rather than from the published (end-user) gridftp link.

This would be easy to implement and helpful in the replication testing phase as well as the production phase (e.g. at DKRZ the replication DTN will only be usable by known replication data managers, not generic users, so its gridftp URL will not be published to the ESGF Solr index).
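
To make the requested effect concrete, here is a minimal Python sketch of prefix replacement over a list of URLs. It is not synda's code: the `DTN_REPLACE` mapping, the `to_dtn_url` helper, and the sample file paths are hypothetical, and the two DKRZ prefixes are written with the `gsiftp://` scheme used in the maintainer's reply in this thread.

```python
# Hypothetical mapping: published (end-user) prefix -> DTN prefix.
# This is an illustration only, not synda's configuration format.
DTN_REPLACE = {
    "gsiftp://esgf1.dkrz.de/data/cmip6": "gsiftp://gridftp.dkrz.de/pool/data/projects/cmip6",
}


def to_dtn_url(url, mapping=DTN_REPLACE):
    """Swap the first matching published prefix for its DTN prefix."""
    for published, dtn in mapping.items():
        if url.startswith(published):
            return dtn + url[len(published):]
    # URLs that match no known prefix pass through unchanged.
    return url


# Made-up sample URLs, as they might come back from a search.
urls = [
    "gsiftp://esgf1.dkrz.de/data/cmip6/CMIP/example.nc",
    "http://esgf1.dkrz.de/thredds/fileServer/cmip6/CMIP/example.nc",
]

# Rewrite before the URLs would be stored for transfer.
rewritten = [to_dtn_url(u) for u in urls]
```

Only the gridftp link that starts with the known prefix is rewritten; the HTTP link is left as it was.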

ghost commented 8 years ago

I will add a new selection file parameter with the following syntax:

url_replace=s|oldstring|newstring|

It uses the pipe character ('|') as the delimiter, which makes implementation easier (no collision with the space delimiter used in selection files).

So your example becomes

url_replace=s|gsiftp://esgf1.dkrz.de/data/cmip6|gsiftp://gridftp.dkrz.de/pool/data/projects/cmip6|
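
As a rough illustration of how a value in this `s|oldstring|newstring|` form could be parsed and applied, here is a minimal Python sketch. It is not the synda implementation: the function names are hypothetical, and replacing only the first occurrence of `oldstring` is an assumption.

```python
def parse_url_replace(expr):
    """Split 's|old|new|' into (old, new); raise ValueError if malformed."""
    parts = expr.split("|")
    # A well-formed expression splits into exactly: 's', old, new, ''
    if len(parts) != 4 or parts[0] != "s" or parts[3] != "" or not parts[1]:
        raise ValueError(f"malformed url_replace expression: {expr!r}")
    return parts[1], parts[2]


def apply_url_replace(url, expr):
    """Apply the replacement to one URL (assumption: first occurrence only)."""
    old, new = parse_url_replace(expr)
    return url.replace(old, new, 1)


expr = "s|gsiftp://esgf1.dkrz.de/data/cmip6|gsiftp://gridftp.dkrz.de/pool/data/projects/cmip6|"
# The file path below is a made-up example.
result = apply_url_replace("gsiftp://esgf1.dkrz.de/data/cmip6/CMIP/example.nc", expr)
```

Because the pipe does not normally appear in these URLs, a plain split on '|' is enough here; a URL containing a literal '|' would need an escaping rule that this sketch does not define.
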
ghost commented 8 years ago

Added in 3.5.

https://github.com/Prodiguer/synda/blob/master/sdt/doc/selection_file_parameter_reference.md#url_replace