ESGF / esgf-download

ESGF data transfer and replication tool
https://esgf.github.io/esgf-download/
BSD 3-Clause "New" or "Revised" License

Use different data_node to download a file already in database #31

Open svenrdz opened 11 months ago

svenrdz commented 11 months ago

This does not currently work: a file that already has a database entry is always flagged as a duplicate, so its data_node always stays the same as the first one added to the database.
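To illustrate the behaviour described above, here is a toy model of it. It is only a sketch: `FileRecord`, `add_files`, the file name and the second host name are hypothetical and are not part of esgpull, and the real duplicate check may well work differently. It assumes files are keyed by an identifier that does not include the data node.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical stand-in for a database row; not esgpull's real model."""

    file_id: str
    data_node: str


def add_files(db: dict[str, FileRecord], candidates: list[FileRecord]) -> list[FileRecord]:
    """Insert candidates keyed by file_id and return those skipped as duplicates."""
    skipped = []
    for record in candidates:
        if record.file_id in db:
            # Same file already known: the existing entry (and its data_node) wins.
            skipped.append(record)
        else:
            db[record.file_id] = record
    return skipped


db: dict[str, FileRecord] = {}
add_files(db, [FileRecord("tas_2000.nc", "vesg.ipsl.upmc.fr")])
# The same file offered by another (made-up) data node is rejected as a duplicate.
skipped = add_files(db, [FileRecord("tas_2000.nc", "other.node.example")])
print(db["tas_2000.nc"].data_node, len(skipped))
```

In this model the stored data_node can only change if the existing entry is removed first, which is what the workaround below does.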

Currently, the only workaround is to manually delete the files from the database. The following example shows how to do it.

Example

Some files from a query abc123 did not download because the data node 'vesg.ipsl.upmc.fr' is down.

To find out which data node is down, you can use this Python snippet:

from esgpull import Esgpull
from esgpull.models import FileStatus

esg = Esgpull(path="path/to/install")
query = esg.graph.get("abc123")
# Collect the data nodes of every file that is not fully downloaded
data_nodes = {f.data_node for f in query.files if f.status != FileStatus.Done}
print(data_nodes)

In this example, this would print:

{'vesg.ipsl.upmc.fr'}
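When several data nodes are involved, a per-node count of unfinished files can help decide which one to exclude. The sketch below is self-contained: `Status` and the sample `files` list are hypothetical stand-ins for `FileStatus` and `query.files`, so it runs without an esgpull install, and the second host name is made up.

```python
from collections import Counter
from enum import Enum
from types import SimpleNamespace


class Status(Enum):
    """Hypothetical stand-in for esgpull.models.FileStatus."""

    Done = "done"
    Error = "error"
    Queued = "queued"


def pending_by_node(files):
    """Count files that are not Done, grouped by their data_node."""
    return Counter(f.data_node for f in files if f.status != Status.Done)


# Sample data standing in for query.files
files = [
    SimpleNamespace(data_node="vesg.ipsl.upmc.fr", status=Status.Error),
    SimpleNamespace(data_node="vesg.ipsl.upmc.fr", status=Status.Queued),
    SimpleNamespace(data_node="other.node.example", status=Status.Done),
]
counts = pending_by_node(files)
print(counts.most_common())
```

With a real install, the same function could presumably be called on `query.files` after swapping `Status` for `FileStatus`.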

To delete the files that did not download from the database, use a very similar snippet:

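from esgpull import Esgpull
from esgpull.models import FileStatus

esg = Esgpull(path="path/to/install")
query = esg.graph.get("abc123")
# Select every file of the query that is not fully downloaded
missing_files = [f for f in query.files if f.status != FileStatus.Done]
# Remove their database entries so they are no longer flagged as duplicates
esg.db.delete(*missing_files)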

Now you can create a new query that requires abc123 and excludes the data node that is down. Updating this new query picks up another data_node for the missing files:

$ esgpull add --require abc123 "!data_node:vesg.ipsl.upmc.fr" --track
$ esgpull update <new_query_id>
$ esgpull download