covidgraph / motherlode

Pipeline for running all dataloader scripts for covidgraph in a controlled manner.
https://covidgraph.org
MIT License

LoadingLog node saves only local ID #6

Closed motey closed 4 years ago

motey commented 4 years ago

motherlode logs a successful dataloader run in a node with the label LoadingLog. Based on the dockerhub_image_name and dockerhub_image_hash properties of this node, motherlode decides whether it needs to import this dataloader again.

So when motherlode runs and a dataloader in a certain version has already run in the past and has not changed since, motherlode skips this dataloader.
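The skip decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in motherlode/main.py: the `LoadingLog` dataclass and the `should_run` helper are hypothetical stand-ins, and only the property names dockerhub_image_name and dockerhub_image_hash come from the actual node.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LoadingLog:
    """Hypothetical in-memory stand-in for a LoadingLog node."""
    dockerhub_image_name: str
    dockerhub_image_hash: str


def should_run(image_name: str, image_hash: str, logs: Iterable[LoadingLog]) -> bool:
    """Return True unless a log entry records this exact image name and hash."""
    already_ran = any(
        log.dockerhub_image_name == image_name
        and log.dockerhub_image_hash == image_hash
        for log in logs
    )
    return not already_ran
```

With this shape, a dataloader is rerun whenever its hash changes, so the scheme only works across machines if the stored hash is the same everywhere.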

LoadingLog creation happens here: https://github.com/covidgraph/motherlode/blob/0bc0686f9de821113b88435c3717677833a355e8/motherlode/main.py#L37

At the moment motherlode saves the local hash ID of a Docker image, which is different on every local Docker installation (at least it seems so to me). It would be better to save the hash ID of the Docker Hub image. Then we would have a global consensus about which dataloaders have already run.

It looks like we can obtain the Docker Hub image ID with docker-py's RegistryData objects: https://docker-py.readthedocs.io/en/stable/images.html#registrydata-objects
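A minimal sketch of that lookup, assuming docker-py is installed and a Docker daemon is reachable. `registry_image_id` is a hypothetical helper name, not something that exists in motherlode. `client.images.get_registry_data()` returns a RegistryData object, and its `id` attribute is, as far as I understand, the image digest reported by the registry, so it should be the same on every machine.

```python
def registry_image_id(image_name, client=None):
    """Return the registry-side ID of image_name (hypothetical helper).

    `client` is expected to be a docker-py DockerClient. When omitted, one is
    created from the environment, which requires a reachable Docker daemon.
    """
    if client is None:
        import docker  # docker-py; imported lazily so callers can pass their own client

        client = docker.from_env()
    # get_registry_data() queries the registry instead of the local image store.
    registry_data = client.images.get_registry_data(image_name)
    return registry_data.id
```

Usage would look something like `registry_image_id("covidgraph/example-loader")`, where the image name is a placeholder. The returned value could then be stored in dockerhub_image_hash instead of the local image ID.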