datalad / datalad-crawler

DataLad extension for tracking web resources as datasets
http://datalad.org

save versions .json state file reflecting the actual state, not the "target" #123

Open yarikoptic opened 1 year ago

At the moment we seem to save the timestamp of the last URL/file to be considered, not necessarily of the last one actually "saved". So if the process is interrupted, master.json does not reflect the current state. For example, here it records a file that is not even present locally:

(git)smaug:/mnt/datasets/datalad/crawl-misc/hbn-bids-derivatives/qsiprep[master]git
$> cat .datalad/crawl/versions/master.json
{
  "db_version": 1,
  "version": {
    "last-modified": "2020-11-10T10:53:11.000Z",
    "name": "data/Projects/HBN/BIDS_curated/derivatives/qsiprep/sub-NDARMJ495DE0/anat/sub-NDARMJ495DE0_desc-brain_mask.nii.gz",
    "version-id": "uyAWUa8B6S5ngXt3uDeiesSTZYtvqPYj"
  },
  "versions": []
}

$> ls -ld data/Projects/HBN/BIDS_curated/derivatives/qsiprep/sub-NDARMJ495DE0/anat/sub-NDARMJ495DE0_desc-brain_mask.nii.gz
ls: cannot access 'data/Projects/HBN/BIDS_curated/derivatives/qsiprep/sub-NDARMJ495DE0/anat/sub-NDARMJ495DE0_desc-brain_mask.nii.gz': No such file or directory
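One possible direction, sketched under assumptions and not datalad-crawler's actual code, is to update the state file only *after* a file has been saved, and to write it atomically (temporary file plus `os.replace`). An interrupted run then leaves master.json pointing at the last file that really made it to disk. Only the path and the JSON layout mirror the master.json shown above. `write_state_atomically`, `crawl`, `save_file` and `recorded_file_missing` are hypothetical names chosen for illustration.

```python
import json
import os
import tempfile

# State file location relative to the dataset root, as in the session above.
STATE_PATH = os.path.join(".datalad", "crawl", "versions", "master.json")


def write_state_atomically(path, state):
    """Write `state` as JSON so readers see either the old or the new file, never a partial one."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename within the same directory
    except BaseException:
        os.unlink(tmp)
        raise


def crawl(entries, save_file, root="."):
    """Hypothetical crawl loop.

    `entries` yields dicts with "name", "last-modified" and "version-id";
    `save_file` (hypothetical) fetches and saves one entry and raises on failure.
    """
    state_path = os.path.join(root, STATE_PATH)
    for entry in entries:
        # If this raises (or the process is killed), the state still names the previous entry.
        save_file(entry)
        # Record progress only once the file has actually been saved.
        write_state_atomically(state_path, {"db_version": 1, "version": entry, "versions": []})


def recorded_file_missing(root="."):
    """Return the file named in master.json if it is absent under `root`, else None."""
    with open(os.path.join(root, STATE_PATH)) as f:
        name = json.load(f)["version"]["name"]
    return None if os.path.lexists(os.path.join(root, name)) else name
```

`recorded_file_missing` reproduces the manual `cat`/`ls -ld` check above. It uses `os.path.lexists` rather than `os.path.exists` because, in a git-annex dataset, an annexed file whose content is absent can still exist as a dangling symlink. That is not the situation reported here, where not even the symlink is present.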