datalad / datalad-crawler

DataLad extension for tracking web resources as datasets
http://datalad.org

s3: TypeError: '>' not supported between instances of 'str' and 'NoneType' #68

Closed yarikoptic closed 4 years ago

yarikoptic commented 4 years ago

This might be a side effect of the nearby fix in #45.

(git)smaug:/mnt/btrfs/datasets/datalad/crawl/adhd200/RawDataBIDS[master]
$> datalad --dbg crawl
[INFO   ] Loading pipeline specification from ./.datalad/crawl/crawl.cfg
[INFO   ] Creating a pipeline for the fcp-indi bucket
[INFO   ] Running pipeline [<datalad_crawler.nodes.s3.crawl_s3 object at 0x7fa6bc51bf98>, sub(ok_missing=True, subs=<<{'url': {'^s3://([^/]*...>>), switch(default=None, key='datalad_action', mapping=<<{'commit': <function A...>>, re=False)]
[INFO   ] S3 session: Connecting to the bucket fcp-indi with authentication
Traceback (most recent call last):
  File "/home/yoh/proj/datalad/datalad-crawler/venvs/dev3/bin/datalad", line 8, in <module>
    main()
  File "/home/yoh/proj/datalad/datalad-0.11.x/datalad/cmdline/main.py", line 494, in main
    ret = cmdlineargs.func(cmdlineargs)
  File "/home/yoh/proj/datalad/datalad-0.11.x/datalad/interface/base.py", line 626, in call_from_parser
    ret = cls.__call__(**kwargs)
  File "/home/yoh/proj/datalad/datalad-crawler/datalad_crawler/crawl.py", line 130, in __call__
    output = run_pipeline(pipeline, stats=stats)
  File "/home/yoh/proj/datalad/datalad-crawler/datalad_crawler/pipeline.py", line 114, in run_pipeline
    output = list(xrun_pipeline(*args, **kwargs))
  File "/home/yoh/proj/datalad/datalad-crawler/datalad_crawler/pipeline.py", line 194, in xrun_pipeline
    for idata_out, data_out in enumerate(xrun_pipeline_steps(pipeline, data_in, output=output_sub)):
  File "/home/yoh/proj/datalad/datalad-crawler/datalad_crawler/pipeline.py", line 270, in xrun_pipeline_steps
    for data_ in data_in_to_loop:
  File "/home/yoh/proj/datalad/datalad-crawler/datalad_crawler/nodes/s3.py", line 176, in __call__
    if lm > last_modified_:
TypeError: '>' not supported between instances of 'str' and 'NoneType'

> /home/yoh/proj/datalad/datalad-crawler/datalad_crawler/nodes/s3.py(176)__call__()

yarikoptic commented 4 years ago

This seems to happen because the "version" record was saved for a Prefix, so it lacks both the date and the version id ("last-modified" and "version-id" are null):

(git)smaug:/mnt/btrfs/datasets/datalad/crawl/adhd200/RawDataBIDS[master]git
$> cat .datalad/crawl/versions/master.json
{
  "db_version": 1,
  "version": {
    "last-modified": null,
    "name": "data/Projects/ADHD200/RawDataBIDS/nyu_1/",
    "version-id": null
  },
  "versions": []
}

So the code should be adjusted to handle such records properly:
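
A minimal sketch of one way to guard the comparison, not the actual change made in datalad-crawler. The helper name `is_newer`, the example timestamp, and the choice to treat a missing stored date as "everything is newer" are all assumptions made for illustration; only the JSON record and the error message come from the report above.

```python
import json


def is_newer(lm, last_modified):
    """Hypothetical helper: is timestamp `lm` newer than the stored one?

    Assumption: a stored value of None (as in a record saved for a Prefix)
    means there is nothing to compare against, so any dated key is newer.
    A key without a date of its own is never considered newer.
    """
    if last_modified is None:
        return True
    if lm is None:
        return False
    # ISO 8601 timestamps in the same format compare correctly as strings
    return lm > last_modified


# The record as shown in .datalad/crawl/versions/master.json above
record = json.loads("""
{
  "db_version": 1,
  "version": {
    "last-modified": null,
    "name": "data/Projects/ADHD200/RawDataBIDS/nyu_1/",
    "version-id": null
  },
  "versions": []
}
""")

last_modified_ = record["version"]["last-modified"]  # None for this record
example_lm = "2018-01-01T00:00:00.000Z"  # made-up timestamp for illustration

# The unguarded comparison raises the TypeError from the traceback
try:
    example_lm > last_modified_
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The guarded comparison does not raise
newer = is_newer(example_lm, last_modified_)
```

Whether a Prefix record should be stored as the "version" at all is a separate question; this sketch only keeps the comparison from raising.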