datalad / datalad-crawler

DataLad extension for tracking web resources as datasets
http://datalad.org

datalad crawl can't connect to s3 and produces changing error messages #47

Closed TobiasKadelka closed 5 years ago

TobiasKadelka commented 5 years ago

While trying to "datalad crawl" with the change from https://github.com/datalad/datalad-crawler/issues/46#issuecomment-510567284 on our server, this happens:

❱ datalad crawl  
[INFO   ] Loading pipeline specification from ./.datalad/crawl/crawl.cfg 
[INFO   ] Creating a pipeline for the hcp-openaccess bucket 
[INFO   ] Running pipeline [<datalad_crawler.nodes.s3.crawl_s3 object at 0x7f598e460a58>, switch(default=None, key='datalad_action', mapping=<<{'commit': <function A...>>, re=False)] 
[INFO   ] S3 session: Connecting to the bucket hcp-openaccess with authentication 
[ERROR  ] unorderable types: NoneType() < str() [s3.py:__call__:162] (TypeError) 
Exception ignored in: <bound method AnnexRepo.__del__ of <AnnexRepo path=/home/homeGlobal/tkadelka/hcp/hcp (<class 'datalad.support.annexrepo.AnnexRepo'>)>>
Traceback (most recent call last):
  File "/home/homeGlobal/tkadelka/env/datalad-crawler/datalad.git/datalad/support/annexrepo.py", line 365, in __del__
AttributeError: 'NoneType' object has no attribute 'debug'
Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7f59e9414840>
Traceback (most recent call last):
  File "/home/homeGlobal/tkadelka/env/datalad-crawler/lib/python3.5/weakref.py", line 117, in remove
TypeError: 'NoneType' object is not callable

Also, when I then just re-run the "datalad crawl" without any changes, sometimes other error messages appear (in addition to the one above, which is always displayed):

Additional Error 1

❱ datalad crawl
[INFO ] Loading pipeline specification from ./.datalad/crawl/crawl.cfg
[INFO ] Creating a pipeline for the hcp-openaccess bucket
[INFO ] Running pipeline [, switch(default=None, key='datalad_action', mapping=<<{'annex': >, re=False)]
[INFO ] S3 session: Connecting to the bucket hcp-openaccess with authentication
[ERROR ] unorderable types: NoneType() < str() [s3.py:__call__:162] (TypeError)
Exception ignored in: )>>
Traceback (most recent call last):
  File "/home/homeGlobal/tkadelka/env/datalad-crawler/datalad.git/datalad/support/annexrepo.py", line 365, in __del__
  File "/home/homeGlobal/tkadelka/env/datalad-crawler/datalad.git/datalad/dochelpers.py", line 328, in exc_str
AttributeError: 'NoneType' object has no attribute 'get'
Exception ignored in: .remove at 0x7ff99ef63840>
Traceback (most recent call last):
  File "/home/homeGlobal/tkadelka/env/datalad-crawler/lib/python3.5/weakref.py", line 117, in remove
TypeError: 'NoneType' object is not callable

Additional Error 2

```shell
> datalad crawl 1 !
[INFO ] Loading pipeline specification from ./.datalad/crawl/crawl.cfg
[INFO ] Creating a pipeline for the hcp-openaccess bucket
[INFO ] Running pipeline [, switch(default=None, key='datalad_action', mapping=<<{'commit': >, re=False)]
[INFO ] S3 session: Connecting to the bucket hcp-openaccess with authentication
[ERROR ] unorderable types: NoneType() < str() [s3.py:__call__:162] (TypeError)
Exception ignored in: )>>
Traceback (most recent call last):
  File "/home/homeGlobal/tkadelka/env/datalad-crawler/datalad.git/datalad/support/annexrepo.py", line 362, in __del__
  File "/home/homeGlobal/tkadelka/env/datalad-crawler/datalad.git/datalad/support/gitrepo.py", line 965, in __del__
  File "/home/homeGlobal/tkadelka/env/datalad-crawler/lib/python3.5/genericpath.py", line 19, in exists
AttributeError: 'NoneType' object has no attribute 'stat'
```

yarikoptic commented 5 years ago

All the errors are the same:

[ERROR ] unorderable types: NoneType() < str() [s3.py:__call__:162] (TypeError)

Could you rerun with --dbg and then print the entire stack (bt)?

The ones from __del__ can be ignored; I will silence them when I get to the laptop.

yarikoptic commented 5 years ago

Oh, this one is probably a duplicate of #44, which has [ERROR ] '<' not supported between instances of 'NoneType' and 'str' [s3.py:__call__:162] (TypeError) and for which @mih submitted #45, which I just merged. I will retitle and close this one with a fix for __del__.
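
For readers landing here from a search: both messages quoted in this thread ("unorderable types: NoneType() < str()" and "'<' not supported between instances of 'NoneType' and 'str'") are what Python 3 reports when `None` is ordered against a string, for example while sorting. The sketch below only reproduces that class of error and shows one possible None-safe sort key. It is not the code at s3.py:162 and not the change made in #45; `none_safe_key` and `versions` are hypothetical names used for illustration.

```python
def none_safe_key(value):
    """Sort key that orders None before any string without comparing the two."""
    # (False, "") for None sorts ahead of (True, "<string>") for real values.
    return (value is not None, value or "")


versions = ["b", None, "a"]  # hypothetical data mixing strings and None

# Plain sorting compares None with str and raises TypeError on Python 3.
try:
    sorted(versions)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# With the key, the comparison never involves None and str directly.
ordered = sorted(versions, key=none_safe_key)
print(error_message)
print(ordered)
```

The exact wording of the TypeError depends on the Python version, which would explain why the same failure reads differently here and in #44.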

yarikoptic commented 5 years ago

actually -- it is in the __del__ of datalad core, so nothing to be done here (will do there), closing
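
As background on the "Exception ignored in ... __del__" noise: during interpreter shutdown, module-level names can already have been reset to `None` by the time a finalizer runs, which is consistent with errors such as "'NoneType' object has no attribute 'debug'". A minimal sketch of a defensive finalizer follows, assuming a module-level logger. `Resource`, `lgr`, and `simulate_shutdown` are hypothetical names, and this is not the fix that went into datalad core.

```python
import logging

lgr = logging.getLogger("example")  # hypothetical module-level logger


class Resource:
    """Toy object whose finalizer tolerates a half-torn-down module."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True
        # Raises AttributeError if the global logger has become None.
        lgr.debug("closed %r", self)

    def __del__(self):
        try:
            self.close()
        except Exception:
            # Nothing useful can be done at shutdown; stay silent.
            pass


def simulate_shutdown():
    """Mimic teardown by clearing the module global the finalizer relies on."""
    global lgr
    lgr = None


r = Resource()
simulate_shutdown()
r.__del__()  # called directly for demonstration; does not raise
print(r.closed)
```

Python already swallows exceptions raised inside `__del__` and only prints them, so the `try`/`except` changes the output, not the behavior.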