crawlab-team / crawlab

Distributed web crawler admin platform for spiders management regardless of languages and frameworks.
https://www.crawlab.cn
BSD 3-Clause "New" or "Revised" License

datetime not JSON serializable when using Data integration with Scrapy #1373

Open oyhel opened 1 year ago

oyhel commented 1 year ago

First of all, amazing project!

I have enabled data integration by adding `'crawlab.CrawlabPipeline': 888` to my list of pipelines. The Scrapy project runs without problems when the pipeline is disabled. With it enabled, I get the following error message:

  File "/usr/local/lib/python3.10/dist-packages/twisted/internet/defer.py", line 892, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/defer.py", line 307, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "/usr/local/lib/python3.10/dist-packages/crawlab/scrapy/pipelines.py", line 10, in process_item
    save_item(result)
  File "/usr/local/lib/python3.10/dist-packages/crawlab/result.py", line 74, in save_item
    get_result_service().save_item(*items)
  File "/usr/local/lib/python3.10/dist-packages/crawlab/result.py", line 23, in save_item
    self.save(list(items))
  File "/usr/local/lib/python3.10/dist-packages/crawlab/result.py", line 36, in save
    self._save(_items)
  File "/usr/local/lib/python3.10/dist-packages/crawlab/result.py", line 50, in _save
    data = json.dumps({
  File "/usr/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable

Can this be fixed with something like:

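import datetime
import json

now = datetime.datetime.now()

def serialize_datetime(obj):
    # json.dumps calls this for any object it cannot serialize natively
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    raise TypeError(f"Type {type(obj).__name__} not serializable")

json.dumps(now, default=serialize_datetime)
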
tikazyq commented 1 year ago

Thanks for your input. We will implement this in the next version.

glacierck commented 1 year ago

It looks like this patch still hasn't been applied.
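
Until a fix lands in the crawlab package, one possible workaround is to convert datetime values to ISO 8601 strings before the item reaches `crawlab.CrawlabPipeline`. Scrapy runs item pipelines in ascending order of their configured number, so a small pipeline registered below 888 sees the item first. The sketch below is only an illustration: the class name `DatetimeToIsoPipeline` and the module path `myproject.pipelines` are made up, and it assumes dict-like items whose datetime values sit at the top level (nested values are not handled).

```python
import datetime
import json


class DatetimeToIsoPipeline:
    """Hypothetical helper pipeline; not part of crawlab or Scrapy."""

    def process_item(self, item, spider):
        # Assumes a dict-like item with flat (non-nested) fields.
        for key, value in list(item.items()):
            # datetime.datetime is a subclass of datetime.date,
            # so this check covers datetime, date, and time objects.
            if isinstance(value, (datetime.date, datetime.time)):
                item[key] = value.isoformat()
        return item


# settings.py (the module path "myproject.pipelines" is an assumption):
# ITEM_PIPELINES = {
#     "myproject.pipelines.DatetimeToIsoPipeline": 887,  # runs before 888
#     "crawlab.CrawlabPipeline": 888,
# }

# Quick check outside Scrapy: the cleaned item is now JSON serializable.
item = {"title": "example", "scraped_at": datetime.datetime(2024, 1, 2, 3, 4, 5)}
cleaned = DatetimeToIsoPipeline().process_item(item, spider=None)
payload = json.dumps(cleaned)
```

This keeps the installed crawlab package untouched, at the cost of storing timestamps as strings rather than native datetime values.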