TeamWiseFlow / wiseflow

Wiseflow is an agile information mining tool that extracts concise messages from various sources such as websites, WeChat official accounts, social platforms, etc. It automatically categorizes and uploads them to the database.

general_crawler.py line 208 raises KeyError: 'publish_time' #73

Closed l0g2 closed 2 months ago

l0g2 commented 2 months ago

core/scrapers/general_crawler.py, line 208:

date_str = extract_and_convert_dates(result['publish_time'])

raises KeyError: 'publish_time'

l0g2 commented 2 months ago
2024-08-21 11:29:40.628 | INFO     | scrapers.general_crawler:general_crawler:152 - gne extract not good: {'title': '', 'author': '', 'publish_time': '', 'content': '%PDF-...
2024-08-21 11:29:40.631 | INFO     | scrapers.general_crawler:general_crawler:165 - https://....pdf content too long for llm parsing
core-1  | Traceback (most recent call last):
core-1  |   File "/app/tasks.py", line 32, in <module>
core-1  |     asyncio.run(main())
core-1  |   File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
core-1  |     return loop.run_until_complete(main)
core-1  |   File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
core-1  |     return future.result()
core-1  |   File "/app/tasks.py", line 30, in main
core-1  |     await schedule_pipeline(interval_seconds)
core-1  |   File "/app/tasks.py", line 20, in schedule_pipeline
core-1  |     await asyncio.gather(*[process_site(site, counter) for site in sites])
core-1  |   File "/app/tasks.py", line 12, in process_site
core-1  |     await pipeline(site['url'].rstrip('/'))
core-1  |   File "/app/insights/__init__.py", line 31, in pipeline
core-1  |     flag, result = await general_crawler(url, logger)
core-1  |   File "/app/scrapers/general_crawler.py", line 208, in general_crawler
core-1  |     date_str = extract_and_convert_dates(result['publish_time'])
core-1  | KeyError: 'publish_time'

Judging from the log, the error appears to be triggered by reading a PDF.
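
One defensive way to avoid this kind of KeyError is to read the key with `dict.get` and fall back to a default when it is missing or empty. The sketch below is only an illustration under assumptions: `parse_date_stub` is a hypothetical stand-in for `extract_and_convert_dates` (whose real behavior is not shown in this thread), `publish_date_or_today` is a made-up helper name, and falling back to today's date is an assumed policy, not necessarily what the project does.

```python
import re
from datetime import datetime


def parse_date_stub(text: str) -> str:
    """Hypothetical stand-in for extract_and_convert_dates.

    Returns a YYYYMMDD string for the first YYYY-MM-DD pattern found,
    or an empty string if none is found. The real helper may differ.
    """
    match = re.search(r"(\d{4})-(\d{1,2})-(\d{1,2})", text)
    if match:
        year, month, day = (int(group) for group in match.groups())
        return f"{year:04d}{month:02d}{day:02d}"
    return ""


def publish_date_or_today(result: dict) -> str:
    """Read 'publish_time' without raising KeyError when the key is absent."""
    # .get() returns None for a missing key; "or ''" also covers an empty value.
    raw = result.get("publish_time") or ""
    date_str = parse_date_stub(raw)
    # Assumed fallback: use today's date when nothing could be parsed.
    return date_str or datetime.now().strftime("%Y%m%d")
```

With this shape, a result dict that lacks `publish_time` (as seemingly happened after the PDF was rejected as too long for LLM parsing) yields a fallback date instead of an exception.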

l0g2 commented 2 months ago

Also, once the error occurs, the program does not recover and resume running on its own.
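
The traceback shows the exception propagating out of `asyncio.gather` in `schedule_pipeline`, which ends the whole run. A minimal sketch of one way to keep a single failing site from stopping the others is to pass `return_exceptions=True` to `asyncio.gather`. Everything below is illustrative: this `process_site` is a hypothetical stand-in that fails on `.pdf` URLs, `run_once` is a made-up name, and the example.com URLs are placeholders rather than anything from the project.

```python
import asyncio


async def process_site(site: dict) -> str:
    """Hypothetical stand-in for the real per-site task."""
    if site["url"].endswith(".pdf"):
        # Simulate the failure reported in this issue.
        raise KeyError("publish_time")
    return site["url"]


async def run_once(sites: list[dict]) -> list:
    # With return_exceptions=True, gather returns raised exceptions as
    # ordinary results instead of propagating the first one to the caller.
    results = await asyncio.gather(
        *(process_site(site) for site in sites),
        return_exceptions=True,
    )
    for site, outcome in zip(sites, results):
        if isinstance(outcome, Exception):
            print(f"skipped {site['url']}: {outcome!r}")
    return results


results = asyncio.run(
    run_once(
        [
            {"url": "https://example.com/page"},
            {"url": "https://example.com/file.pdf"},
        ]
    )
)
```

Each failure is logged and the scheduler loop can move on to its next cycle. Wrapping the body of the per-site task in `try`/`except` would be an alternative with a similar effect.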

bigbrother666sh commented 2 months ago

88

done