disinfoRG / FbScraper

MIT License

url encode error during discover #8

Closed: andreawwenyi closed this issue 4 years ago

andreawwenyi commented 4 years ago
Traceback (most recent call last):
  File "discover.py", line 91, in <module>
    main()
  File "discover.py", line 82, in main
    test(browser, logfile, max_try_times)
  File "discover.py", line 50, in test
    discover_one(site, browser, logfile, max_try_times)
  File "discover.py", line 43, in discover_one
    ps.work()
  File "/Users/wyw/Codes/FbScraping/page_spider.py", line 18, in work
    pc.crawl()
  File "/Users/wyw/Codes/FbScraping/page_crawler.py", line 17, in crawl
    self.expand_post()
  File "/Users/wyw/Codes/FbScraping/page_crawler.py", line 55, in expand_post
    self.write_to_db_func(p_url)
  File "/Users/wyw/Codes/FbScraping/page_pipeline.py", line 32, in write_post
    p['url_hash'] = zlib.crc32(url.encode())
AttributeError: 'NoneType' object has no attribute 'encode'

I think the source of the error is here

andreawwenyi commented 4 years ago

Temporary fix: skip the URL if it is None: https://github.com/disinfoRG/FbScraping/blob/increase_max_try_time/page_crawler.py#L56
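
For anyone reading along, the skip-if-None idea can be sketched roughly as below. This is not the code in the linked branch: `url_hash`, `write_posts`, and the `write_to_db` callable are hypothetical names for illustration. Only the `zlib.crc32(url.encode())` call is taken from the traceback.

```python
import zlib


def url_hash(url):
    """Return the CRC32 of a URL string, or None when the URL is missing."""
    if url is None:
        return None
    # Same hashing call that appears in the traceback.
    return zlib.crc32(url.encode())


def write_posts(urls, write_to_db):
    """Write every non-None URL via write_to_db; return how many were skipped.

    write_to_db is a hypothetical callable that accepts one dict per post.
    """
    skipped = 0
    for url in urls:
        if url is None:
            # Skip instead of letting url.encode() raise AttributeError.
            skipped += 1
            continue
        write_to_db({"url": url, "url_hash": url_hash(url)})
    return skipped
```

Returning the skipped count makes it easy to log how often a URL comes back as None, which may help track down the underlying cause.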

dieface commented 4 years ago

Could you also send me the original URL that resolved to None for you? My current plan is to return the original URL whenever the permalink can't be resolved (see PR #10).
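
A minimal sketch of that fallback, assuming the permalink parsing is available as a function that returns None on failure. The names `resolve_permalink` and `extract` are hypothetical and are not taken from PR #10.

```python
def resolve_permalink(raw_url, extract):
    """Return the permalink for raw_url, falling back to raw_url itself.

    extract is a hypothetical callable that returns a permalink string,
    or None when the permalink cannot be resolved.
    """
    permalink = extract(raw_url)
    if permalink is None:
        # Fall back to the original URL so callers never receive None.
        return raw_url
    return permalink
```

With this shape the caller always gets a string, so a later `url.encode()` cannot fail on None. The trade-off is that unresolved URLs are stored in their raw form.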