fr0der1c / Readform

Sends the full article content of paywalled news websites straight into your Readwise Reader feed, giving you a unified reading workflow.
GNU General Public License v3.0
70 stars, 0 forks

No Caixin articles appear in the Readwise feed #2

Open duneploo opened 1 year ago

duneploo commented 1 year ago

Please review the log:

INFO 2023-08-05 10:31:01,671 readform (website_base.py:244) [caixin] Start to refresh
INFO 2023-08-05 10:32:02,149 readform (website_base.py:215) Getting page content for https://finance.caixin.com/2023-08-04/102090125.html
INFO 2023-08-05 10:32:22,583 readform (website_caixin.py:51) waiting for article body to load...
INFO 2023-08-05 10:32:23,199 readform (website_caixin.py:63) body loading finished
INFO 2023-08-05 10:32:23,693 readform (website_caixin.py:67) is paywalled content and not logged-in
INFO 2023-08-05 10:32:23,693 readform (website_caixin.py:84) logging in...
INFO 2023-08-05 10:32:37,888 readform (website_caixin.py:117) waiting to be redirected...
ERROR 2023-08-05 10:34:18,070 readform (website_base.py:230) [caixin] Got exception while getting content:
Traceback (most recent call last):
  File "/var/app/website_base.py", line 219, in handle_article
    content = get_page_content(single_url)
  File "/var/app/website_base.py", line 209, in get_page_content
    return domain_agent_dict[domain].get_page_content(url)
  File "/var/app/website_base.py", line 96, in get_page_content
    self.ensure_logged_in()
  File "/var/app/website_caixin.py", line 68, in ensure_logged_in
    self.login(self.get_driver())
  File "/var/app/website_caixin.py", line 120, in login
    WebDriverWait(driver, timeout=100).until(invisibility_of_element_located(login_form))
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 95, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

ERROR 2023-08-05 10:34:18,070 readform (website_base.py:232) Page https://finance.caixin.com/2023-08-04/102090125.html will be retried later.

How can I fix this? Please help!

fr0der1c commented 1 year ago

Hi, has not a single article ever succeeded, or did it work for a while and then suddenly start failing?

duneploo commented 1 year ago

Not a single article has ever succeeded.

fr0der1c commented 1 year ago

It works fine when I try it. Does your account have a paid membership?

2jfuox2hx commented 11 months ago

@fr0der1c Hi, my Linux/Docker log shows the same kind of error, and my Caixin account is a member account:

`INFO 2023-11-14 13:23:33,684 readform (website_base.py:244) [caixin] Start to refresh
INFO 2023-11-14 13:23:33,772 readform (website_base.py:251) [caixin] Latest articles: []
INFO 2023-11-14 13:24:33,810 readform (website_base.py:215) Getting page content for https://companies.caixin.com/2023-11-13/102135363.html
ERROR 2023-11-14 13:24:33,819 readform (website_base.py:230) [caixin] Got exception while getting content:
Traceback (most recent call last):
  File "/var/app/website_base.py", line 219, in handle_article
    content = get_page_content(single_url)
  File "/var/app/website_base.py", line 209, in get_page_content
    return domain_agent_dict[domain].get_page_content(url)
  File "/var/app/website_base.py", line 86, in get_page_content
    self.driver.get("about:blank")
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 354, in get
    self.execute(Command.GET, {"url": url})
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 345, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSessionIdException: Message: Tried to run command without establishing a connection

ERROR 2023-11-14 13:24:33,822 readform (website_base.py:232) Page https://companies.caixin.com/2023-11-13/102135363.html will be retried later.`
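In this second log the failure happens at `self.driver.get("about:blank")`, before any Caixin page is requested: Selenium raises `InvalidSessionIdException` because the browser session behind the long-lived driver no longer exists. One common mitigation for that class of error is to discard the dead driver and build a fresh one, then retry once. The sketch below illustrates that idea only; it is not Readform's code, and `SessionRecoveringAgent`, `driver_factory`, and `run` are hypothetical names. It also only concerns the older Python implementation shown in the tracebacks.

```python
from typing import Callable, Generic, TypeVar

D = TypeVar("D")


class SessionRecoveringAgent(Generic[D]):
    """Hypothetical wrapper: holds one driver and rebuilds it once if its session died."""

    def __init__(self, driver_factory: Callable[[], D], dead_session_error: type[BaseException]):
        self._factory = driver_factory          # e.g. selenium's webdriver.Chrome (assumption)
        self._dead_error = dead_session_error   # e.g. InvalidSessionIdException (assumption)
        self.driver = driver_factory()

    def run(self, action: Callable[[D], object]) -> object:
        try:
            return action(self.driver)
        except self._dead_error:
            # Best-effort cleanup of the dead driver; quit() may itself fail
            # because the session is already gone, so errors are ignored.
            quit_fn = getattr(self.driver, "quit", None)
            if quit_fn is not None:
                try:
                    quit_fn()
                except Exception:
                    pass
            # Start a fresh driver and retry the action exactly once.
            self.driver = self._factory()
            return action(self.driver)
```

With Selenium this could be wired up as `SessionRecoveringAgent(webdriver.Chrome, InvalidSessionIdException)` followed by `agent.run(lambda d: d.get(url))`. Retrying only once keeps a persistently broken browser setup from looping forever; the existing "will be retried later" logic would still handle anything beyond that.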

fr0der1c commented 8 months ago

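@duneploo @2jfuox2hx Hi, Readform v1.0.0 has been released. This version completely rewrites the repository's code in Go and fixes several issues. Please give it a try and check whether it works for you now.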