superhenne opened this issue 5 months ago (status: open).
Hi @superhenne - thanks for raising the issue.
I recall running into a similar problem some time ago. However, the current state of the repo seems to be working for me; see the GitHub Actions runs. Could you verify whether the issue still persists with the latest changes on the main branch? In particular, the switch to Chrome (instead of Firefox) could fix such an issue.
Hi @fgebhart, I was and still am using the exact code from the main branch with no changes, and yesterday's run still failed. How can the same code work for you but not for me? Are you also using Thalia? Is there anything else I can try?
I'm using Hugendubel. That could be the reason. The login flow for the tolino webreader page is different for Thalia. It is of course possible that it changed slightly and therefore breaks the login. I can't check this from my end, as testing all tolino partners would require an account with each of them.
If you want to dig deeper, I would suggest to
Let me know if you need more input on that. Please understand that I do not have the time to debug this myself.
I ran the tests locally with pytest. Everything worked: the test EPUB got uploaded to the webreader. Then I tried to run sync.py locally, but it fails while downloading the e-paper from ZEIT; the downloaded file apparently cannot be found. I also cannot find the path mentioned in the log:
(zeit-on-tolino) xxxxxxxx@xxxxxxxx zeit-on-tolino % python sync.py
INFO:__main__:logging into ZEIT premium...
INFO:__main__:downloading most recent ZEIT e-paper...
INFO:zeit_on_tolino.zeit:clicking download button now...
Traceback (most recent call last):
  File "/Users/xxxxxxxx/Documents/GitHub/zeit-on-tolino/sync.py", line 18, in <module>
    e_paper_path = zeit.download_e_paper(webdriver)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxxxxx/Documents/GitHub/zeit-on-tolino/zeit_on_tolino/zeit.py", line 94, in download_e_paper
    wait_for_downloads(webdriver.download_dir_path)
  File "/Users/xxxxxxxx/Documents/GitHub/zeit-on-tolino/zeit_on_tolino/zeit.py", line 66, in wait_for_downloads
    while any([filename.endswith(".crdownload") for filename in os.listdir(path)]):
                                                                ^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/v6/2y8jhzts27n4ptr0msbhh_yc0000gn/T/tmp474r_1lx'
This seems hard to debug from my end. It's also not clear to me why it works on GitHub Actions for me and locally for you, but not on GitHub Actions for you :man_shrugging:
The error reminds me of the recent migration from geckodriver (Firefox) to chromedriver, which handles file downloads differently.
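The traceback shows that `wait_for_downloads` calls `os.listdir(path)` directly, so it raises `FileNotFoundError` as soon as the download directory does not exist. A more defensive variant could poll until the directory exists and contains only finished files, and give up after a timeout. This is a minimal sketch, not the repo's code; the `timeout` and `poll_interval` parameters are hypothetical additions:

```python
import os
import time


def wait_for_downloads(path: str, timeout: float = 60.0, poll_interval: float = 0.5) -> list[str]:
    """Wait until Chrome has finished downloading into `path` (sketch).

    Chrome writes in-progress downloads as `*.crdownload` files. This waits
    until the directory exists, holds at least one file, and no `.crdownload`
    file is left. Returns the finished file names or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Tolerate a directory that does not exist (yet) instead of crashing.
        if os.path.isdir(path):
            files = os.listdir(path)
            finished = [f for f in files if not f.endswith(".crdownload")]
            if files and len(finished) == len(files):
                return finished
        time.sleep(poll_interval)
    raise TimeoutError(f"no completed download in {path!r} after {timeout} s")
```

Raising `TimeoutError` turns a missing or never-filled download directory into an explicit failure with the offending path in the message.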
As a last resort you could consider forking the repo again into a new repo on your side.
An update: I got it working by adding a user-agent string to the Chrome options. As far as I understand, running in headless mode adds a "headless" keyword to the user agent, which some websites block.
In web.py:
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36")
The local failure could be solved by changing the download path to an existing directory: DOWNLOAD_PATH = '/some/existing/path'
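Instead of hardcoding a local path, the download directory could be resolved at startup and created if it is missing. A minimal sketch, assuming a hypothetical `ZEIT_DOWNLOAD_DIR` environment variable as the configuration option and a fresh temporary directory as the fallback:

```python
import os
import tempfile
from pathlib import Path


def resolve_download_dir(env_var: str = "ZEIT_DOWNLOAD_DIR") -> Path:
    """Return a download directory that is guaranteed to exist (sketch).

    `ZEIT_DOWNLOAD_DIR` is a hypothetical setting: if it is set, that
    directory is created (including parents) when missing; otherwise a new
    temporary directory is created for this run.
    """
    configured = os.environ.get(env_var)
    if configured:
        path = Path(configured).expanduser().resolve()
        path.mkdir(parents=True, exist_ok=True)
    else:
        path = Path(tempfile.mkdtemp(prefix="zeit-on-tolino-"))
    return path
```

The result could then be handed to Chrome, for example via `options.add_experimental_option("prefs", {"download.default_directory": str(path)})`; how the repo's web.py actually wires up the download directory is not shown in this thread.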
The headless user agent triggers the Cloudflare "confirm you are a human" screen on thalia.de.
Hi @superhenne - sorry I'm a bit busy these days.
So to summarize, we see/saw two issues:
- On Mac, I used a static local path, which will not work for other users. Unfortunately, I don't know how to handle this universally; alternatively, it could be left as a configuration option.
- The Cloudflare human check on thalia.de is triggered when the user agent contains "headless", which is the default when Selenium runs Chrome in headless mode. This can easily be overridden with this line in web.py, in def get_webdriver:
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36")
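The override above hardcodes a Chrome 68 user agent. An alternative sketch derives the override from Chrome's own user agent by replacing the `HeadlessChrome/` token (which headless Chrome typically reports) with `Chrome/`, so the browser version in the string stays current. The helper names are hypothetical, and the `--headless=new` flag is an assumption about how the driver is started:

```python
HEADLESS_TOKEN = "HeadlessChrome/"


def non_headless_user_agent(user_agent: str) -> str:
    """Replace the 'HeadlessChrome/' product token with 'Chrome/'."""
    return user_agent.replace(HEADLESS_TOKEN, "Chrome/")


def user_agent_argument(user_agent: str) -> str:
    """Format a user agent as a Chrome command-line switch."""
    return f"--user-agent={user_agent}"


def build_chrome_options(default_user_agent: str):
    """Sketch of headless Chrome options with a de-'headless'-ed user agent.

    Selenium 4 is imported lazily so the pure helpers above work without it.
    The headless flag is an assumption; match whatever web.py already uses.
    """
    from selenium.webdriver import ChromeOptions

    options = ChromeOptions()
    options.add_argument("--headless=new")
    options.add_argument(user_agent_argument(non_headless_user_agent(default_user_agent)))
    return options
```

The default string could be read once from a headless session with `driver.execute_script("return navigator.userAgent")`. Whether Cloudflare on thalia.de accepts this derived string is untested.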
I just want to confirm that the add_argument line fixed the problem with Thalia for me. The periodic sync works with it. I can't assess other side effects, so I will not open a pull request for it.
Best regards
When trying to log in to the tolino cloud, the script raises a TimeoutException. I made sure the credentials are correct, and I also pulled a fresh copy of the repo. Please see the error log below. What else can I check? The shop is Thalia.
`INFO:__main__:logging into ZEIT premium...
INFO:__main__:downloading most recent ZEIT e-paper...
INFO:zeit_on_tolino.zeit:clicking download button now...
INFO:__main__:successfully finished download of 'DIE ZEIT - Nr. 17, 18.04.2024'
INFO:__main__:upload ZEIT e-paper to tolino cloud...
INFO:zeit_on_tolino.tolino:logging into tolino cloud...
Traceback (most recent call last):
  File "/home/runner/work/zeit-on-tolino/zeit-on-tolino/sync.py", line 25, in <module>
    tolino.login_and_upload(webdriver, e_paper_path, e_paper_title)
  File "/home/runner/work/zeit-on-tolino/zeit-on-tolino/zeit_on_tolino/tolino.py", line 169, in login_and_upload
    _login(webdriver)
  File "/home/runner/work/zeit-on-tolino/zeit-on-tolino/zeit_on_tolino/tolino.py", line 88, in _login
    WebDriverWait(webdriver, Delay.medium).until(EC.presence_of_element_located((shop.user.by, shop.user.value)))
  File "/home/runner/.cache/pypoetry/virtualenvs/zeit-on-tolino-z2YdorRC-py3.10/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 105, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
0 0x5644d2ac9863
1 0x5644d27bf8c6
2 0x5644d280a618
3 0x5644d280a6d1
4 0x5644d284d744
5 0x5644d282c5cd
6 0x5644d284ac19
7 0x5644d282c343
8 0x5644d27fd593
9 0x5644d27fdf5e
10 0x5644d2a8d84b
11 0x5644d2a917a5
12 0x5644d2a7b571
13 0x5644d2a92332
14 0x5644d2a6087f
15 0x5644d2ab8728
16 0x5644d2ab88fb
17 0x5644d2ac89b4
18 0x7fb1fa094ac3
Error: Process completed with exit code 1.`
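A `TimeoutException` while waiting for `shop.user` only says that the login field never appeared, not why. One way to find out is to save a screenshot and the page HTML when the wait fails, and to check for signs of a Cloudflare challenge like the one described for thalia.de. A sketch with hypothetical helper names; the challenge markers are heuristics, not an official list:

```python
import logging
from pathlib import Path

log = logging.getLogger(__name__)

# Heuristic markers of a Cloudflare interstitial page (assumption, not exhaustive).
CHALLENGE_MARKERS = ("Just a moment...", "challenges.cloudflare.com", "cf-chl")


def looks_like_challenge(page_source: str) -> bool:
    """Return True if the HTML contains any known challenge marker."""
    return any(marker in page_source for marker in CHALLENGE_MARKERS)


def dump_page_state(driver, out_dir: str, name: str = "login-timeout") -> Path:
    """Save a screenshot and the current HTML for later inspection."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    driver.save_screenshot(str(out / f"{name}.png"))
    html_path = out / f"{name}.html"
    html_path.write_text(driver.page_source, encoding="utf-8")
    return html_path


def wait_or_dump(driver, condition, timeout: float, out_dir: str = "debug"):
    """Like WebDriverWait(...).until(...), but dumps the page on timeout."""
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        return WebDriverWait(driver, timeout).until(condition)
    except TimeoutException:
        html_path = dump_page_state(driver, out_dir)
        if looks_like_challenge(driver.page_source):
            log.warning("login page looks like a Cloudflare challenge, see %s", html_path)
        raise
```

In `_login`, the existing wait could be swapped for `wait_or_dump(webdriver, EC.presence_of_element_located((shop.user.by, shop.user.value)), Delay.medium)`. On GitHub Actions, the `debug` folder would additionally need an artifact upload step to be retrievable after the run.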