fgebhart / zeit-on-tolino

Service to auto-upload the ZEIT 🗞 e-paper to your tolino cloud library 📚
MIT License

Login to tolino cloud fails #52

Open superhenne opened 5 months ago

superhenne commented 5 months ago

When trying to log in to tolino cloud, the script raises a Timeout error. I made sure the credentials are OK, and I also pulled a fresh copy of the repo. Please see the error log below. What else can I check? The shop is Thalia.

```
INFO:main:logging into ZEIT premium...
INFO:main:downloading most recent ZEIT e-paper...
INFO:zeit_on_tolino.zeit:clicking download button now...
INFO:main:successfully finished download of 'DIE ZEIT - Nr. 17, 18.04.2024'
INFO:main:upload ZEIT e-paper to tolino cloud...
INFO:zeit_on_tolino.tolino:logging into tolino cloud...
Traceback (most recent call last):
  File "/home/runner/work/zeit-on-tolino/zeit-on-tolino/sync.py", line 25, in <module>
    tolino.login_and_upload(webdriver, e_paper_path, e_paper_title)
  File "/home/runner/work/zeit-on-tolino/zeit-on-tolino/zeit_on_tolino/tolino.py", line 169, in login_and_upload
    _login(webdriver)
  File "/home/runner/work/zeit-on-tolino/zeit-on-tolino/zeit_on_tolino/tolino.py", line 88, in _login
    WebDriverWait(webdriver, Delay.medium).until(EC.presence_of_element_located((shop.user.by, shop.user.value)))
  File "/home/runner/.cache/pypoetry/virtualenvs/zeit-on-tolino-z2YdorRC-py3.10/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 105, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
0 0x5644d2ac9863
1 0x5644d27bf8c6
2 0x5644d280a618
3 0x5644d280a6d1
4 0x5644d284d744
5 0x5644d282c5cd
6 0x5644d284ac19
7 0x5644d282c343
8 0x5644d27fd593
9 0x5644d27fdf5e
10 0x5644d2a8d84b
11 0x5644d2a917a5
12 0x5644d2a7b571
13 0x5644d2a92332
14 0x5644d2a6087f
15 0x5644d2ab8728
16 0x5644d2ab88fb
17 0x5644d2ac89b4
18 0x7fb1fa094ac3

Error: Process completed with exit code 1.
```

fgebhart commented 4 months ago

Hi @superhenne - thanks for raising the issue.

I recall running into such a problem some time back. However, the current state of the repo seems to be working for me (see the Actions runs). Could you verify whether the issue still persists with the latest changes on the main branch? In particular, the switch to Chrome (instead of Firefox) has the potential to fix such an issue.

superhenne commented 4 months ago

Hi @fgebhart, I was and still am using the exact code from the main branch with no changes, and the last run yesterday still failed. How can the same code work for you but not for me? Are you also using Thalia? Is there anything else I can try?

fgebhart commented 4 months ago

I'm using Hugendubel. That could be the reason. The login flow for the tolino webreader page is different for Thalia. It could of course be that it changed slightly and the login flow now breaks. I'm blind to this from my end, as I cannot test all tolino partners; that would require accounts for all of them.

If you want to dig deeper, I would suggest that you:

  1. check out the code locally
  2. disable headless mode by commenting out this line
  3. run the tests using pytest
  4. watch what the browser does
  5. add breakpoints e.g. before the line that fails
  6. check which elements might not be there or named as expected

Let me know if you need more input on that. Please understand that I do not have the time to debug this myself.

superhenne commented 4 months ago

I ran the tests locally with pytest. Everything worked OK, and the test EPUB got uploaded to the webreader. Then I tried to run sync.py locally, but it fails while downloading the e-paper from ZEIT: the download directory is not found after the download. I also cannot find the path mentioned in the log.

(zeit-on-tolino) xxxxxxxx@xxxxxxxx zeit-on-tolino % python sync.py
INFO:__main__:logging into ZEIT premium...
INFO:__main__:downloading most recent ZEIT e-paper...
INFO:zeit_on_tolino.zeit:clicking download button now...
Traceback (most recent call last):
  File "/Users/xxxxxxxx/Documents/GitHub/zeit-on-tolino/sync.py", line 18, in <module>
    e_paper_path = zeit.download_e_paper(webdriver)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxxxxx/Documents/GitHub/zeit-on-tolino/zeit_on_tolino/zeit.py", line 94, in download_e_paper
    wait_for_downloads(webdriver.download_dir_path)
  File "/Users/xxxxxxxx/Documents/GitHub/zeit-on-tolino/zeit_on_tolino/zeit.py", line 66, in wait_for_downloads
    while any([filename.endswith(".crdownload") for filename in os.listdir(path)]):
                                                                ^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/v6/2y8jhzts27n4ptr0msbhh_yc0000gn/T/tmp474r_1lx'
(zeit-on-tolino) xxxxxxxx@xxxxxxxx zeit-on-tolino %

fgebhart commented 4 months ago

This seems hard to debug from my end. It's also not clear to me why it works on GitHub Actions for me and locally for you, but not on GitHub Actions for you :man_shrugging:

The error reminds me of the recent migration from geckodriver (firefox) to chromedriver, where downloading files is handled differently.

As a last resort you could consider forking the repo again into a new repo on your side.

superhenne commented 4 months ago

An update: I got it working by adding a user agent string to the Chrome options. As far as I understand, running in headless mode adds a "headless" keyword to the user agent, which some websites block.

In web.py: `options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36")`

The issue with failing locally could be solved by changing the download path to an existing real path: DOWNLOAD_PATH = '/some/existing/path'

superhenne commented 4 months ago

The headless user agent triggers the Cloudflare "confirm you are a human" screen for thalia.de.
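The user agent override described above can also be derived from the browser's own user agent instead of hard-coding an old Chrome version. The following is only a sketch: the helper names `strip_headless_marker` and `user_agent_argument` are hypothetical and not part of the repo, and the sample user agent string is illustrative. It relies on headless Chrome reporting `HeadlessChrome/<version>` in its default user agent.

```python
import re

# Headless Chrome reports "HeadlessChrome/<version>" in its default user agent.
_HEADLESS_TOKEN = re.compile(r"HeadlessChrome/")


def strip_headless_marker(user_agent: str) -> str:
    """Replace the headless marker with a regular Chrome token (hypothetical helper)."""
    return _HEADLESS_TOKEN.sub("Chrome/", user_agent)


def user_agent_argument(user_agent: str) -> str:
    """Format a user agent as the Chrome argument string used in web.py above."""
    return f"user-agent={strip_headless_marker(user_agent)}"


# Illustrative user agent only; the real value depends on the installed Chrome.
default_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/124.0.0.0 Safari/537.36"
)
print(user_agent_argument(default_ua))
```

The resulting string would be passed to `options.add_argument(...)` in the same place as the hard-coded one. Whether Cloudflare accepts the rewritten user agent for thalia.de is an assumption based on the reports in this thread.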

fgebhart commented 4 months ago

Hi @superhenne - sorry I'm a bit busy these days.

So to summarize, we see/saw two issues:

  1. Failing locally because of a misconfigured path. Do you see an option to change the existing code (in fgebhart/zeit-on-tolino) so it could potentially work for all users? If yes, I'm happy to review a PR for it.
  2. Running into the "confirm you are a human" issue. Does this happen locally as well, or only in GitHub Actions? Do you see a need for a PR? For me things still seem to be working. However, it is in my interest that the repo works for as many users as possible.

superhenne commented 4 months ago

  1. On Mac, I used a static local path, which will not work for other users. Unfortunately, I don't know how to handle this universally. It could also be left as a configuration option.
  2. The Cloudflare human check happens with thalia.de when the user agent contains "headless", which is the default for the Selenium Chrome driver in headless mode. This can easily be overridden with this line in `get_webdriver` in web.py: `options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36")`
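
For point 1, one way to keep the download path configurable while still working out of the box might look like the sketch below. It assumes the failure is only that the configured directory does not exist. The environment variable name `ZEIT_ON_TOLINO_DOWNLOAD_PATH` and the function `resolve_download_dir` are hypothetical and do not exist in the repo.

```python
import os
import tempfile
from typing import Optional

# Hypothetical environment variable name; the repo does not define it.
DOWNLOAD_PATH_ENV = "ZEIT_ON_TOLINO_DOWNLOAD_PATH"


def resolve_download_dir(configured: Optional[str] = None) -> str:
    """Return an existing directory to use as the browser download location."""
    path = configured or os.environ.get(DOWNLOAD_PATH_ENV)
    if path:
        # Normalize the user-supplied path and create it if it is missing.
        path = os.path.abspath(os.path.expanduser(path))
        os.makedirs(path, exist_ok=True)
        return path
    # mkdtemp() creates the directory and leaves it in place until it is
    # removed explicitly, so it still exists when the download is polled.
    return tempfile.mkdtemp(prefix="zeit-on-tolino-")
```

The returned path would then be used wherever `DOWNLOAD_PATH` is set today, before the webdriver is created.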

smarthomeagentur commented 3 months ago

> 1. On Mac, I used a static local path, that will not work for other users. Unfortunately i don´t know how to handle this universally. Or leave it as configuration option.
> 2. The cloudflare human check happens with thalia.de, when the user agent contains "headless", which is the default for selenium chrome driver. This can easily be overriden with this line in web.py, def get_webdriver: options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36")

Just want to confirm that the add_argument change fixed the problem with Thalia for me. The periodic sync works with it. I can't assess possible side effects, so I will not make a pull request for it.

Best regards