SeleniumHQ / selenium

A browser automation framework and ecosystem.
https://selenium.dev
Apache License 2.0
30.26k stars 8.15k forks

[🐛 Bug]: Fail to download PDF or zip file from remote to client on Remote webdriver #13956

Open 15975518086 opened 4 months ago

15975518086 commented 4 months ago

What happened?

driver.download_file() fails with an EOFError raised from zipfile while the downloaded file is being extracted into the target directory. The full traceback is under "Relevant log output".

How can we reproduce the issue?

The code below clicks the button, then downloads the .docx file (or a zip or PDF file).
code:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

options = webdriver.ChromeOptions()
options.enable_downloads = True
driver = webdriver.Remote(command_executor='http://192.168.3.35:4444/wd/hub', options=options)
driver.maximize_window()
driver.implicitly_wait(5)
driver.get("http://127.0.0.1:8000/login_page")
driver.find_element(By.XPATH,"//button[text()='导出']").click()  # the button text "导出" means "Export"
time.sleep(5)
file_names = driver.get_downloadable_files()
downloadable_file = file_names[0]
target_directory = r'D:\dtmp'
driver.download_file(downloadable_file, target_directory)
time.sleep(10)

node setting:
java -jar selenium-server-4.20.0.jar node --hub http://192.168.3.35:4444   --host 192.168.3.35 --port 5557  --enable-managed-downloads true

I found that the download_file method in webdriver.py has an issue.
If I hard-code the name to a zip name, e.g. file_name = 'package.zip' (the commented-out line below), it runs successfully; without that line, it fails.

        contents = self.execute(Command.DOWNLOAD_FILE, {"name": file_name})["value"]["contents"]
        # file_name = 'package.zip'
        target_file = os.path.join(target_directory, file_name)
        with open(target_file, "wb") as file:
            file.write(base64.b64decode(contents))

        with zipfile.ZipFile(target_file, "r") as zip_ref:
            zip_ref.extractall(target_directory)

Relevant log output

D:\Python\Python311\python.exe D:/OfflineaCare/ndb/program/test/test_oooooooo.py
Traceback (most recent call last):
  File "D:\OfflineaCare\ndb\program\test\test_oooooooo.py", line 51, in <module>
    driver.download_file(downloadable_file, target_directory)
  File "D:\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1155, in download_file
    zip_ref.extractall(target_directory)
  File "D:\Python\Python311\Lib\zipfile.py", line 1679, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "D:\Python\Python311\Lib\zipfile.py", line 1734, in _extract_member
    shutil.copyfileobj(source, target)
  File "D:\Python\Python311\Lib\shutil.py", line 197, in copyfileobj
    buf = fsrc_read(length)
          ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python311\Lib\zipfile.py", line 953, in read
    data = self._read1(n)
           ^^^^^^^^^^^^^^
  File "D:\Python\Python311\Lib\zipfile.py", line 1021, in _read1
    data += self._read2(n - len(data))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python311\Lib\zipfile.py", line 1056, in _read2
    raise EOFError
EOFError

Process finished with exit code 1

Operating System

Windows 10

Selenium version

Selenium 4.20.0, Python 3.11.3

What are the browser(s) and version(s) where you see this issue?

Chrome 124

What are the browser driver(s) and version(s) where you see this issue?

124.0.6367.61

Are you using Selenium Grid?

Yes, selenium-server-4.20.0.jar

github-actions[bot] commented 4 months ago

@15975518086, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

M1troll commented 3 months ago

Hi!

I encountered the same problem when trying to download a zip file.

While debugging, I also caught another error message, which may help: [image]

Operating System: Manjaro Linux
Selenium version: 4.21
Python version: 3.12
Browsers: Chrome, Firefox, Edge (latest versions of selenium/standalone)

Traceback:

 tests/modules/test_internal_export.py:104: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.venv/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py:1155: in download_file
    zip_ref.extractall(target_directory)
../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:1720: in extractall
    self._extract_member(zipinfo, path, pwd)
../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:1778: in _extract_member
    shutil.copyfileobj(source, target)
../../../.pyenv/versions/3.12.0/lib/python3.12/shutil.py:203: in copyfileobj
    while buf := fsrc_read(length):
../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:978: in read
    data = self._read1(n)
../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:1046: in _read1
    data += self._read2(n - len(data))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <zipfile.ZipExtFile [closed]>, n = 3094

    def _read2(self, n):
        if self._compress_left <= 0:
            return b''

        n = max(n, self.MIN_READ_SIZE)
        n = min(n, self._compress_left)

        data = self._fileobj.read(n)
        self._compress_left -= len(data)
        if not data:
>           raise EOFError
E           EOFError

../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:1081: EOFError

Docker-compose file

version: '3'

services:
  chrome:
    image: selenium/standalone-chrome
    shm_size: 2gb
    ports:
      - 4444:4444  # Selenium service
      - 5900:5900  # VNC server
      - 7900:7900  # VNC browser client
    environment:
      - SE_OPTS=--enable-managed-downloads true

mormamn commented 3 months ago

We are experiencing the same issue. The root cause is that the zip archive is written to disk under the same name as the desired file. When extraction starts, the extracted file overwrites the "zip" file while it is still being read, so the archive ends up empty and the read fails with the EOFError.
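
To make the name collision concrete, here is a minimal sketch using only the standard library. The helper name colliding_members, the directory "downloads", and the member list are hypothetical and chosen for illustration. The check compares the path the archive is saved to with the paths its members would be extracted to.

```python
import os


def colliding_members(archive_path, target_directory, member_names):
    """Return the members whose extraction path is the archive file itself."""
    archive = os.path.normcase(os.path.abspath(archive_path))
    return [
        name
        for name in member_names
        if os.path.normcase(os.path.abspath(os.path.join(target_directory, name))) == archive
    ]


target_directory = "downloads"  # hypothetical directory
members = ["report.docx"]       # hypothetical archive contents

# Archive saved under the downloaded file's own name: the member would overwrite it.
same_name = colliding_members(
    os.path.join(target_directory, "report.docx"), target_directory, members
)
# Archive saved under a distinct name, as in the 'package.zip' experiment: no clash.
distinct_name = colliding_members(
    os.path.join(target_directory, "package.zip"), target_directory, members
)

print(same_name)
print(distinct_name)
```

This would also be consistent with the observation earlier in the thread that hard-coding file_name = 'package.zip' makes the call succeed: the archive path then no longer matches the path of the file being extracted.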

For now we work around it by calling self.execute directly, with a solution similar to what millin did in his PR:

    def __download_file(self, file_name: str, target_directory: str) -> None:
        if not os.path.exists(target_directory):
            os.makedirs(target_directory)

        contents = self.execute(Command.DOWNLOAD_FILE, {"name": file_name})["value"]["contents"]

        zip_target_file = os.path.join(target_directory, f"{file_name}.zip")
        with open(zip_target_file, "wb") as file:
            file.write(base64.b64decode(contents))

        with zipfile.ZipFile(zip_target_file, "r") as zip_ref:
            zip_ref.extractall(target_directory)
        os.remove(zip_target_file)
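
A variant of the same idea, as a minimal sketch rather than the fix that was merged: extract the archive straight from memory so that no zip file is written into the target directory at all. The helper name extract_base64_zip is hypothetical, and the payload built below is only a stand-in for the Grid response. The sketch assumes, as the snippets in this thread show, that the contents arrive as a base64-encoded zip archive.

```python
import base64
import io
import os
import tempfile
import zipfile


def extract_base64_zip(contents: str, target_directory: str) -> list[str]:
    """Decode a base64-encoded zip and extract it without saving the archive to disk."""
    os.makedirs(target_directory, exist_ok=True)
    archive_bytes = base64.b64decode(contents)
    # The archive lives in memory only, so an extracted member cannot overwrite it.
    with zipfile.ZipFile(io.BytesIO(archive_bytes), "r") as zip_ref:
        zip_ref.extractall(target_directory)
        return zip_ref.namelist()


# Stand-in for the Grid response: a zip holding one file, base64-encoded.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("report.docx", b"example payload")
payload = base64.b64encode(buffer.getvalue()).decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    names = extract_base64_zip(payload, tmp)
    with open(os.path.join(tmp, names[0]), "rb") as fh:
        extracted = fh.read()
```

With a Remote driver, the contents argument would be the value obtained from self.execute(Command.DOWNLOAD_FILE, {"name": file_name})["value"]["contents"], as in the workaround above. Holding the whole archive in memory is a trade-off that may not suit very large downloads.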

github-actions[bot] commented 2 months ago

This issue is looking for contributors.

Please comment below or reach out to us through our IRC/Slack/Matrix channels if you are interested.

millin commented 4 days ago

@titusfortner Fixed in #14031