SeleniumHQ / selenium

A browser automation framework and ecosystem.
https://selenium.dev
Apache License 2.0
29.77k stars, 8.02k forks

[🚀 Feature]: Download to GCS bucket #14074

Closed: rohan472000 closed this issue 4 weeks ago

rohan472000 commented 1 month ago

Feature and motivation

While scraping a dynamic website, I wanted to download files to my Google Cloud Storage (GCS) bucket instead of a local directory. I need to move the files to the GCS bucket later for further processing, and I want to avoid the time spent transferring them. When I searched for how to download a file directly to a GCS bucket while scraping with Selenium, I found that Selenium doesn't support direct downloads to a cloud storage path.

Usage example

Imagine you are running a web scraper to download a large number of files from a dynamic website. These files need to be processed further using cloud-based tools and services hosted on Google Cloud Platform (GCP).

Currently, the workflow involves downloading files to a local directory and then uploading them to a Google Cloud Storage (GCS) bucket. This two-step process introduces delays and increases complexity, especially if the local storage is limited or if the files are large.
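For reference, the two-step workflow described above might look roughly like the sketch below. It is only an illustration under several assumptions: Chrome is the browser, the `google-cloud-storage` package is installed and credentials are already configured, and the helper names (`chrome_with_download_dir`, `wait_for_downloads`, `gcs_uploader`, `move_to_bucket`) are hypothetical, not part of Selenium or the GCS client.

```python
from __future__ import annotations

import time
from pathlib import Path
from typing import Callable, Iterable

# Suffixes browsers commonly use for in-progress downloads
# (Chrome: .crdownload, Firefox: .part). Treated as an assumption here.
PARTIAL_SUFFIXES = (".crdownload", ".part", ".tmp")


def chrome_with_download_dir(download_dir: Path):
    """Start Chrome with its default download directory set to download_dir."""
    from selenium import webdriver  # imported lazily; requires selenium

    options = webdriver.ChromeOptions()
    options.add_experimental_option(
        "prefs", {"download.default_directory": str(download_dir.resolve())}
    )
    return webdriver.Chrome(options=options)


def wait_for_downloads(
    download_dir: Path, timeout: float = 60.0, poll: float = 0.5
) -> list[Path]:
    """Poll until the directory has files and none of them is a partial download."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        files = sorted(p for p in download_dir.iterdir() if p.is_file())
        if files and not any(p.suffix in PARTIAL_SUFFIXES for p in files):
            return files
        time.sleep(poll)
    raise TimeoutError(f"downloads in {download_dir} did not finish in {timeout}s")


def gcs_uploader(bucket_name: str, prefix: str = "") -> Callable[[Path], None]:
    """Build an upload callable backed by the google-cloud-storage client."""
    from google.cloud import storage  # imported lazily; requires google-cloud-storage

    bucket = storage.Client().bucket(bucket_name)

    def upload(path: Path) -> None:
        bucket.blob(f"{prefix}{path.name}").upload_from_filename(str(path))

    return upload


def move_to_bucket(
    files: Iterable[Path],
    upload: Callable[[Path], object],
    delete_local: bool = True,
) -> list[str]:
    """Upload each file, optionally delete the local copy, and return the names."""
    uploaded = []
    for path in files:
        upload(path)
        if delete_local:
            path.unlink()  # free local storage once the upload has succeeded
        uploaded.append(path.name)
    return uploaded
```

A run would then be something like `driver = chrome_with_download_dir(tmp_dir)`, trigger the downloads, and call `move_to_bucket(wait_for_downloads(tmp_dir), gcs_uploader("my-bucket"))`, where `my-bucket` is a placeholder. The files still touch local disk briefly, which is exactly the step this request would like to skip.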

With the proposed feature of direct downloads to a GCS bucket, you can streamline this workflow:

  1. Set up your web scraper using Selenium.
  2. Configure the download destination to point directly to your GCS bucket.
  3. Run your scraper. Files will be downloaded directly to the GCS bucket, eliminating the need for local storage.
  4. Process the files immediately using cloud-based tools and services available on GCP.

This feature would save time, reduce the need for intermediate local storage, and simplify the overall data processing pipeline. It would be particularly beneficial for scenarios involving large datasets, limited local storage, or high-frequency scraping tasks where quick processing is essential.

github-actions[bot] commented 1 month ago

@rohan472000, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

titusfortner commented 4 weeks ago

Selenium is designed to allow you to do in code what you can do manually. How would you save files remotely when you are using the browser? Perhaps there is an extension you can install to do what you want? I'll add some links for where you can ask more questions.

github-actions[bot] commented 4 weeks ago

💬 Please ask questions at: