GaParmar / img2img-turbo

One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
MIT License

How to fix URLs in the Midjourney dataset that say 'This content is no longer' #27

Open aihacker111 opened 5 months ago

aihacker111 commented 5 months ago

Step 1: Open the Chrome browser.
Step 2: Add the Fix Discord CDN extension to Chrome.
Step 3: Open the URL again; the image should now load.


Code to crawl image data from multiple links:

from selenium import webdriver
import pandas as pd
import os
import requests

# Function to save an image from a URL
def save_image_from_url(driver, url, idx):
    try:
        # Load the link in Chrome, where the extension is active
        driver.get(url)
        # Use the address the browser ended up on after loading
        current_url = driver.current_url
        # Download the image and fail loudly on HTTP errors,
        # so that an error page is never saved as an image
        response = requests.get(current_url, timeout=30)
        response.raise_for_status()
        image_name = f'image_{idx}.png'
        image_path = os.path.join('image_downloads', image_name)
        with open(image_path, 'wb') as f:
            f.write(response.content)
        print(f"Image {idx} downloaded successfully.")
    except Exception as e:
        print(f"Error downloading image {idx}: {e}")

# Step 1: Read the CSV file
csv_file_path = 'b.csv'  # Change this to the path of your CSV file
df = pd.read_csv(csv_file_path)

# Step 2: Extract image URLs
image_column_name = 'Attachments'  # Change this to the name of the column containing image URLs
image_urls = df[image_column_name]

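# Set up Chrome WebDriver with a user profile
# --user-data-dir expects a profile directory, not a browser launch command
USER_DATA_DIR = '/path/to/chrome/profile'  # Placeholder: change this to your Chrome user data directory
options = webdriver.ChromeOptions()
options.add_argument('--user-data-dir=' + USER_DATA_DIR)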
options.add_extension('/Users/macbook/Downloads/Segment-Anymate/data/gdljcbcihhoampfcfeokidlbanblkgbg.crx')  # Replace with the path to your extension CRX file
driver = webdriver.Chrome(options=options)

# Create a directory to save images
os.makedirs('image_downloads', exist_ok=True)

# Loop through each URL and save the image
for idx, url in enumerate(image_urls):
    save_image_from_url(driver, url, idx)

# Quit the WebDriver
driver.quit()

You must download the CRX file of the Fix Discord CDN extension and put it in the same directory as your main script (or update the path passed to add_extension).
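
The script above passes every cell of the CSV column to the browser and saves every file as .png. A minimal sketch of two optional helpers follows, assuming that some cells may be empty or may not contain a URL, and that the links end in an image extension before any query string. The names clean_urls, image_name_for, and ALLOWED_EXTENSIONS are hypothetical and not part of the original script, and example.com is a placeholder.

```python
import os
from urllib.parse import urlparse

# Hypothetical helpers for illustration; not part of the original script.
# Assumption: these are the image extensions worth preserving.
ALLOWED_EXTENSIONS = {'.png', '.jpg', '.jpeg', '.webp', '.gif'}


def clean_urls(values):
    """Return only entries that look like http(s) URLs, skipping NaN and blanks."""
    urls = []
    for value in values:
        if not isinstance(value, str):
            continue  # empty CSV cells are read by pandas as float NaN
        value = value.strip()
        if value.startswith(('http://', 'https://')):
            urls.append(value)
    return urls


def image_name_for(url, idx):
    """Build a file name that keeps the URL's image extension, defaulting to .png."""
    # urlparse(...).path excludes the query string, so '?ex=...' is ignored
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        ext = '.png'
    return f'image_{idx}{ext}'


if __name__ == '__main__':
    # Stand-in for df['Attachments']; any iterable of cell values works
    column = ['https://example.com/a/photo.JPG?ex=1&is=2', float('nan'), '  ']
    for idx, url in enumerate(clean_urls(column)):
        print(image_name_for(url, idx))
```

To use it, the loop could iterate over clean_urls(image_urls) instead of image_urls, and save_image_from_url could call image_name_for(current_url, idx) in place of the fixed .png name. Note that filtering changes the numbering, since skipped rows no longer consume an index.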

GaParmar commented 5 months ago

Thank you for documenting this!