INeed4000Bucks / cloudscraper

A Python module to bypass Cloudflare's anti-bot page.
MIT License
17 stars, 2 forks

VR videos aren't found #2

Open jwvanderbeck opened 1 month ago

jwvanderbeck commented 1 month ago

Summary.

Expected Result

VR Videos download like any other

Actual Result

VR videos aren't found, because the tag structure is different


I don't know if you're still maintaining this project, but I wanted to at least let you know that when I try to download a video that SB plays in its VR player, the page structure is different, so the script doesn't find the video.

I'm attempting to fix this on my side and will post a PR if I manage it, though obviously I'm not as familiar with your code :)

jwvanderbeck commented 1 month ago

So I got this working, but it's pretty hacky. I had to skip the cover image download when it detects a VR video, because I couldn't figure that part out and I don't need the image anyway.

Still happy to make a PR but you probably wouldn't want it as is.

The main change is basically this:

    # Find the video tag with the id "main_video_player" and get the source tag within it
    vr = False
    video_tag = soup.find('video', {'id': 'main_video_player'})
    if not video_tag:
        video_tag = soup.find(id="vr_player")
        if video_tag:
            vr = True
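For anyone who wants to try the fallback in isolation, here is a minimal sketch of the same lookup wrapped in a function. The name `find_player` and the two HTML snippets are made up for illustration; only the ids `main_video_player` and `vr_player` come from the change above, and the real pages will have more markup around them.

```python
from bs4 import BeautifulSoup


def find_player(soup):
    """Return (tag, is_vr) for the first player element found, or (None, False).

    Hypothetical helper: it mirrors the lookup order in the snippet above.
    """
    # Regular pages: a <video> element with id "main_video_player".
    tag = soup.find("video", {"id": "main_video_player"})
    if tag is not None:
        return tag, False
    # VR pages: fall back to whatever element carries id "vr_player".
    tag = soup.find(id="vr_player")
    return tag, tag is not None


# Made-up markup for illustration only; the real page structure will differ.
regular = BeautifulSoup(
    '<video id="main_video_player"><source src="a.mp4"></video>', "html.parser"
)
vr_page = BeautifulSoup('<div id="vr_player"></div>', "html.parser")

print(find_player(regular))
print(find_player(vr_page))
```

Returning the flag together with the tag keeps the VR check in one place, so later steps such as the image download only need to look at the boolean.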

Then I use the vr bool at the image download to just skip that whole section if it's true:


    # Find and download the cover image (skipped for VR pages)
    if not vr:
        img_tag = soup.find('div', class_='play_cover').find('img')
        img_url = img_tag['src'].replace('w:300', 'w:1600')
        img_response = scraper.get(img_url)

        # Check if the request was successful
        if img_response.status_code == 200:
            in_counter = 1 #image number counter

            base_image_filename = f'{uploader_name} - {image_title_text}'

            image_filename = f'{base_image_filename}.jpg'
            # Check if the file already exists and increment
            while os.path.exists(image_filename):
                image_filename = f'{base_image_filename} ({in_counter}).jpg'
                in_counter += 1
            with open(image_filename, 'wb') as f:
                f.write(img_response.content)
                print("Image downloaded successfully.")
        else:
            print("Failed to download the image.")
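The "increment until the name is free" loop in the snippet above could also be pulled out into a small helper, which might make it easier to reuse for the video file as well. This is only a sketch: `unique_filename` is a hypothetical name, not something in the project, and the demo writes into a temporary directory rather than touching real downloads.

```python
import os
import tempfile


def unique_filename(base, ext=".jpg"):
    """Return base + ext, or 'base (n)' + ext for the first n that is not taken.

    Hypothetical helper: same naming scheme as the loop in the snippet above.
    """
    candidate = f"{base}{ext}"
    counter = 1
    while os.path.exists(candidate):
        candidate = f"{base} ({counter}){ext}"
        counter += 1
    return candidate


# Demo in a throwaway directory: the second call sees the first file and
# moves on to the "(1)" suffix.
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "uploader - title")
    first = unique_filename(base)
    open(first, "wb").close()
    second = unique_filename(base)

print(os.path.basename(first))
print(os.path.basename(second))
```

Note that the existence check and the later write are separate steps, so two downloads running at the same time could still pick the same name; for a single-threaded script that should not matter.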