DIGITALCRIMINAL / ArchivedUltimaScraper

Scrape content from OnlyFans and Fansly
GNU General Public License v3.0

Incomplete scrape of paid content #353

Closed · melbandera closed this issue 1 year ago

melbandera commented 2 years ago

Hello y'all. Saw a couple of similar closed issues but didn't notice any solutions. I have purchased some paid message content that I am unable to scrape. The scraper will grab some of this content but not all of it, and it fails to download the same set of videos on repeated tries. This happens with the most recent pull. I am not certain whether partial scraping is limited to this single profile, but what I can say is that I am completely unable to scrape content from models with lapsed subscriptions, even if I purchased PPV from them.

{
  "info": {
    "version": 8.0
  },
  "settings": {
    "auto_site_choice": "",
    "export_type": "json",
    "profile_directories": [
      ".profiles"
    ],
    "max_threads": -1,
    "min_drive_space": 0,
    "helpers": {
      "renamer": true,
      "reformat_media": true,
      "downloader": true,
      "delete_empty_directories": false
    },
    "webhooks": {
      "global_webhooks": [],
      "global_status": true,
      "auth_webhook": {
        "succeeded": {
          "webhooks": [],
          "status": null,
          "hide_sensitive_info": true
        },
        "failed": {
          "webhooks": [],
          "status": null,
          "hide_sensitive_info": true
        }
      },
      "download_webhook": {
        "succeeded": {
          "webhooks": [],
          "status": null,
          "hide_sensitive_info": true
        },
        "failed": {
          "webhooks": [],
          "status": null,
          "hide_sensitive_info": true
        }
      }
    },
    "exit_on_completion": false,
    "infinite_loop": true,
    "loop_timeout": 0,
    "dynamic_rules_link": "https://raw.githubusercontent.com/DATAHOARDERS/dynamic-rules/main/onlyfans.json",
    "proxies": [],
    "cert": "",
    "random_string": "64785a5473b311ec855e8a094eea7fb1"
  },
  "supported": {
    "onlyfans": {
      "settings": {
        "auto_profile_choice": [],
        "auto_model_choice": false,
        "auto_media_choice": "",
        "auto_api_choice": true,
        "browser": {
          "auth": true
        },
        "jobs": {
          "scrape": {
            "subscriptions": true,
            "paid_content": true
          },
          "metadata": {
            "posts": true,
            "comments": true
          }
        },
        "download_directories": [
          ".sites"
        ],
        "file_directory_format": "{site_name}/{model_username}/{api_type}/{value}/{media_type}",
        "filename_format": "{filename}.{ext}",
        "metadata_directories": [
          ".sites"
        ],
        "metadata_directory_format": "{site_name}/{model_username}/Metadata",
        "delete_legacy_metadata": false,
        "text_length": 255,
        "video_quality": "source",
        "overwrite_files": false,
        "date_format": "%d-%m-%Y",
        "ignored_keywords": [],
        "ignore_type": "",
        "blacklists": [],
        "webhook": true
      }
    },
    "fansly": {
      "settings": {
        "auto_profile_choice": [],
        "auto_model_choice": false,
        "auto_media_choice": "",
        "auto_api_choice": true,
        "browser": {
          "auth": true
        },
        "jobs": {
          "scrape": {
            "subscriptions": true,
            "paid_content": true
          },
          "metadata": {
            "posts": true,
            "comments": true
          }
        },
        "download_directories": [
          ".sites"
        ],
        "file_directory_format": "{site_name}/{model_username}/{api_type}/{value}/{media_type}",
        "filename_format": "{filename}.{ext}",
        "metadata_directories": [
          ".sites"
        ],
        "metadata_directory_format": "{site_name}/{model_username}/Metadata",
        "delete_legacy_metadata": false,
        "text_length": 255,
        "video_quality": "source",
        "overwrite_files": false,
        "date_format": "%d-%m-%Y",
        "ignored_keywords": [],
        "ignore_type": "",
        "blacklists": [],
        "webhook": true
      }
    },
    "starsavn": {
      "settings": {
        "auto_profile_choice": [],
        "auto_model_choice": false,
        "auto_media_choice": "",
        "auto_api_choice": true,
        "browser": {
          "auth": true
        },
        "jobs": {
          "scrape": {
            "subscriptions": true,
            "paid_content": true
          },
          "metadata": {
            "posts": true,
            "comments": true
          }
        },
        "download_directories": [
          ".sites"
        ],
        "file_directory_format": "{site_name}/{model_username}/{api_type}/{value}/{media_type}",
        "filename_format": "{filename}.{ext}",
        "metadata_directories": [
          ".sites"
        ],
        "metadata_directory_format": "{site_name}/{model_username}/Metadata",
        "delete_legacy_metadata": false,
        "text_length": 255,
        "video_quality": "source",
        "overwrite_files": false,
        "date_format": "%d-%m-%Y",
        "ignored_keywords": [],
        "ignore_type": "",
        "blacklists": [],
        "webhook": true
      }
    }
  }
}
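For what it's worth, paid content is already enabled in the scrape job; this is the relevant block, excerpted verbatim from the onlyfans settings in the config above:

    "jobs": {
      "scrape": {
        "subscriptions": true,
        "paid_content": true
      }
    },

So as far as I can tell the config is asking for everything; it's the scraper that stops partway.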
JohnnyTowns94 commented 2 years ago

I think something has changed recently because I have noticed that the scraper is processing a lot less paid content than usual.

aizenn1 commented 2 years ago

I have the same problem. I'm trying to scrape a model's content with 200+ videos but I'm only getting 22. I tried with 2 other models and hit the same problem: it doesn't download all of the content, just some of it. Btw, the models post non-PPV vids, so the vids aren't locked or anything; I think OnlyFans changed the API or something. Check this screenshot: https://imgur.com/FY6xoE0 . As you can see, the process stops at 25%.

HistoricTheater commented 2 years ago

I'm not having this issue; however, I notice your config is missing a few options that I have filled in, and I'm getting full rips.

       "settings": {
        "auto_profile_choice": [],
        "auto_model_choice": false,
        "auto_media_choice": "",
        "auto_api_choice": true,

Where I have

      "settings": {
        "auto_profile_choice": "default",
        "auto_model_choice": true,
        "auto_media_choice": true,
        "auto_api_choice": true,

I suggest filling those in so it automatically runs everything and checking again. This may or may not change anything, but it's worth trying.
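If you want to drop those straight into the config from the first post, the start of "supported" -> "onlyfans" -> "settings" would end up looking like this (the values are just copied from my own working config, and whether auto_media_choice expects a boolean or a string may differ between versions, so treat it as a sketch rather than a guaranteed fix):

    "onlyfans": {
      "settings": {
        "auto_profile_choice": "default",
        "auto_model_choice": true,
        "auto_media_choice": true,
        "auto_api_choice": true,
        ...

The same block exists under fansly and starsavn if you scrape those too.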

artyom2035 commented 2 years ago

In my case, one thing I could pinpoint is that paid content seems not to download if it has a preview video 'slide', or whatever it's called. It even occurred with a message that I unlocked yesterday. There are other instances for which I can't isolate a reason; maybe age? Scratch that, it grabbed this file after a few scrapes.

When my NAS crashed ~4 months ago, the script would only redownload the newest paid content (~5-6 months old) from currently subscribed models. It wouldn't redownload any paid content from expired subs, or older content from current subs. I tried lots of different commits/releases unsuccessfully. No setting in the config, i.e. 'paid_content', 'overwrite_files', the auto choices, or deleting metadata etc., would trigger a redownload for me either.
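For anyone who wants to retry the same toggles, these are the keys I was flipping; both live under "supported" -> "onlyfans" -> "settings" in the config format shown at the top of this issue (values shown are what I used when trying to force a redownload):

    "jobs": {
      "scrape": {
        "subscriptions": true,
        "paid_content": true
      }
    },
    ...
    "overwrite_files": true,

By 'deleting metadata' I mean removing the files under the model's Metadata folder, i.e. the path built from ".sites" + "{site_name}/{model_username}/Metadata" per the config above.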

I'd also been following these similar "paid content" issues hoping for a fix, but they always got closed/resolved for the individual who opened them, so I figured the problem was on my end.

Eventually I gave up and ended up using a combination of JDownloader, IDM, and a browser extension to manually redownload everything from the purchased tab. It took a long time and the filenames/dates are jacked up, but at least I have 'em, and now I back up my paid directories.
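In case it saves someone else time, this is roughly the kind of backup copy I mean; a minimal sketch assuming the default ".sites" layout from the config above, where the destination path and the "Messages"/"Posts" folder names are my assumptions about where purchased media lands:

    import shutil
    from pathlib import Path

    # Default download layout from the config at the top of this issue:
    #   .sites/{site_name}/{model_username}/{api_type}/{value}/{media_type}
    SRC = Path(".sites/OnlyFans")       # scraper output (site folder name may differ)
    DST = Path("/mnt/backup/OnlyFans")  # illustrative backup target

    for model_dir in (p for p in SRC.iterdir() if p.is_dir()):
        for api_dir in (p for p in model_dir.iterdir() if p.is_dir()):
            # Copy only the folders that hold purchased media in my setup.
            if api_dir.name in ("Messages", "Posts"):
                shutil.copytree(api_dir, DST / model_dir.name / api_dir.name,
                                dirs_exist_ok=True)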

JohnnyTowns94 commented 2 years ago

It appears it is only downloading content from current subscriptions. The scraper used to download all paid content regardless.

DIGITALCRIMINAL commented 2 years ago

The only way I can fix this is to use someone's auth.json details.

artyom2035 commented 2 years ago

> The only way I can fix this is to use someone's auth.json details.

@digitalcriminals You can have mine. How do I get it to you?

DIGITALCRIMINAL commented 2 years ago

> > The only way I can fix this is to use someone's auth.json details.
>
> @digitalcriminals You can have mine. How do I get it to you?

My email is on my main account @digitalcriminal

artyom2035 commented 2 years ago

Sent it your way

melbandera commented 2 years ago

Hey there, sorry, I was out of town. I tested the changes suggested here, but they had no impact. Still experiencing the same issue: an identical subset of content is being scraped every time. Lmk if y'all need anything more from me.

DIGITALCRIMINAL commented 2 years ago

I'll be pushing the fix within 24hrs

DIGITALCRIMINAL commented 2 years ago

@artyom2035 Thanks for helping, the fix has been pushed.

https://github.com/DIGITALCRIMINALS/OnlyFans/commit/a922b75a0d78fbc946accf0f7e9efd6204a91db9