UltimaHoarder / UltimaScraper

Scrape all the media from an OnlyFans account - Updated regularly
GNU General Public License v3.0

Refresh page #1119

Closed CapitanLiteral closed 3 years ago

CapitanLiteral commented 3 years ago

I filled in auth.json, but when I start the script it says that I should refresh the page. Does this happen to anyone else? Is this a bug or am I doing something wrong?

DonaldTPP commented 3 years ago

Just started doing it for me again also. Hold tight, it will get looked at.

sonicshake commented 3 years ago

The fourth part of the sign that's changing - previously 608c48da and now 608fcf2c - looks like a hexadecimal version of an epoch timestamp in seconds. I don't have the previous vendor.js to look back at, but this is what I've observed from the new file:

  1. 608fcf2c converted to decimal and then analyzed as a timestamp is 2021-05-03 10:23:40 UTC
  2. The vendor.js full file name from Chrome developer tools is vendor.js?rev=202105031022-6ba1f1c47b

It looks like the revision number and the hex code that is being used are similar but off by a minute, so maybe this is just a coincidence. Just an observation.
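
The observation above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper names `hex_to_utc` and `rev_to_utc` are made up for illustration, and it assumes the `rev=` value in the file name is a `YYYYMMDDHHMM` stamp in UTC, which the thread does not confirm.

```python
from datetime import datetime, timezone


def hex_to_utc(hex_seconds: str) -> datetime:
    """Interpret a hex string as a Unix epoch timestamp in seconds (UTC)."""
    return datetime.fromtimestamp(int(hex_seconds, 16), tz=timezone.utc)


def rev_to_utc(rev: str) -> datetime:
    """Parse the YYYYMMDDHHMM prefix of a revision string (assumed to be UTC)."""
    stamp = rev.split("-")[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M").replace(tzinfo=timezone.utc)


sign_time = hex_to_utc("608fcf2c")
rev_time = rev_to_utc("202105031022-6ba1f1c47b")

print(sign_time.isoformat())                    # 2021-05-03T10:23:40+00:00
print((sign_time - rev_time).total_seconds())   # 100.0
```

The two values are 100 seconds apart, which fits the "similar but off by a minute" description.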

I was going to say if 4: is the day and :608 is the revision then it might be dynamically generated?

The first number so far does seem to be incrementing, but is it by day? It seems the 3: lasted both Saturday and Sunday.

Yeah, but on the 3rd they changed the number to 4 - maybe today they'll change it to 5.

This just happened, lol

oftrash commented 3 years ago

@DIGITALCRIMINAL, @hippothon we back at it again.

4 = 5
608fcf2c = 6091065f
yoCCrPVmrN27vlUEQLZcW3DZH97KRVoy = Sx7FEcC7r5uKuCIzVljwS8gnZGhNprM5

need a new checksum :(

DonaldTPP commented 3 years ago

You called it brother.

UltimaHoarder commented 3 years ago

Well at least we know it's automatically updated every 24 hours and not some guy on a computer updating it manually lmao.

idiotabroad commented 3 years ago

Yep, it went again

I think they are detecting use of this tool somehow

DonaldTPP commented 3 years ago

They aren't detecting that people are using it, they are probably aware of this page and have their own copy of the scraper and are testing it to see what breaks it, how it gets fixed, how quick it gets fixed etc etc.

MrCheeseLol commented 3 years ago

So is it borked for the time being? I've tried the updated login thingo (I'm real good with the technical lingo), it eventually lets me log in through the geckodriver browser but I then just get:

    Scraping Paid Content
    Scraping Subscriptions
    There's nothing to scrape.
    Archive Completed in 2.23 Minutes

DonaldTPP commented 3 years ago

Try setting your max_threads to 1 and not -1

MrCheeseLol commented 3 years ago

Already done and no dice. Patience is a virtue I guess, and I am just a grateful patient pleb.

dearcoding commented 3 years ago

I have an idea: what if we run selenium wire just to get the correct header?

I mean, we would use selenium to generate the sign; with selenium wire everything is easy to intercept.

I think this can be done, but what parameters does the sign depend on?

DonaldTPP commented 3 years ago

Sounds like a good idea but the problem is for the people who aren't tech savvy and can't get all these things going. Just a thought you know?

ofext commented 3 years ago

Forgive my ignorance, but how would selenium help in getting the header for the request? It is computed by the OnlyFans site itself and changes every day. How would we extract the computing function from their source code just by using selenium?

dearcoding commented 3 years ago

Well, the sign is generated for every request; for example, it is also generated for the login page.

It is based on the time (not a problem) and on other cached variables like auth-id and xbc, which we can edit before making the JavaScript calculate the right headers. The only problem could be if the url is also used in the calculation of the sign.

So we open our selenium session on OnlyFans, edit our cache, and refresh the page. Before the request is sent we grab the sign value, then we abort the request.

Now we have the sign to use in our own requests.

This way we would always have the up-to-date version of the sign, since all we are doing is running the latest JavaScript from the OnlyFans servers.

I will try to build it and will let you know where I get stuck, but this seems to be the only way to always have up-to-date headers without moving everything to selenium.

ofext commented 3 years ago

@dearcoding the url path is used in calculating the sha1 hash, and the hash is then used to compute the third parameter of the sign, so it’s not a value you can get once and then use it multiple times...

dearcoding commented 3 years ago

Yeah this means if we find the way to manipulate the url used in the calculation the job is done.

ofext commented 3 years ago

The url used is not the browser’s url, it’s a url handled internally to query the api depending on the content currently being displayed on the site and the new content being required by the user while scrolling down a page...

Not saying it can’t be done, just saying it would be kind of messy and error prone

Macmasteri commented 3 years ago

Got this error too. Sadly I was busy working when the script worked.

dearcoding commented 3 years ago

@dearcoding the url path is used in calculating the sha1 hash, and the hash is then used to compute the third parameter of the sign, so it’s not a value you can get once and then use it multiple times...

Yeah this means if we find the way to manipulate the url used in the calculation the job is done.

The url used is not the browser’s url, it’s a url handled internally to query the api depending on the content currently being displayed on the site and the new content being required by the user while scrolling down a page...

Not saying it can’t be done, just saying it would be kind of messy and error prone

I don't know, it's just a frontend; usually these kinds of controls are handled by the backend.

dukedward commented 3 years ago

Well at least we know it's automatically updated every 24 hours and not some guy on a computer updating it manually lmao.

so correct me if I'm wrong @DIGITALCRIMINAL, @hippothon & @trevdilley but this appears to be what our daily challenge will be in figuring out the sign:

sign example - 5:########################################:932:6091065f

5: <--- appears to increment daily
########################################: <--- created from combo of random daily static string/salt, epoch timestamp, api path, userId, all separated with "\n" and converted to sha1
932: <--- sha1 checksum
6091065f <--- hex conversion of vendor.js revision epoch timestamp
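
As a rough illustration of that four-field layout, the sketch below only splits a sign into its parts and decodes the two numeric fields. `SignParts` and `split_sign` are hypothetical names, the hash is a placeholder, and it assumes the third field is hexadecimal because the Python snippet in this thread formats the checksum with `{:x}`.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class SignParts(NamedTuple):
    version: int          # first field, appears to increment
    sha1_hex: str         # second field, 40-character hash (placeholder here)
    checksum: int         # third field, assumed to be hexadecimal
    timestamp: datetime   # fourth field, hex epoch seconds


def split_sign(sign: str) -> SignParts:
    """Split a colon-separated sign value into its four fields."""
    version, sha1_hex, checksum_hex, time_hex = sign.split(":")
    return SignParts(
        version=int(version),
        sha1_hex=sha1_hex,
        checksum=int(checksum_hex, 16),
        timestamp=datetime.fromtimestamp(int(time_hex, 16), tz=timezone.utc),
    )


parts = split_sign("5:" + "#" * 40 + ":932:6091065f")
print(parts.version, parts.checksum, parts.timestamp.isoformat())
# 5 2354 2021-05-04T08:31:27+00:00
```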

I wish I could be more help, but I'm not very good at JS and def don't know how to reverse engineer it. Just doing my part to help.

ofext commented 3 years ago

For the guy who constantly reduces the function, here is the new one, isolated. It computes the third parameter of the sign based on the SHA1 hash:

https://jsfiddle.net/85fyu9wc/

New "constants":

var str1 = "Sx7FEcC7r5uKuCIzVljwS8gnZGhNprM5";
var str2 = "6091065f";
var constNumber = 5;

hippothon commented 3 years ago

I have doubts that their change is automated.

Yesterday it stopped accepting the old signature around 17:00 UTC, with the js being updated around 10:22 UTC (from filename). Today it stopped accepting the old signature around 09:00 UTC, with the js being updated around 08:30 UTC.

The discrepancy in client-side and server-side changes and the difference in times on different days tells me some guy was tasked with watching this github and trying to stay ahead of it.

Anyway the changes are

    static_param = "Sx7FEcC7r5uKuCIzVljwS8gnZGhNprM5"
    checksum = sum([sha_1_b[15], sha_1_b[37], sha_1_b[6], sha_1_b[9], sha_1_b[13], sha_1_b[34], sha_1_b[17], sha_1_b[14], sha_1_b[1], sha_1_b[37], sha_1_b[14], sha_1_b[18], sha_1_b[24], sha_1_b[28], sha_1_b[1], sha_1_b[31], 
                    sha_1_b[13], sha_1_b[14], sha_1_b[15], sha_1_b[19], sha_1_b[9], sha_1_b[29], sha_1_b[30], sha_1_b[23], 
                    sha_1_b[16], sha_1_b[13], sha_1_b[28], sha_1_b[35],
                    sha_1_b[15], sha_1_b[23], sha_1_b[28], sha_1_b[39]])-112
    headers["sign"] = "5:{}:{:x}:6091065f".format(
        sha_1_sign, abs(checksum))

Or for anyone else using JS:

hash.charCodeAt(15) + 
hash.charCodeAt(37) + 
hash.charCodeAt(6) + 
hash.charCodeAt(9) + 
hash.charCodeAt(13) + 
hash.charCodeAt(34) + 
hash.charCodeAt(17) + 
hash.charCodeAt(14) + 
hash.charCodeAt(1) + 
hash.charCodeAt(37) + 
hash.charCodeAt(14) + 
hash.charCodeAt(18) + 
hash.charCodeAt(24) + 
hash.charCodeAt(28) + 
hash.charCodeAt(1) + 
hash.charCodeAt(31) + 
hash.charCodeAt(13) + 
hash.charCodeAt(14) + 
hash.charCodeAt(15) + 
hash.charCodeAt(19) + 
hash.charCodeAt(9) + 
hash.charCodeAt(29) + 
hash.charCodeAt(30) + 
hash.charCodeAt(23) + 
hash.charCodeAt(16) + 
hash.charCodeAt(13) + 
hash.charCodeAt(28) + 
hash.charCodeAt(35) + 
hash.charCodeAt(15) + 
hash.charCodeAt(23) + 
hash.charCodeAt(28) + 
hash.charCodeAt(39) - 112

UltimaHoarder commented 3 years ago

Ahh true, lmao. Since you're here, how is the -112 calculated?

ofext commented 3 years ago

I have doubts that their change is automated.

Yesterday it stopped accepting the old signature around 17:00 UTC, with the js being updated around 10:22 UTC (from filename). Today it stopped accepting the old signature around 09:00 UTC, with the js being updated around 08:30 UTC.

The discrepancy in client-side and server-side changes and the difference in times on different days tells me some guy was tasked with watching this github and trying to stay ahead of it.

If that's the case then we must change it thousands of times until they get tired of changing it. It's not healthy to constantly change a production site's frontend and backend for reasons unrelated to improving the user experience or providing new features. Unless it's automatically generated, they must stop at some point.

oftrash commented 3 years ago

yeah let's just wear this dude down :)

DonaldTPP commented 3 years ago

I love how everybody comes together to solve the problems.

MrCheeseLol commented 3 years ago

You guys are fucking Wizards.

hippothon commented 3 years ago

Ahh true, lmao. Since you're here, how is the -112 calculated?

The initial code after deobfuscation looks something like this:

hash.charCodeAt(15) - 142 + 
hash.charCodeAt(37) - 124 + 
hash.charCodeAt(6) - 147 + 
hash.charCodeAt(9) - 84 + 
hash.charCodeAt(13) + 83 + 
hash.charCodeAt(34) - 101 + 
hash.charCodeAt(17) - 76 + 
hash.charCodeAt(14) + 124 + 
hash.charCodeAt(1) + 107 + 
hash.charCodeAt(37) + 151 + 
hash.charCodeAt(14) - 147 + 
hash.charCodeAt(18) - 79 + 
hash.charCodeAt(24) + 90 + 
hash.charCodeAt(28) - 59 + 
hash.charCodeAt(1) + 121 + 
hash.charCodeAt(31) - 98 + 
hash.charCodeAt(13) + 119 + 
hash.charCodeAt(14) - 77 + 
hash.charCodeAt(15) - 84 + 
hash.charCodeAt(19) - 72 + 
hash.charCodeAt(9) + 139 + 
hash.charCodeAt(29) + 121 + 
hash.charCodeAt(30) - 79 + 
hash.charCodeAt(23) + 135 + 
hash.charCodeAt(16) - 83 + 
hash.charCodeAt(13) + 69 + 
hash.charCodeAt(28) - 83 + 
hash.charCodeAt(35) + 89 + 
hash.charCodeAt(15) - 98 + 
hash.charCodeAt(23) - 76 + 
hash.charCodeAt(28) + 148 + 
hash.charCodeAt(39) + 101

After I posted that the first time, someone pointed out it makes more sense to simplify the math so you just add all of those numbers together.
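
To see where the -112 comes from, the per-term constants from the deobfuscated listing above can simply be added up. A minimal sketch, with the values copied in order from that listing:

```python
# Constant added to or subtracted from each charCodeAt term, in listing order.
offsets = [
    -142, -124, -147, -84, 83, -101, -76, 124,
    107, 151, -147, -79, 90, -59, 121, -98,
    119, -77, -84, -72, 139, 121, -79, 135,
    -83, 69, -83, 89, -98, -76, 148, 101,
]

# One offset per charCodeAt term in the listing.
assert len(offsets) == 32

# Addition is associative, so all constants fold into a single term.
print(sum(offsets))  # -112
```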

DonaldTPP commented 3 years ago

Can verify it's all working again, except max_threads being anything other than "1"

benhacka commented 3 years ago

@hippothon

    static_param = "Sx7FEcC7r5uKuCIzVljwS8gnZGhNprM5"
    checksum = sum([sha_1_b[15], sha_1_b[37], sha_1_b[6], sha_1_b[9], sha_1_b[13], sha_1_b[34], sha_1_b[17], sha_1_b[14], sha_1_b[1], sha_1_b[37], sha_1_b[14], sha_1_b[18], sha_1_b[24], sha_1_b[28], sha_1_b[1], sha_1_b[31], 
                    sha_1_b[13], sha_1_b[14], sha_1_b[15], sha_1_b[19], sha_1_b[9], sha_1_b[29], sha_1_b[30], sha_1_b[23], 
                    sha_1_b[16], sha_1_b[13], sha_1_b[28], sha_1_b[35],
                    sha_1_b[15], sha_1_b[23], sha_1_b[28], sha_1_b[39]])-112
    headers["sign"] = "5:{}:{:x}:6091065f".format(
        sha_1_sign, abs(checksum))

Or for anyone else using JS:

hash.charCodeAt(15) + 
...
hash.charCodeAt(39) +
-112

Awesome. But where did you get the salt and the sha1 byte order? Is it possible to parse them from the JS path/text (with regular expressions) using static requests?

oftrash commented 3 years ago

@DIGITALCRIMINAL might as well keep this issue open. talk to you guys in a day....

dukedward commented 3 years ago

I was just thinking the same thing... see you guys again tomorrow

DonaldTPP commented 3 years ago

@DIGITALCRIMINAL might as well keep this issue open. talk to you guys in a day....

Take it easy, Pal.

8steelbeans commented 3 years ago

Can verify it's all working again, except max_threads being anything other than "1"

Hey there. Long time listener, first time caller. I'm confused. I'm still getting the same error as an hour ago.. it's working on your end?

DonaldTPP commented 3 years ago

Use the latest commit.

8steelbeans commented 3 years ago

Ahh... of course. Thank you

greatmate98 commented 3 years ago

Got it back working. But it's only showing 12 of the 100+ subscriptions I have. And after each scrape, it's hanging on the downloading messages part.

wstan1 commented 3 years ago

Can verify it's all working again, except max_threads being anything other than "1"

Hey there. Long time listener, first time caller. I'm confused. I'm still getting the same error as an hour ago.. it's working on your end?

Use the latest commit.

I am using the latest commit but still having the refresh page issue on my end

dilemmax commented 3 years ago

Managed to sign in but when I tried to scrape, I'm getting this

    Type: Profile
    0.00B [00:00, ?B/s]
    Type: Stories
    No Stories Found.
    Type: Posts
    Scrape Attempt: 1/100
    Missing 100 Posts... Retrying...
    Scrape Attempt: 2/100
    Missing 50 Posts... Retrying...
    Scrape Attempt: 3/100
    Missing 50 Posts... Retrying...
    Scrape Attempt: 4/100

dilemmax commented 3 years ago

Can verify it's all working again, except max_threads being anything other than "1"

Did this and I was getting the above so I changed it to 1 and it's working again.

salamihawk commented 3 years ago

I had this issue, updated to the latest version and it works again, however:

One onlyfans model had a post mixed with images and videos. The images downloaded fine, the videos didn't.

I tried re-running the script and now it appears to be hanging on that model's videos with "0it [00:00, ?it/s]"

I've seen something about max_threads being some kind of cure-all, but where does it get set? I have it set in the onlyfans section of the config.json dict, is that right?

dearcoding commented 3 years ago

I'm at a good point in using selenium to get the sign value.

It's actually a good, working approach as far as I have tested it.

The bot will become slower because I need 3-5 seconds to generate the sign for every request, but this way they can update their algorithm as much as they want and I will always have the right sign.

I won't run it completely in selenium because selenium is shit, memory heavy and hard to manage...

I use selenium only for the sign calculation.

dilemmax commented 3 years ago

I had this issue, updated to the latest version and it works again, however:

One onlyfans model had a post mixed with images and videos. The images downloaded fine, the videos didn't.

I tried re-running the script and now it appears to be hanging on that model's videos with "0it [00:00, ?it/s]"

I've seen something about max_threads being some kind of cure-all, but where does it get set? I have it set in the onlyfans section of the config.json dict, is that right?

@salamihawk You change it at Line 8 in config.json.

hippothon commented 3 years ago

Awesome. But where did you get the salt and the sha1 byte order? Is it possible to parse them from the JS path/text (with regular expressions) using static requests?

It's just all from removing their obfuscation. For example this highlighted bit: image

Translates to e.charCodeAt(2434 % e.length) - 101
Or since we know it's SHA1 that would be e.charCodeAt(2434 % 40) - 101
Which in the snippet I posted above is then e.charCodeAt(34) - 101
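
The index reduction in that example is plain modular arithmetic. A small sketch, where `reduce_index` is a made-up helper name and the hashed input is an arbitrary placeholder:

```python
import hashlib

# A SHA-1 digest written as hex is always 40 characters long.
digest = hashlib.sha1(b"placeholder input").hexdigest()
assert len(digest) == 40


def reduce_index(raw_index: int, length: int = 40) -> int:
    """Wrap an arbitrary index into the range of valid string positions."""
    return raw_index % length


print(reduce_index(2434))  # 34
```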

It's likely possible to do it statically, I have it 99% automated using regex but it runs in the browser so I can use their text replacement functions without having to copy them and mess with them myself. The risk with regex is they change something substantially and you have to start over again.

Realistically right now it takes me about 5 min to update my code after noticing it's broken. If it keeps changing every day I might be more motivated to fully automate it but that's up to them ¯\_(ツ)_/¯

mediaburnwayne commented 3 years ago

So are we just waiting on a new build?

benhacka commented 3 years ago

It's just all from removing their obfuscation. For example this highlighted bit: image

Translates to e.charCodeAt(2434 % e.length) - 101
Or since we know it's SHA1 that would be e.charCodeAt(2434 % 40) - 101
Which in the snippet I posted above is then e.charCodeAt(34) - 101

Pretty good. Thanks!

salamihawk commented 3 years ago

I had this issue, updated to the latest version and it works again, however: One onlyfans model had a post mixed with images and videos. The images downloaded fine, the videos didn't. I tried re-running the script and now it appears to be hanging on that model's videos with "0it [00:00, ?it/s]" I've seen something about max_threads being some kind of cure-all, but where does it get set? I have it set in the onlyfans section of the config.json dict, is that right?

@salamihawk You change it at Line 8 in config.json.

Gotcha, thanks... I was still working with an old file from an old version before the auth config got split off to .profiles

Still seems to hang at the same spot though

hippothon commented 3 years ago

So I think it's obvious that they're watching this page since they just pushed an update that tries to interfere with you using devtools. I'll continue to share changes but I'd advise against anyone sharing specific methods.

I don't have access to my code right now but will post the update for version 6 later.

dearcoding commented 3 years ago

I have a private method to compute the sign using selenium. I want to help the community, but I don't want to publish it here since it would get patched quickly.

If anyone is interested, please contact me by email (you can find it on my profile).

ofext commented 3 years ago

So I think it's obvious that they're watching this page since they just pushed an update that tries to interfere with you using devtools. I'll continue to share changes but I'd advise against anyone sharing specific methods.

I don't have access to my code right now but will post the update for version 6 later.

What??? Those bastards!