Closed markdid closed 10 months ago
I am getting the same error as well
Haven't looked this way in months alas, been too busy. But decided it's Degoo night today, so taking a look at things. Just tried this myself and got the same response. Bummer. Will take a closer look.
Well, I have banged my head against this for a while now and it is, alas, too hard to solve today. We will need some deeper insight into Cloudflare and/or a really good low-level request debugger. I have experimented with Fiddler to little avail in the past. But to be clear, and to document the issue more technically and in a way that might permit any erudite reader to comment or pursue a diagnosis, here is what the problem is.
Historically this reverse engineering, and indeed most like it of undocumented web APIs, is based upon watching the Network tab in a browser and reproducing the requests seen therein. There are two main browser families of interest, Firefox and Chrome.
Each of these browsers has, in its developer tools, a Network tab on which all requests and responses can be read. Each request can be copied in a number of formats, but a useful one for quick testing is the curl format. When logging in with Firefox (successfully) the request the browser sends is:
curl -i 'https://rest-api.degoo.com/login' -X POST -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0' -H 'Accept: */*' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'Content-Type: application/json' -H 'Referer: https://app.degoo.com/' -H 'Origin: https://app.degoo.com' -H 'Sec-Fetch-Dest: empty' -H 'Sec-Fetch-Mode: cors' -H 'Sec-Fetch-Site: same-site' -H 'DNT: 1' -H 'Sec-GPC: 1' -H 'Connection: keep-alive' --data-raw '{"Username":"myemail_redacted","Password":"mypassword_redacted","GenerateToken":true}'
and similarly Chrome on a successful login reports this request was sent:
curl -i 'https://rest-api.degoo.com/login' \
-H 'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100"' \
-H 'Referer: https://app.degoo.com/' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36' \
-H 'sec-ch-ua-platform: "Linux"' \
-H 'Content-Type: application/json' \
--data-raw '{"Username":"myemail_redacted","Password":"mypassword_redacted","GenerateToken":true}' \
--compressed
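For anyone wanting to carry such a copied command over into Python, here is a minimal sketch of splitting a "Copy as cURL" string into its parts. The function name `parse_curl` is made up for illustration, and it only understands the handful of options that appear in the commands above (`-H`, `-X`, `--data-raw` and friends); it is not a general curl parser.

```python
import shlex


def parse_curl(command: str) -> dict:
    """Split a browser 'Copy as cURL' command into method, URL, headers and body."""
    # Join backslash-continued lines, then tokenise using shell quoting rules.
    tokens = shlex.split(command.replace("\\\n", " "))
    if tokens and tokens[0] == "curl":
        tokens = tokens[1:]

    request = {"method": None, "url": None, "headers": [], "body": None}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-H", "--header"):
            name, _, value = tokens[i + 1].partition(":")
            # Keep headers as an ordered list of pairs, since order may matter.
            request["headers"].append((name.strip(), value.strip()))
            i += 2
        elif tok in ("-X", "--request"):
            request["method"] = tokens[i + 1]
            i += 2
        elif tok in ("-d", "--data", "--data-raw", "--data-binary"):
            request["body"] = tokens[i + 1]
            i += 2
        elif tok.startswith("-"):
            i += 1  # value-less flags we ignore here, e.g. -i or --compressed
        else:
            request["url"] = tok
            i += 1

    # curl defaults to POST when a body is given, otherwise GET.
    if request["method"] is None:
        request["method"] = "POST" if request["body"] is not None else "GET"
    return request
```

The returned dictionary keeps the headers in the order the browser listed them, which makes it easier to compare against whatever a script ends up sending.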
It is by replicating these requests that this API was reverse engineered in the first place. Now there is a longstanding problem that we know of and that has oft been reported by users: either of these requests, issued at the command line, produces:
HTTP/2 429
date: Wed, 30 Mar 2022 11:11:28 GMT
content-length: 0
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
vary: Accept-Encoding
server: cloudflare
cf-ray: 6f40679b3a785ac8-MEL
429 is a Too many requests error: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429
The login is rejected, and from this we can divine that Degoo is hiding behind a Cloudflare firewall and that it is Cloudflare that is rejecting us.
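A small sketch of that reading of the response, for scripts that want to tell this particular rejection apart from other failures. The helper names are hypothetical, and the check (status 429 plus a `server: cloudflare` header) is only the heuristic used in the reasoning above, not anything Cloudflare documents as a signature.

```python
def parse_response_head(raw: str):
    """Return (status_code, headers) from a status line followed by header lines."""
    lines = [line for line in raw.strip().splitlines() if line.strip()]
    status_code = int(lines[0].split()[1])  # "HTTP/2 429" -> 429
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


def looks_like_cloudflare_rate_limit(raw: str) -> bool:
    """True when the response is a 429 that was answered by Cloudflare itself."""
    status_code, headers = parse_response_head(raw)
    return status_code == 429 and headers.get("server", "").lower() == "cloudflare"
```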
The puzzle to solve is why? It lets Firefox log in and Chrome, but not command line curl.
As Cloudflare can only see what is transmitted on the wire (the request sent) we must conclude that these requests captured by Firefox and Chrome are not true (and this warrants a bug report to each of these browsers, to pursue at leisure). Because if Firefox sends the request Cloudflare processes it; if curl sends it, they reject it with a 429. If curl could reproduce exactly what Firefox sends then Cloudflare could not tell the difference and would have to log us in. Because it can tell the difference, something in the request report delivered by these browsers is broken.
Diagnosing what is difficult. One way is to watch the transaction in Wireshark. But no, that does not work with HTTPS; it is, after all, secure. There are apparently methods: https://unit42.paloaltonetworks.com/wireshark-tutorial-decrypting-https-traffic/ but they will take time to study and research and I am not hopeful that they will yield fruit.
Then there is Fiddler. It also has problems with HTTPS of course, but is more hopeful, and I started working with it some while ago, but it is time consuming and I have faltered.
And then it struck me: it would be powerfully useful if we had a request mirror. A service that we could send the request to, just by replacing the URL of the POST request with that of the mirror, and which reveals the exact request, byte for byte, that was received. We could redirect the Firefox login request to it to diagnose, and then the curl one, and see where they differ. They must differ.
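A rough local version of such a mirror can be sketched with nothing but Python's socket module. This is an assumption-laden toy: the names `read_http_request` and `start_mirror` are invented here, it handles a single plain-HTTP/1.1 request, and because it does not speak TLS it only shows the HTTP-level bytes, not anything negotiated below that.

```python
import socket
import threading


def read_http_request(conn: socket.socket) -> bytes:
    """Read one HTTP/1.1 request (head plus Content-Length body) as raw bytes."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(4096)
        if not chunk:
            return data
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    while len(body) < length:
        chunk = conn.recv(4096)
        if not chunk:
            break
        body += chunk
    return head + b"\r\n\r\n" + body


def start_mirror(host: str = "127.0.0.1", port: int = 0):
    """Listen for one request; return (port, thread, captured list of raw requests)."""
    server = socket.create_server((host, port))
    bound_port = server.getsockname()[1]
    captured = []

    def serve():
        with server:
            conn, _addr = server.accept()
            with conn:
                captured.append(read_http_request(conn))
                # Answer with an empty 200 so the client does not hang.
                conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return bound_port, thread, captured
```

Pointing first the browser and then the script at `http://127.0.0.1:<port>/login` and printing `captured[0]` from each run gives two byte strings that can be compared directly.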
I requested information about such services here:
but someone (for whom I have little respect as a consequence) downvoted it and the question was hidden. If it's possible to go upvote it, please do, and we can try to tap into the Stack Overflow community for wisdom in this space.
I have had more luck here:
https://github.com/xnbox/DeepfakeHTTP/issues/2
and DeepfakeHTTP is something to try, but again it will take time to install and use to perform these tests. This is perhaps the hottest lead so far though.
Another avenue for possible support is Cloudflare themselves. Searching online for "cloudflare 429 errors" reveals a lot to read and lots of people facing these messages in many contexts. Sifting information from this haystack is again a time consuming job.
Getting help from degoo themselves is also an option but they do not generally respond to support requests and have not been supportive of these efforts or forthcoming with any API specifications or even hints.
At which point I invite anyone eager to help this project along to select one of these research paths and report back. I too will do that but as you will have observed I have basically the standard FOSS problem: a day job, children to feed, dozens of projects and commitments and a lot more going on to boot, so it can be rather difficult to get my attention on any one project with months between.
This problem can easily be worked around: https://github.com/MDKPredator/degoo_drive#login-bypass
By manually inserting the token I managed to log in, but I only succeeded with "degoo_user"; with other commands I get back "400", which is probably due to my assembled "keys.json" file and the others that are missing.
After some testing and copying of "default_properties.txt" into the config folder I was able to use this tool and avoid the 403 error.
I would like to have a function that creates these files automatically so that you only have to copy the token.
When uploading, no download link was generated, but I was able to fix this with: https://github.com/bernd-wechner/Degoo/pull/41/commits/3930b9331be7195bc4d86017e9f69828b6591dea
Have you tried Firefox's "copy as cURL" right-click option in the Network tab?
As in: Right click on a request, hover over "Copy", then click "Copy as cURL".
Precisely what I do, and my recommendation.
Ah, sorry. Didn't see that anywhere.
To see exactly what a browser is sending, why would you not just create a simple socket server and print the request? Super simple. Something like: https://gist.github.com/pedrominicz/b699dec01b4afd38b8aba83f3089e175
Sure thing, thanks for the tip and link. That said, confidence regarding what the browser sends and receives is not the issue at hand as browsers provide kick ass excellent and trustworthy dev tools built in for that (though confirming it with an echo server is still a good double check), but knowing exactly what the Python libs send is the trick (and there, a good reliable echo server would help indeed).
As the task, in reverse engineering anything on the web is to:
A strong premise stands that all the server sees is what is sent in the request. And so if the request can be copied exactly then the server must respond in exactly the same way. Or, conversely, if the server is not responding in the same way we must assume we are not sending an identical request! But given that an identical request is the intent, and it is being defined to be so, the assumption becomes that the Python libs, between assembling the request and its being sent, alter it in some way.
And this is what remains to be diagnosed. But hey, if it's so simple and you have time, chip in and set up an echo server, and document what the browser sends for a login request and what this script sends and the differences. It will help us all along.
No sorry. I am building my own FUSE filesystem for Degoo in GoLang instead. I will publish it for free when it's done.
Wish you'd show us how to do that in Python.
Well, one hot tip is to look at TLS. As an example, you are sending a user-agent as "Chrome version X". Cloudflare could easily have a list of TLS ciphers supported by that version. If the TLS ciphers offered in the SSL handshake differ from that list, you are obviously a "fake" Chrome client :)
"During the initial handshake sequence between the client and the server, the client presents its cipher list to the server, and the server selects one of the ciphers. Or, if the server does not support any of those ciphers, the server rejects the client request."
https://cabulous.medium.com/tls-1-2-andtls-1-3-handshake-walkthrough-4cfd0a798164
Here is even a proof of concept in Python for you: https://github.com/salesforce/ja3
"The JA3 algorithm takes a collection of settings from the SSL "Client Hello" such as SSL/TLS version, accepted cipher suites, list of extensions, accepted elliptic curves, and elliptic curve formats."
Your current Python HTTP requests are easily identified and Degoo has obviously put a rate limit of 0 requests on requests made with a Python SSL fingerprint for the "/login" URI.
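To get a feel for what a Python client advertises, the standard `ssl` module can list the cipher suites its default context has enabled. This is only a sketch (the helper name `offered_cipher_names` is invented), and the list depends on the local Python and OpenSSL build; the real Client Hello also carries extensions and curves that this does not show.

```python
import ssl


def offered_cipher_names(context=None):
    """Names of the cipher suites enabled on an SSLContext (default context if none given)."""
    context = context or ssl.create_default_context()
    return [cipher["name"] for cipher in context.get_ciphers()]


if __name__ == "__main__":
    # Print one suite per line for comparison with a browser's list.
    for name in offered_cipher_names():
        print(name)
```

Comparing this output with the suites a browser offers (visible in a packet capture of its Client Hello) would show how far apart the two lists are.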
Will read that now closely as time permits, but thanks for the tip. Have you solved this login problem perchance in Go?
Had a quick look. It seems indeed that one state of the art client differentiation rests on TLS handshake analysis. Ouch.
The JA3 algorithm, for example, examines these fields in the Client Hello message:
SSLVersion,Cipher,SSLExtension,EllipticCurve,EllipticCurvePointFormat
Egads. So the Python request would ideally need to mimic Firefox's not just at the HTTP level but at the TLS handshake level too (duplicating the Client Hello). Leaves me wondering if that's easy to do with the Python libs. Something to drill into.
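As I understand the JA3 description, the fingerprint is just those five fields written as decimal numbers, joined with "-" inside a field and "," between fields, and then MD5-hashed, with GREASE placeholder values left out. A minimal sketch of that, with invented function names and made-up sample values in the usage note (not a real browser's handshake):

```python
import hashlib

# GREASE values (0x0A0A, 0x1A1A, ... 0xFAFA) are random placeholders that
# clients insert into the Client Hello; JA3 leaves them out of the fingerprint.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build SSLVersion,Cipher,SSLExtension,EllipticCurve,EllipticCurvePointFormat."""

    def field(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join(
        [str(version), field(ciphers), field(extensions), field(curves), field(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 hex digest of the JA3 string, which is what gets compared between clients."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

For example, `ja3_string(771, [4865, 4866], [0, 10], [29, 23], [0])` gives `"771,4865-4866,0-10,29-23,0"`. Two clients that differ in any one of those lists, or even in the order of entries, end up with different hashes.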
I had a closer look in recent days using the HTTP Header Live addon:
https://github.com/Nitrama/HTTP-Header-Live/
and an oddity I note too is that the Firefox request orders the headers in a way that the Python one does not:
Host: rest-api.degoo.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://app.degoo.com/
Content-Type: application/json
Origin: https://app.degoo.com
Content-Length: 82
Connection: keep-alive
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site
Notably, the Content-Length: header is always at the end just preceding the data in the Python requests, raising the spectre that header ordering may also be analysed to differentiate the clients.
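A quick way to check that suspicion is to compare the relative order of the header names in two captures. The sketch below uses invented helper names and takes plain blocks of `Name: value` lines such as the Firefox listing above; the second capture would have to come from whatever the Python script actually sends.

```python
def header_names(header_block: str) -> list:
    """Lower-cased header names in the order they appear in a block of header lines."""
    names = []
    for line in header_block.strip().splitlines():
        name, sep, _value = line.partition(":")
        if sep:
            names.append(name.strip().lower())
    return names


def compare_header_order(reference: str, candidate: str) -> dict:
    """Compare which headers two requests carry and whether shared ones are in the same order."""
    ref, cand = header_names(reference), header_names(candidate)
    shared = set(ref) & set(cand)
    ref_shared = [n for n in ref if n in shared]
    cand_shared = [n for n in cand if n in shared]
    return {
        "same_order": ref_shared == cand_shared,
        "only_in_reference": [n for n in ref if n not in shared],
        "only_in_candidate": [n for n in cand if n not in shared],
        "reference_order": ref_shared,
        "candidate_order": cand_shared,
    }
```

If `same_order` comes back false with identical header sets, ordering alone would be enough for a server to tell the two clients apart.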
Yes. Not very hard to do though, as there are several Python libraries to "mimic" a known fingerprint. All you have to do is to use a valid user-agent + fingerprint that matches. The lib will make sure to construct the TLS handshake so that the result becomes the wanted fingerprint.
Cool, what are the Python libs (packages)? Can you name or link to some? Would save us a bucket of time (over searching for candidates unless lucky with the first few keyword choices).
I'd love to try one.
Dude, 5 seconds Googling :) I don't use Python, so I wouldn't know. Not sure if this one is complete. I use Go myself. https://geeksrepos.com/an0ndev/requests-ja3
The most annoying thing is that Degoo are only doing this check on /login (which is a REST API) and not on the regular GraphQL API. Also, this trips when using some proxies, so very strange move by Degoo.
Nothing is 5 seconds googling my friend... let alone finding esoteric python packages for mimicking tls handshakes. I was only asking because you said there were several and so may know and be able to name them. I'll take a look when I'm off the phone and back on a PC. And see how much Googling it takes 😉
My bad. I don't do Python. I assumed, as there are many for Go :)
No worries. I'm grateful for your insights and tips!
I have had a spare moment on a laptop just now and found these which are interesting:
https://github.com/lwthiker/curl-impersonate
Which means I can try to reproduce the login with curl to see if the theory of handshake fingerprinting to differentiate clients is the actual cause for the rejection of our login efforts. That curl impersonates browser TLS handshakes, so if I can log in via curl when I get a chance to try it, then we have proof of this theory!
https://github.com/tlsfuzzer/tlslite-ng
Which promises to provide nuanced TLS handshake control in Python. Just based on a cursory reading so far.
More updates as I check these threads, and thanks heaps for the lead.
You don't need to confirm it. I already did. It's easy with docker. You get a 302 instead of a 429 response.
docker pull lwthiker/curl-impersonate:0.5-chrome
docker run --rm lwthiker/curl-impersonate:0.5-chrome curl_chrome101 'https://degoo.com/me/login' \
-H 'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100"' \
-H 'Referer: https://app.degoo.com/' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36' \
-H 'sec-ch-ua-platform: "Linux"' \
-H 'Content-Type: application/json' \
--data-raw '{"Username":"myemail_redacted","Password":"mypassword_redacted","GenerateToken":true}' \
--compressed
Cool. Thanks for that. Why docker out of interest? What value does it add there? I'd have just run curl-impersonate myself.
The 302 is the redirect to the page after logging in I imagine. Let's see if I can find a python package that does what curl-impersonate does. tlslite-ng sort of suggests it's doable building on the sockets package.
Because the docker one does not require any installation of extra libs etc. I don't like to bloat my system with random hacks :) If you use libcurl with Python, you can just switch it out to the patched one and it should "just work". But forcing people to run a bundled libcurl is probably not a good idea.
And yes, the 302 is just the redirection to /app, so it means success.
There may be a way to do it without libcurl:
https://www.scrapingbee.com/blog/web-scraping-without-getting-blocked/#tls-fingerprinting
which directs to:
Which will take some reading, thinking and tinkering. It has all been more than 5 seconds of googling by a long way, but I'm still looking. Turns out we are far from alone: there is a busy world of people trying to mimic browser TLS fingerprints for programmatic access to websites ... it's a classic arms race situation.
Looks to me like it's cloudflare not degoo who add that layer of security too BTW. Not convinced Degoo, as a business care. But cloudflare are a much bigger player with many many customers who have a strong interest in security ...
Of course it's Cloudflare that blocks, but it's Degoo that has enabled the "Browser Integrity Check" setting in Cloudflare. This comes with an extra fee as it requires more compute power for each request, so Degoo definitely knows what they are doing and is fully responsible for this. Remember, if they enable this for the GraphQL API as well, your entire application will stop working, not just the login.
It would have taken you 5 seconds for Go :) It's just a lib and 1 line of extra code. Python just sucks, like always :P
Python doesn't suck, it's lovely. But thanks for your tips and I hope the reverse engineering I did here helps you with your project. I look forward to seeing the results and getting a FUSE filesystem for Degoo. I'll see if I can get the TLS handshake spoofed. This is a nice read too:
https://scrapfly.io/blog/how-to-avoid-web-scraping-blocking-tls/
And suggests two Python methods for fixing it. Without, alas, providing a clear example of how simply to spoof, say, Firefox or Chrome like curl-impersonate does.
Just an update that basic file listing works in my FUSE FS now, so I can confirm that JA3 fingerprinting solves the login problem. I think you will have a hard time making it work in Python though, as it does not have the same access to the network stack as GoLang has. Hopefully it's just enough to change the ciphers to fool Cloudflare.
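For what it's worth, changing the cipher list is something the Python standard library does allow. Here is a minimal, untested-against-Degoo sketch using `ssl` and `urllib.request`; the helper name and the cipher string in the usage note are purely illustrative and are not a browser's real list. It touches only the ciphers for TLS 1.2 and below (Python's `set_ciphers` does not change the TLS 1.3 suites), so extensions and curves stay as they are and the resulting fingerprint may still not match any browser.

```python
import ssl
import urllib.request


def build_opener_with_ciphers(cipher_string: str):
    """Return (opener, context) where HTTPS requests use the given OpenSSL cipher string."""
    context = ssl.create_default_context()
    # Restricts/reorders the cipher suites offered for TLS 1.2 and below;
    # TLS 1.3 suites are not affected by set_ciphers().
    context.set_ciphers(cipher_string)
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler), context
```

Usage would be along the lines of `opener, ctx = build_opener_with_ciphers("ECDHE+AESGCM")` followed by `opener.open(request)`. Whether that alone is enough to get past the check is exactly the open question.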
Hi all, what a big thread here.... Unfortunately I see that the login still doesn't work. Any update on that front?
Alas no, as noted we now know why, it's in the TLS handshake and we've not had time to find a Python solution for mimicking a browser's TLS handshake yet. One will emerge in time I am sure as this is a relatively young security protocol that is getting in lots of people's way alas.
As noted earlier there is an outline of how to fix it here:
https://scrapfly.io/blog/how-to-avoid-web-scraping-blocking-tls/#python
But alas there is no clear guidance on how to replicate the JA3 fingerprint of common browsers. So it needs more time than I've been able to give it to evolve that. But it is eminently doable, it seems.
OK, had a moment this morning as nasty rain prevented me doing other things I had planned, so took a look. It's not looking good. The problem is stated simply as follows:
Hi @bernd-wechner thanks for the follow-up and clarification. It is a pity that we cannot use the CLI for the time being. This is shrinking Degoo's usability from my perspective.
Don't worry. My degoo fuse filesystem is pretty much done. I just want to do some more testing before releasing it. Maybe even this weekend.
You're a champ, buzzy. Thanks so much! Is it up on a github repo?
As I previously posted, there is a workaround using the browser to get the login tokens: https://github.com/bernd-wechner/Degoo/issues/42#issuecomment-1100919703
But tokens keep expiring after a while, so it's just a temporary "hack".
Could the login problem be fixed by mimicking a full browser with selenium like in this project ? https://github.com/tzchz/degoo-active
This should now be fixed by:
https://github.com/bernd-wechner/Degoo/commit/bc92782790771faacd506e4d110622633bc120b6
though the whole package remains a WIP (work in progress) and under testing. Not least as the degoo API changed in the interim and I've also made some updates here, but am not done with testing and it's one of far too many pies I have my fingers in and tabs I have open ;-).
I tried installing everything but I get this error