AlexCSDev / PatreonDownloader

Powerful tool for downloading content posted by creators on patreon.com. Supports content hosted on patreon itself as well as external sites (additional plugins might be required).
MIT License

Cookie, mux.com and datadome issues #125

Closed ReysukeBaka closed 2 years ago

ReysukeBaka commented 2 years ago

Hey, I've been getting an error recently. Any idea how to fix it?

2022-05-14 13:52:50.0560 DEBUG [PatreonDownloader.Implementation.PatreonPageCrawler] Page #4: https://www.patreon.com/api/posts?include=user%2Cattachments%2Ccampaign%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Caccess_rules.tier.null%2Cimages.null%2Caudio.null&fields%5Bpost%5D=change_visibility_at%2Ccomment_count%2Ccontent%2Ccurrent_user_can_delete%2Ccurrent_user_can_view%2Ccurrent_user_has_liked%2Cembed%2Cimage%2Cis_paid%2Clike_count%2Cmin_cents_pledged_to_view%2Cpost_file%2Cpost_metadata%2Cpublished_at%2Cpatron_count%2Cpatreon_url%2Cpost_type%2Cpledge_url%2Cthumbnail_url%2Cteaser_text%2Ctitle%2Cupgrade_url%2Curl%2Cwas_posted_by_campaign_owner&fields%5Buser%5D=image_url%2Cfull_name%2Curl&fields%5Bcampaign%5D=show_audio_post_download_links%2Cavatar_photo_url%2Cearnings_visibility%2Cis_nsfw%2Cis_monthly%2Cname%2Curl&fields%5Baccess_rule%5D=access_rule_type%2Camount_cents&fields%5Bmedia%5D=id%2Cimage_urls%2Cdownload_url%2Cmetadata%2Cfile_name&sort=-published_at&filter%5Bis_draft%5D=false&filter%5Bcontains_exclusive_posts%5D=true&json-api-use-default-includes=false&json-api-version=1.0&filter%5Bcampaign_id%5D=3133042&page%5Bcursor%5D=01SUSjbQm6uGXMGHMnHbaLxrQ_
2022-05-14 13:52:50.3300 FATAL [PatreonDownloader.App.Program] Fatal error, application will be closed: UniversalDownloaderPlatform.Common.Exceptions.DownloadException: Error status code returned: BadRequest
   at UniversalDownloaderPlatform.DefaultImplementations.WebDownloader.DownloadStringInternal(String url, Int32 retry, Int32 retryTooManyRequests) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.DefaultImplementations\WebDownloader.cs:line 323
   at UniversalDownloaderPlatform.DefaultImplementations.WebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.DefaultImplementations\WebDownloader.cs:line 288
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 55
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 73
   at PatreonDownloader.Implementation.PatreonPageCrawler.Crawl(ICrawlTargetInfo crawlTargetInfo, String downloadDirectory) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonPageCrawler.cs:line 84
   at UniversalDownloaderPlatform.Engine.UniversalDownloader.Download(String url, String downloadDirectory, IUniversalDownloaderPlatformSettings settings) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.Engine\UniversalDownloader.cs:line 198
   at PatreonDownloader.App.Program.RunPatreonDownloader(CommandLineOptions commandLineOptions) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 143
   at PatreonDownloader.App.Program.Main(String[] args) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 69

clocklear commented 2 years ago

Ok, here's where I am:

(I am now having an issue where I need to figure out how to make ffmpeg send the Referer header when I attempt to convert my m3u8 files to mp3, but that is beyond the scope of this project.)

Thanks for your willingness to work back and forth with me @AlexCSDev !

As another aside, for my purposes, I added a check to ignore any files that are already present in the download directory (near the end of PatreonCrawledUrlProcessor.cs):

// Skip if already exists?
if (File.Exists(crawledUrl.DownloadPath))
{
  _logger.Debug($"Skipping download because '{filename}' already exists in target directory");
  return false;
}

I figured this would reduce duplicative downloads and minimize the 'visibility' of my scraping from CloudFront's/Patreon's point of view.

AlexCSDev commented 2 years ago

No problem, I should be the one thanking you for helping me debug this issue. But we're far from finished with this yet. :)

I will make a note about the referer. As for the file existence check, it's not that simple. This check is already implemented in WebDownloader.cs, but it's a little more advanced: it also tries to compare the file sizes of the remote and local files. If the host does not return a file size (the most notable example is patreonusercontent.com), it will do whatever is specified in the --no-remote-size-action command line option. As for the visibility thing, I don't think this is something you should really care about right now; their detection rules don't seem to be that paranoid for now.
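A rough sketch of the kind of check described here, for readers who want to see the idea in code. This is not the code from WebDownloader.cs: the enum, the class, the method name, and the two fallback actions are all hypothetical, and the real values accepted by --no-remote-size-action may differ.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for whatever --no-remote-size-action selects.
public enum UnknownRemoteSizeAction { Skip, Redownload }

public static class ExistingFileCheck
{
    // Returns true when the file should be downloaded.
    // remoteSize is null when the host does not report a size.
    public static bool ShouldDownload(string localPath, long? remoteSize, UnknownRemoteSizeAction onUnknownSize)
    {
        // Nothing on disk yet: always download.
        if (!File.Exists(localPath))
            return true;

        // The host gave no size, so fall back to the configured action.
        if (remoteSize is null)
            return onUnknownSize == UnknownRemoteSizeAction.Redownload;

        // Sizes differ: treat the local copy as stale or incomplete.
        return new FileInfo(localPath).Length != remoteSize.Value;
    }
}
```

Compared with a plain File.Exists test, this also catches partially downloaded files, at the cost of needing the remote size before deciding.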

> (I am now having an issue where I need to figure out how to make ffmpeg send the Referer header when I attempt to convert my m3u8 files to mp3, but that is beyond the scope of this project.)

Been a while, but last time I had to deal with m3u8 I used youtube-dl:

youtube-dl.exe --format 0 https://xxxxxxx/xxxx.m3u8 --referer https://referer-target

I think there is a command line option to pass additional commands to ffmpeg so you can convert it to mp3 right away.

AlexCSDev commented 2 years ago

I think I've got HttpClient with custom cookie management running, but I think I have triggered the datadome protection and now I'm stuck in an endless captcha check. It will take some time before I can test it.

AlexCSDev commented 2 years ago

Please test if this test build fixes the issue: https://mega.nz/file/jtF3ETKL#FLSo-oTBnjQmGVZGtFkLhvTV1oRYB_gQrgg1stLT_HE

The source code for that build is available here for debugging purposes: https://github.com/AlexCSDev/PatreonDownloader/tree/125_fix (.NET 6 SDK/VS 2022 required)

TheQwerty commented 2 years ago

With that new build I'm still seeing the BadRequest exception when it attempts to get the second page of posts. (I haven't seen a single captcha check the few times I've run it.)

The request for page 1 includes 23 cookies. The request for page 2 includes only 22: the session_id goes missing.

No relevant Trace lines containing cookie or session id between the two requests.

EDIT: The only other curious difference between the two requests seems to be that the second one is taken verbatim from the .links.next json value, and therefore the [ and ] characters are still percent encoded as %5B and %5D.
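As a small illustration of the encoding point (not a diagnosis of the BadRequest), the sketch below decodes a query fragment like the one in the log. The class and method names are made up for the example; Uri.UnescapeDataString is the standard .NET call.

```csharp
using System;

public static class NextLinkEncoding
{
    // Decodes percent-escapes such as %5B and %5D back to [ and ].
    public static string DecodeQuery(string query)
    {
        return Uri.UnescapeDataString(query);
    }
}
```

For example, NextLinkEncoding.DecodeQuery("filter%5Bcampaign_id%5D=3133042") gives filter[campaign_id]=3133042, so both spellings name the same parameter once decoded.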

SubbyDew commented 2 years ago

I have tested the new test build with 3 creators: one works fine now, another only downloaded a few items, and the 3rd grabbed nothing.

The 2 that are not grabbing everything keep showing "WARN Current user cannot view this post" for every post. This doesn't make sense, as I can even open the local chromium browser and view all the content myself.

The super weird thing is that I just went back and tried the 0.10.3.0 build again and it worked perfectly. ¯_(ツ)_/¯

Spyridion commented 2 years ago

I tried the new test build and got the same retryTooManyRequests error after a couple of pages. I typed the failing page URL in my regular browser and got the json payload just fine though. This is the error I got:

2022-06-15 01:06:40.0243 FATAL [PatreonDownloader.App.Program] Fatal error, application will be closed: UniversalDownloaderPlatform.Common.Exceptions.DownloadException: Error status code returned: BadRequest
   at UniversalDownloaderPlatform.DefaultImplementations.WebDownloader.DownloadStringInternal(String url, Int32 retry, Int32 retryTooManyRequests) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.DefaultImplementations\WebDownloader.cs:line 318
   at UniversalDownloaderPlatform.DefaultImplementations.WebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.DefaultImplementations\WebDownloader.cs:line 277
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 63
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 81
   at PatreonDownloader.Implementation.PatreonPageCrawler.Crawl(ICrawlTargetInfo crawlTargetInfo, String downloadDirectory) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonPageCrawler.cs:line 84
   at UniversalDownloaderPlatform.Engine.UniversalDownloader.Download(String url, String downloadDirectory, IUniversalDownloaderPlatformSettings settings) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.Engine\UniversalDownloader.cs:line 198
   at PatreonDownloader.App.Program.RunPatreonDownloader(CommandLineOptions commandLineOptions) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 143
   at PatreonDownloader.App.Program.Main(String[] args) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 69

urbanyeti commented 2 years ago

Ah ha! I was able to figure this out and luckily there's a quick workaround for your https://github.com/AlexCSDev/PatreonDownloader/tree/125_fix build.

The default CookieContainer has a PerDomainCapacity of 20 cookies and it seems like it quickly hits this after the first page or two. Not sure exactly why this would be the case, but my guess is there's something going on where you are setting the cookies on line 73 of HttpCookieClient. It keeps getting back a datadome cookie but setting it with a different request url each time.

In any case, changing line 34 of HttpCookieClient to specify a larger PerDomainCapacity of 300 got me back up and running:

_cookieContainer = new CookieContainer(CookieContainer.DefaultCookieLimit, 300, CookieContainer.DefaultCookieLengthLimit);
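To see the limit in isolation, here is a minimal sketch that fills a CookieContainer with distinct cookies for a single host and reports how many survive. The class name, method name, and the example.com host are placeholders for illustration; only the CookieContainer API itself comes from .NET.

```csharp
using System;
using System.Net;

public static class CookieCapacityDemo
{
    // Adds `count` distinct cookies for one host and returns how many
    // the container still holds for that host afterwards.
    public static int StoreCookies(CookieContainer container, int count)
    {
        var uri = new Uri("https://example.com/"); // placeholder host
        for (int i = 0; i < count; i++)
        {
            container.Add(uri, new Cookie($"cookie{i}", "value"));
        }
        return container.GetCookies(uri).Count;
    }
}
```

Calling StoreCookies(new CookieContainer(), 24) should return fewer than 24, because the default per-domain limit (CookieContainer.DefaultPerDomainCookieLimit) is 20 and older cookies are evicted once it is reached. Passing a container built with the three-argument constructor above and a per-domain capacity of 300 should keep all 24.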

vincinuge commented 2 years ago

Nice work!

AlexCSDev commented 2 years ago

Yep, if you count all the tracking garbage there are 24 cookies set by Patreon. Judging by the logic in Microsoft's code, it doesn't care if you update an existing cookie; it will keep cleaning up the collection until it reaches the target size.

Great work, I will prepare a new test build tomorrow or the day after.

AlexCSDev commented 2 years ago

New test build, please try it: https://mega.nz/file/30El3RrI#EF37ESPPwIrZXYlhP_AFKBR0Yqjpl-bS66N90ICE1jI

Important: a clean install is highly recommended due to changes in how the application is packaged.

WindedHero commented 2 years ago

I found this thread due to an infinite verification loop (#128) and figured I'd try out this new test build myself on a clean install but it gives me this error: Option 'verbose' is unknown.

AlexCSDev commented 2 years ago

> I found this thread due to an infinite verification loop (#128) and figured I'd try out this new test build myself on a clean install but it gives me this error: Option 'verbose' is unknown.

--verbose is deprecated starting with this version. Use --log-level Trace and optionally --log-save to save the log to file.

clocklear commented 2 years ago

@AlexCSDev can you please add the Referer: https://www.patreon.com header to any external download calls? As stated previously, the embeds on my creator's stream are hosted on mux.com, which blocks downloads if Referer isn't set to a known entity.
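For reference, a minimal sketch of attaching that header to a single request with HttpClient's request types. The class and method names are hypothetical and this is not how PatreonDownloader implements it; note that .NET spells the property Referrer even though the header on the wire is Referer.

```csharp
using System;
using System.Net.Http;

public static class RefererRequests
{
    // Builds a GET request that carries a Referer header.
    // The default referer value is the one asked for in this thread.
    public static HttpRequestMessage CreateDownloadRequest(string url, string referer = "https://www.patreon.com")
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Referrer = new Uri(referer);
        return request;
    }
}
```

The resulting message can be passed to HttpClient.SendAsync. Setting the header per request rather than on the client's default headers keeps it limited to the calls that need it.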

AlexCSDev commented 2 years ago

My bad, I have completely forgotten about that. This build should include that functionality: https://mega.nz/file/K49EGQyZ#wOj8i4rxKq-OLZfrSKQWphdvhdcruOlbKWZp8nIeD8A

clocklear commented 2 years ago

Confirmed, new test build is working for me with no external modifications required 😎 .

Thanks @AlexCSDev ! And thanks to @urbanyeti for the great find on per-domain cookie setting!

vincinuge commented 2 years ago

All of you folks are awesome!

vincinuge commented 2 years ago

The latest test build is giving me the infinite verification bug again. It worked OK yesterday.

burnshroom commented 2 years ago

Same

aksskl commented 2 years ago

> The latest test build is giving me the infinite verification bug again. It worked OK yesterday.

Same here

AlexCSDev commented 2 years ago

If you are having issues with net6.0-win-x64-release-cookiefix2706_2.zip, try using net6.0-win-x64-release-cookiefix2706.zip (the one without the referer fix) and let me know if it works. I wonder if sending the referer in all requests is not a good idea.

aksskl commented 2 years ago

Both 2706 and 2706_2 are the same for me; infinite slider verification. What's odd is that it doesn't even ask for my credentials first. Straight to verification. Running a clean install so I assumed credentials would be first, but no.

clocklear commented 2 years ago

I don't think the infinite verification 'bug' is a bug in this software, so much as it is a 'feature' in the CloudFlare protection used by Patreon. When I encountered this, waiting 24-48 hours (anecdotally) solved the problem.

burnshroom commented 2 years ago

tried it again today and it worked

aksskl commented 2 years ago

0.10.3.0 is working for me today. Test builds as well.

AlexCSDev commented 2 years ago

I think it's safe to assume that this is something that cannot be fixed or bypassed in the PatreonDownloader, so the test build will be released as a new version a bit later.

Waiting a day or two, or using a lesser-known VPN (or a self-hosted VPN on a small hosting provider), should fix the infinite verification. The new version will also support proxy servers, so you should be able to use residential proxy servers to bypass this issue for sure.
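As a general illustration of routing HttpClient traffic through a proxy in .NET, here is a short sketch. How the new version will actually expose proxy support is not stated in this thread, so the class name, method name, and proxy address below are assumptions for the example only.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class ProxyClientFactory
{
    // Creates a handler that sends all requests through the given proxy,
    // e.g. "http://127.0.0.1:8080" (placeholder address).
    public static HttpClientHandler CreateHandler(string proxyAddress)
    {
        return new HttpClientHandler
        {
            Proxy = new WebProxy(proxyAddress),
            UseProxy = true
        };
    }
}
```

An HttpClient built with new HttpClient(ProxyClientFactory.CreateHandler(address)) would then send its requests via that proxy.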

If anyone ends up figuring out what triggers this protection, feel free to share the information with us.

AlexCSDev commented 2 years ago

Actually you know what. I might have fixed that infinite verification thing.

Try this test build: https://mega.nz/file/TptzBS5Z#NLBh9qrJLmSRlRF__oGYcANWIynsXbSxUB83nc5GcQ4

El-tra commented 2 years ago

> Actually you know what. I might have fixed that infinite verification thing.
>
> Try this test build: https://mega.nz/file/TptzBS5Z#NLBh9qrJLmSRlRF__oGYcANWIynsXbSxUB83nc5GcQ4

Works for me!

aksskl commented 2 years ago

> Actually you know what. I might have fixed that infinite verification thing.
>
> Try this test build: https://mega.nz/file/TptzBS5Z#NLBh9qrJLmSRlRF__oGYcANWIynsXbSxUB83nc5GcQ4

Other builds started acting up. The new build works.

jimbohne commented 2 years ago

> Actually you know what. I might have fixed that infinite verification thing.
>
> Try this test build: https://mega.nz/file/TptzBS5Z#NLBh9qrJLmSRlRF__oGYcANWIynsXbSxUB83nc5GcQ4

I've had the infinite verification problem with the current release. Came here to report the bug, tried your test build - works like a charm! Thanks a lot!