Closed: ReysukeBaka closed this issue 2 years ago.
Yeah, I've been getting the same thing, though only on certain artists. Not sure if the fact that the artist has polls affects it in any way, as the only artists that have worked don't have polls on the page.
Yeah, it works on some, but I even have an artist without polls where it just won't work. Also, a bunch of files are failing to get an ID.
I've also noticed this happening as well. I think I'm seeing it for all artists so far.
Please try installing Cloudflare WARP. Make sure "1.1.1.1 with WARP" mode is enabled.
Hmm, that still doesn't seem to solve the problem.
For the artists, it's finding all the posts, receiving errors when encountering polls, and then giving me a fatal error.
```
2022-05-22 11:53:17.8149 FATAL Fatal error, application will be closed: UniversalDownloaderPlatform.Common.Exceptions.DownloadException: Error status code returned: BadRequest
```
I've made a test build which will write error details into the "debug" folder. Please send the contents of this folder to alexcsdev@protonmail.com or post them here.
https://mega.nz/file/TgkHwKCR#ZXN4Qw30tjIHqaCa0PxxnQeSVdAGDb7jC3XPUqDQedY
Curious error. Run the app with the `--verbose` option and upload the latest log file from the logs folder to https://pastebin.com/. It will contain the creator name, so if you don't want to post it publicly you can send the file to my email (alexcsdev@protonmail.com) instead.
Log file has been sent to your email.
Ok, the `ERROR [PatreonDownloader.Implementation.PatreonPageCrawler] Verification for XXXXXX: Unknown type for "included": poll` message can be ignored; it is not related to the issue you guys are having.
I honestly don't know what is going on here. Every single user who sent me their logs has the same issue with the page cursor being invalid or expired, but I haven't experienced this issue myself even on the same creator as vincinuge used.
Maybe this is some kind of internet provider issue? Can you guys share which ISP you are using?
Verizon
What ISP are you using?
My current place of living makes my ISP information useless for 99% of the people who are using this app.
Telekom in Germany
Same problem here with Telekom and different DNS services (even 1.1.1.1).
Aussie Broadband in Australia, also using 1.1.1.1
Hi, I tried using your test build, but I hit a different sort of error. I noticed that before this, the program would cycle between killing the Chrome processes and saying it was opening a browser for the captcha, but nothing would actually open.
```
---> System.ComponentModel.Win32Exception (299): Only part of a ReadProcessMemory or WriteProcessMemory request was completed.
   at System.Diagnostics.NtProcessManager.EnumProcessModulesUntilSuccess(SafeProcessHandle processHandle, IntPtr[] modules, Int32 size, Int32& needed)
   at System.Diagnostics.NtProcessManager.GetModules(Int32 processId, Boolean firstModuleOnly)
   at System.Diagnostics.NtProcessManager.GetFirstModule(Int32 processId)
   at System.Diagnostics.Process.get_MainModule()
   at PatreonDownloader.PuppeteerEngine.PuppeteerEngine.<>c.<KillChromeIfRunning>b__10_0(Process x) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.PuppeteerEngine\PuppeteerEngine.cs:line 75
   at System.Linq.Enumerable.WhereArrayIterator`1.ToArray()
   at PatreonDownloader.PuppeteerEngine.PuppeteerEngine.KillChromeIfRunning() in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.PuppeteerEngine\PuppeteerEngine.cs:line 74
   at PatreonDownloader.PuppeteerEngine.PuppeteerEngine.Initialize(Uri remoteBrowserAddress, Boolean headless) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.PuppeteerEngine\PuppeteerEngine.cs:line 63
   at PatreonDownloader.PuppeteerEngine.PuppeteerEngine..ctor(Boolean headless) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.PuppeteerEngine\PuppeteerEngine.cs:line 48
   at PatreonDownloader.PuppeteerEngine.PuppeteerCaptchaSolver..ctor() in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.PuppeteerEngine\PuppeteerCaptchaSolver.cs:line 24
   at PatreonDownloader.Implementation.PatreonWebDownloader.SolveCaptchaAndUpdateCookies(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 82
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 63
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 66
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 66
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 66
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 66
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 66
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 66
   at PatreonDownloader.Implementation.PatreonCrawlTargetInfoRetriever.GetCampaignId(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonCrawlTargetInfoRetriever.cs:line 36
   --- End of inner exception stack trace ---
   at PatreonDownloader.Implementation.PatreonCrawlTargetInfoRetriever.GetCampaignId(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonCrawlTargetInfoRetriever.cs:line 49
   at PatreonDownloader.Implementation.PatreonCrawlTargetInfoRetriever.RetrieveCrawlTargetInfo(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonCrawlTargetInfoRetriever.cs:line 24
   at UniversalDownloaderPlatform.Engine.UniversalDownloader.Download(String url, String downloadDirectory, IUniversalDownloaderPlatformSettings settings) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.Engine\UniversalDownloader.cs:line 176
   at PatreonDownloader.App.Program.RunPatreonDownloader(CommandLineOptions commandLineOptions) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 143
   at PatreonDownloader.App.Program.Main(String[] args) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 69
```
Please try version 0.10.3.0. I have improved browser mimicking in that version.
@Spyridion "ReadProcessMemory or WriteProcessMemory" issue is being tracked here https://github.com/AlexCSDev/PatreonDownloader/issues/123
Just tried with 0.10.3.0 and got this FATAL error partway through the crawl:
```
2022-06-01 20:52:31.8821 FATAL Fatal error, application will be closed: UniversalDownloaderPlatform.Common.Exceptions.DownloadException: Error status code returned: BadRequest
   at UniversalDownloaderPlatform.DefaultImplementations.WebDownloader.DownloadStringInternal(String url, Int32 retry, Int32 retryTooManyRequests) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.DefaultImplementations\WebDownloader.cs:line 333
   at UniversalDownloaderPlatform.DefaultImplementations.WebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.DefaultImplementations\WebDownloader.cs:line 292
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 55
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 73
   at PatreonDownloader.Implementation.PatreonPageCrawler.Crawl(ICrawlTargetInfo crawlTargetInfo, String downloadDirectory) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonPageCrawler.cs:line 84
   at UniversalDownloaderPlatform.Engine.UniversalDownloader.Download(String url, String downloadDirectory, IUniversalDownloaderPlatformSettings settings) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.Engine\UniversalDownloader.cs:line 198
   at PatreonDownloader.App.Program.RunPatreonDownloader(CommandLineOptions commandLineOptions) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 143
   at PatreonDownloader.App.Program.Main(String[] args) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 69
```
Hi @AlexCSDev, sorry, I didn't realize that was part of that issue. I have tried the newly released version, though, and I get the same error as SubbyDew above me. I am using WARP too.
It's working fine for me now; the new version pretty much fixed it. I'm getting a "Can view post" error even though I have access, but that's already covered in another thread.
Everyone who is still having this issue, please try removing chromedata directory and try again. Make sure you are running the latest version.
I'm running the latest version. I tried removing the chromedata directory; still the same error:
```
2022-06-02 21:05:34.3767 FATAL Fatal error, application will be closed: UniversalDownloaderPlatform.Common.Exceptions.DownloadException: Error status code returned: BadRequest
   at UniversalDownloaderPlatform.DefaultImplementations.WebDownloader.DownloadStringInternal(String url, Int32 retry, Int32 retryTooManyRequests) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.DefaultImplementations\WebDownloader.cs:line 333
   at UniversalDownloaderPlatform.DefaultImplementations.WebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.DefaultImplementations\WebDownloader.cs:line 292
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 55
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 73
   at PatreonDownloader.Implementation.PatreonPageCrawler.Crawl(ICrawlTargetInfo crawlTargetInfo, String downloadDirectory) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonPageCrawler.cs:line 84
   at UniversalDownloaderPlatform.Engine.UniversalDownloader.Download(String url, String downloadDirectory, IUniversalDownloaderPlatformSettings settings) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.Engine\UniversalDownloader.cs:line 198
   at PatreonDownloader.App.Program.RunPatreonDownloader(CommandLineOptions commandLineOptions) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 143
   at PatreonDownloader.App.Program.Main(String[] args) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 69
```
Perhaps there is a single post type on the artist feed that the downloader doesn't like? Is it possible to add a command flag to skip problematic items instead of failing completely?
The issue is more complicated than that. The app relies on Patreon itself to tell it how to access the next page of posts. For some reason, for some users, the returned URL is not valid, and it's impossible to continue going through a creator's posts after that happens.
The issue here is that I don't know why it happens, and I can't reproduce it on my side.
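In other words, each page response hands back a cursor for the next page, and once the server rejects a cursor there is nothing to fall back on. A toy Python sketch of that dead end (invented endpoint behavior for illustration, not Patreon's real API):

```python
# Toy model of cursor-based pagination: each response carries the
# cursor for the next page, and cursors are only honored while the
# session they were issued under is still valid.

PAGES = {
    None: (["post1", "post2"], "cursor_a"),
    "cursor_a": (["post3"], "cursor_b"),
    "cursor_b": (["post4"], None),  # last page
}

def fake_api(cursor, session_valid):
    """Stand-in for the posts endpoint."""
    if cursor is not None and not session_valid:
        return 400, None, None  # BadRequest: cursor no longer matches session
    posts, next_cursor = PAGES[cursor]
    return 200, posts, next_cursor

def crawl(session_stays_valid=True):
    collected, cursor = [], None
    while True:
        status, posts, cursor = fake_api(cursor, session_stays_valid)
        if status != 200:
            # There is no way to re-derive the cursor, so the crawl is over.
            raise RuntimeError(f"Error status code returned: {status}")
        collected += posts
        if cursor is None:
            return collected

print(crawl())  # all four posts come back while the session holds
```

With `session_stays_valid=False` the second page immediately fails, which mirrors the BadRequest users are seeing partway through the crawl.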
I'm also still getting the same error after deleting chromedata on the latest version:
```
2022-06-03 14:13:30.3387 FATAL Fatal error, application will be closed: UniversalDownloaderPlatform.Common.Exceptions.DownloadException: Error status code returned: BadRequest
   at UniversalDownloaderPlatform.DefaultImplementations.WebDownloader.DownloadStringInternal(String url, Int32 retry, Int32 retryTooManyRequests) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.DefaultImplementations\WebDownloader.cs:line 333
   at UniversalDownloaderPlatform.DefaultImplementations.WebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.DefaultImplementations\WebDownloader.cs:line 292
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 55
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 73
   at PatreonDownloader.Implementation.PatreonPageCrawler.Crawl(ICrawlTargetInfo crawlTargetInfo, String downloadDirectory) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonPageCrawler.cs:line 84
   at UniversalDownloaderPlatform.Engine.UniversalDownloader.Download(String url, String downloadDirectory, IUniversalDownloaderPlatformSettings settings) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.Engine\UniversalDownloader.cs:line 198
   at PatreonDownloader.App.Program.RunPatreonDownloader(CommandLineOptions commandLineOptions) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 143
   at PatreonDownloader.App.Program.Main(String[] args) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 69
```
> The issue is more complicated than that. The app relies on patreon itself to tell it how to access the next page with the posts. For some reason for some users the returned url is not valid. It's impossible to continue going through creator's posts after that happens.
> The issue here is that I don't know why that happens and I can't reproduce it on my side.
Got it. Gonna attempt some local debugging with my artist feed (assuming I can get the project built/running). Will report back.
It seems that the URL returned by the Patreon API is valid; navigating to it in my normal browser returns a 200 with a proper JSON response, but the WebDownloader `DownloadStringInternal` function is getting a 400 from that same URL. I wonder if there is a request header that the browser impersonation implementation is missing...
I don't think it's a header issue, but who knows... I would really appreciate it if you could try to figure this out; no matter what I do, I can't replicate this behavior.
It's not a header issue, per se -- you are correct. However, in attempting to debug, I seem to have gotten myself on CloudFlare's naughty list -- I can't auth any more and keep getting redirected to do captcha checks over and over again.
What I will say is that I was debugging with Fiddler, comparing the requests from PatreonDownloader to the requests from my browser, and the only (meaningful) differences were in the cookies. I was missing a `session_id` in the PatreonDownloader request, but I had a working one on the browser side.
Aha. Yep, it seems like this is the issue. My tests show that the cursor id is probably tied to the current user's session, but the missing cookie means the user is not logged in.
I wonder why that cookie goes missing... Is it not getting transferred from the browser at all? Or is something overriding it? Really interesting issue...
I wonder if the site is sending a `Set-Cookie` header on a previous request that is missing the `session_id`, which the shared `_httpClient` happily obliges. That would cause the next request to fail because there is no session associated with it.
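That theory is easy to picture with a toy cookie jar (a Python sketch of generic cookie-jar semantics, not how .NET's `CookieContainer` is actually implemented):

```python
# A shared cookie jar applies every Set-Cookie it sees, so a single
# response that expires session_id silently logs the client out for
# all later requests on the same HttpClient.

jar = {"session_id": "abc123", "datadome": "xyz"}

def apply_set_cookie(jar, name, value, max_age):
    """Apply one Set-Cookie directive to the jar."""
    if max_age is not None and max_age <= 0:
        jar.pop(name, None)  # expired -> cookie removed from the jar
    else:
        jar[name] = value

# Response 1: refreshes an unrelated cookie; session_id survives.
apply_set_cookie(jar, "datadome", "new-token", 31536000)
assert "session_id" in jar

# Response 2 (hypothetical): server expires session_id.
apply_set_cookie(jar, "session_id", "", 0)
assert "session_id" not in jar  # the next request is effectively anonymous
```

If something like this is happening, the pagination request after the bad response would go out without a session, matching the invalid-cursor 400.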
I wonder if something about the activity leading up to that point is triggering some sort of CloudFlare protection which is triggering the destruction of the session.
Success! I was able to successfully scrape the contents of my target feed, though I cannot say for sure which of these items caused it to work:
- `DownloadStringInternal` (around L297)

EDIT: at least I thought I was good. Seems like subsequent calls to download resources have resulted in lots of 403s; the m3u8 URLs result in Forbidden.
We can test the cookie theory quite easily, I think. In `DownloadStringInternal`, before `using (var request = new HttpRequestMessage(HttpMethod.Get, url) {Version = _httpVersion})`, add the following code; it will dump all of the cookies present before the request is made. This will let us see whether the first request receives a `session_id` cookie at all, and whether a request prior to the current one removes the cookies:
```csharp
_logger.Info($"New request: {url}");
CookieCollection cookies = _httpClientHandler.CookieContainer.GetCookies(new Uri("https://patreon.com"));
foreach (Cookie cookie in cookies)
{
    _logger.Info($"Cookie: {cookie.Name}={cookie.Value}");
}
```
Well, now I'm puzzled. I removed my sleep and added the cookie logging bits above. The process completed with no issues, and I see `session_id` present on every call. I still have to use my pre-auth'ed remote browser session because I can't get through the captcha in the headless browser. I'll keep trying, possibly later this weekend.
Alright, thank you!
Thanks for your hard work, guys. If this bug gets fixed, it's going to save me hours of effort in my archiving process.
Doing some more testing this morning. Sufficient time has passed such that I am no longer on the CloudFlare naughty list and the headless browser is allowing me to authenticate properly. However, I've just witnessed the 'session ID is missing' problem. Fortunately, I've got the logs to prove it:
Here's the request to the first page of API results:
And here's the request to load the second page:
Note there is no session ID in the second result.
I think there may be a race condition; given the async/await nature of the program, the cookie may be getting clobbered by a new request starting before the previous one has completely finished. I modified the code to 'save' the `session_id` cookie whenever it was found, and I am now injecting it manually into the `CookieContainer` on subsequent requests if it is missing -- this makes the API scrapes work as expected!
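The save-and-reinject workaround amounts to something like this (an illustrative Python sketch; the actual patch lives in the C# `WebDownloader`, and the class name here is made up):

```python
# Remember session_id whenever it is observed, and re-inject the last
# known value before any request where the jar has mysteriously lost it.

class SessionPreservingJar:
    def __init__(self):
        self.cookies = {}
        self._saved_session = None

    def update(self, new_cookies):
        """Merge cookies from a response, stashing session_id aside."""
        self.cookies.update(new_cookies)
        if "session_id" in new_cookies:
            self._saved_session = new_cookies["session_id"]

    def for_request(self):
        """Cookies to send; restores session_id if it went missing."""
        if "session_id" not in self.cookies and self._saved_session:
            self.cookies["session_id"] = self._saved_session
        return dict(self.cookies)

jar = SessionPreservingJar()
jar.update({"session_id": "abc123"})
del jar.cookies["session_id"]  # simulate the mystery loss mid-crawl
assert jar.for_request()["session_id"] == "abc123"
```

This papers over the symptom rather than explaining why the cookie disappears, but it is enough to keep the cursor requests authenticated.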
However, I'm now getting `Forbidden` when the downloader attempts to download content embedded in a post. For me, this is a link to an external `m3u8` file hosted on `stream.mux.com`; I wonder if the same thing is happening there (the downloader should be using `session_id`, but it is getting lost somehow).
I will keep testing...
Hm... I will take a look a bit later at why this might be happening. Page parsing should be single-threaded, so there shouldn't be any kind of race condition there.
As for the stream.mux.com thing: embedded audio/video content is not something I have tested or explicitly implemented, so I'm not sure what is needed for it to work properly. If I were to guess, they might be checking whether the origin and/or referer is set to patreon.com, assuming this is functionality built into Patreon, of course.
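If it does turn out to be an origin/referer check, the experiment is simply attaching those headers to the media request. A hypothetical Python sketch (the URL is made up, and whether the real service accepts this is untested):

```python
# Build a request carrying Referer/Origin headers, as a browser playing
# an embedded Patreon video would. No network call is made here; we only
# confirm the headers are attached the way we intend.
import urllib.request

req = urllib.request.Request(
    "https://stream.mux.com/example.m3u8",  # hypothetical media URL
    headers={
        "Referer": "https://www.patreon.com/",
        "Origin": "https://www.patreon.com",
    },
)

assert req.get_header("Referer") == "https://www.patreon.com/"
```

The equivalent in the downloader would be adding the same two headers to the media `HttpRequestMessage` before sending it.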
@clocklear Can I also ask you to do one more thing? I want to see the headers returned by the server in those requests. In `DownloadStringInternal`, before `if (!responseMessage.IsSuccessStatusCode)`, add the following code:
```csharp
_logger.Info("Response headers:");
foreach (string headerString in responseMessage.Headers
             .ToString()
             .Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries))
{
    _logger.Info(headerString);
}
```
@AlexCSDev here are the response headers from the last request that succeeded. The very next request logs the `CookieContainer` contents before the request is sent, and it does not contain `session_id`. It doesn't look like anything in this response is explicitly requesting removal of the session, though.
```
Response headers:
Date: Mon, 06 Jun 2022 17:59:10 GMT
cf-ray: 71730a4c4b95588a-IAD
Cache-Control: private
Set-Cookie: datadome=u6IHnZmWnRY9kJPvq8Jr_JVAnChBV.r8fOD4.J7ro-8upz8pM4idxVzPalZiRdOgWYs5GwVHdA5ymH4RXZ3-kyuRifwJUnlkrzwByqelU3YqNRBO~Z-rqUXIvJtJ3jO; Max-Age=31536000; Domain=.patreon.com; Path=/; Secure; SameSite=Lax
Strict-Transport-Security: max-age=2592000
cf-cache-status: DYNAMIC
accept-ch: Sec-CH-UA,Sec-CH-UA-Mobile,Sec-CH-UA-Platform,Sec-CH-UA-Arch,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-Device-Memory
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
X-Content-Type-Options: nosniff
x-datadome: protected
x-patreon-uuid: b8ccf179-2956-5d95-8f5e-9b0326198c18
x-protected-by: Sqreen
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=oswafuHAvIRh1laOzbXCwA06NwIV1Qyr73cl%2Bkxzt9EjF8THYhtXN1nKPDrWn76iYvzAdVXv3Tz6ICZ72FUKWGaZX8uMax6nRl6Z8GQMdQb8B4OeuZJSLa838xOs%2BomsOQ%3D%3D"}],"group":"cf-nel","max_age":604800}
nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Server: cloudflare
```
Was thinking more about the stream.mux.com thing and yeah, I agree with you -- it feels like they've started doing some new referrer checking (I've used this project fine for a couple of months up to this point). I'll see if faking the referrer helps.
Hm... I wonder if the cookie expires or is forced to be removed by its settings...
Let's try dumping the complete cookie data. Change the cookie printing code to this:
```csharp
_logger.Info($"New request: {url}");
CookieCollection cookies = _httpClientHandler.CookieContainer.GetCookies(new Uri("https://patreon.com"));
foreach (Cookie cookie in cookies)
{
    _logger.Info("===========");
    _logger.Info("Cookie:");
    _logger.Info($"{cookie.Name} = {cookie.Value}");
    _logger.Info($"Domain: {cookie.Domain}");
    _logger.Info($"Path: {cookie.Path}");
    _logger.Info($"Port: {cookie.Port}");
    _logger.Info($"Secure: {cookie.Secure}");
    _logger.Info($"When issued: {cookie.TimeStamp}");
    _logger.Info($"Expires: {cookie.Expires} (expired? {cookie.Expired})");
    _logger.Info($"Don't save: {cookie.Discard}");
    _logger.Info($"Comment: {cookie.Comment}");
    _logger.Info($"Uri for comments: {cookie.CommentUri}");
    _logger.Info($"Version: RFC {(cookie.Version == 1 ? 2109 : 2965)}");
    _logger.Info($"String: {cookie}");
}
```
```
Cookie:
session_id = XXXXXXXXXX
Domain: .patreon.com
Path: /
Port:
Secure: False
When issued: 6/6/2022 2:17:25 PM
Expires: 1/1/0001 12:00:00 AM (expired? False)
Don't save: False
Comment:
Uri for comments:
Version: RFC 2965
String: session_id=XXXXXXXXXX
```
Value obfuscated for obvious reasons. FWIW, its settings don't appear to be any different from any other cookie's settings.
Ok, one last thing to try: replace the function with the version below. During normal operation it should print this:
```
2022-06-06 21:25:54.6217 INFO New request: xxxxxxxx
2022-06-06 21:25:54.6217 INFO Session ID exists before the request
2022-06-06 21:25:55.6046 INFO Session ID exists after requesting headers
2022-06-06 21:25:55.6046 INFO Session ID exists after requesting content
```
```csharp
private async Task<string> DownloadStringInternal(string url, int retry = 0, int retryTooManyRequests = 0)
{
    if (retry > 0)
    {
        if (retry >= _maxRetries)
        {
            throw new DownloadException("Retries limit reached");
        }

        await Task.Delay(retry * _retryMultiplier * 1000);
    }

    if (retryTooManyRequests > 0)
        await Task.Delay(retryTooManyRequests * _retryMultiplier * 1000);

    try
    {
        _logger.Info($"New request: {url}");
        CookieCollection cookies = _httpClientHandler.CookieContainer.GetCookies(new Uri("https://patreon.com"));
        foreach (Cookie cookie in cookies)
        {
            if (cookie.Name == "session_id")
                _logger.Info("Session ID exists before the request");
        }

        using (var request = new HttpRequestMessage(HttpMethod.Get, url) {Version = _httpVersion})
        {
            //Add some additional headers to better mimic a real browser
            request.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8");
            request.Headers.Add("Accept-Language", "en-US,en;q=0.5");
            request.Headers.Add("Cache-Control", "no-cache");
            request.Headers.Add("DNT", "1");

            using (HttpResponseMessage responseMessage =
                await _httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
            {
                cookies = _httpClientHandler.CookieContainer.GetCookies(new Uri("https://patreon.com"));
                foreach (Cookie cookie in cookies)
                {
                    if (cookie.Name == "session_id")
                        _logger.Info("Session ID exists after requesting headers");
                }

                if (!responseMessage.IsSuccessStatusCode)
                {
                    switch (responseMessage.StatusCode)
                    {
                        case HttpStatusCode.BadRequest:
                        case HttpStatusCode.Unauthorized:
                        case HttpStatusCode.Forbidden:
                        case HttpStatusCode.NotFound:
                        case HttpStatusCode.MethodNotAllowed:
                        case HttpStatusCode.Gone:
                            throw new DownloadException($"Error status code returned: {responseMessage.StatusCode}",
                                responseMessage.StatusCode, await responseMessage.Content.ReadAsStringAsync());
                        case HttpStatusCode.Moved:
                        case HttpStatusCode.Found:
                        case HttpStatusCode.SeeOther:
                        case HttpStatusCode.TemporaryRedirect:
                        case HttpStatusCode.PermanentRedirect:
                            string newLocation = responseMessage.Headers.Location.ToString();
                            _logger.Debug(
                                $"{url} has been moved to: {newLocation}, retrying using new url");
                            return await DownloadStringInternal(newLocation);
                        case HttpStatusCode.TooManyRequests:
                            retryTooManyRequests++;
                            _logger.Debug(
                                $"Too many requests for {url}, waiting for {retryTooManyRequests * _retryMultiplier} seconds...");
                            return await DownloadStringInternal(url, 0, retryTooManyRequests);
                    }

                    retry++;
                    _logger.Debug(
                        $"{url} returned status code {responseMessage.StatusCode}, retrying in {retry * _retryMultiplier} seconds ({_maxRetries - retry} retries left)...");
                    return await DownloadStringInternal(url, retry);
                }

                string retVal = await responseMessage.Content.ReadAsStringAsync();

                cookies = _httpClientHandler.CookieContainer.GetCookies(new Uri("https://patreon.com"));
                foreach (Cookie cookie in cookies)
                {
                    if (cookie.Name == "session_id")
                        _logger.Info("Session ID exists after requesting content");
                }

                return retVal;
            }
        }
    }
    catch (TaskCanceledException ex)
    {
        retry++;
        _logger.Debug(ex,
            $"Encountered timeout error while trying to access {url}, retrying in {retry * _retryMultiplier} seconds ({_maxRetries - retry} retries left)... The error is: {ex}");
        return await DownloadStringInternal(url, retry);
    }
    catch (IOException ex)
    {
        retry++;
        _logger.Debug(ex,
            $"Encountered IO error while trying to access {url}, retrying in {retry * _retryMultiplier} seconds ({_maxRetries - retry} retries left)... The error is: {ex}");
        return await DownloadStringInternal(url, retry);
    }
    catch (SocketException ex)
    {
        retry++;
        _logger.Debug(ex,
            $"Encountered connection error while trying to access {url}, retrying in {retry * _retryMultiplier} seconds ({_maxRetries - retry} retries left)... The error is: {ex}");
        return await DownloadStringInternal(url, retry);
    }
    catch (DownloadException ex)
    {
        throw;
    }
    catch (Exception ex)
    {
        throw new DownloadException($"Unable to retrieve data from {url}: {ex.Message}", ex);
    }
}
```
```
2022-06-06 15:18:43.2251 DEBUG [PatreonDownloader.Implementation.PatreonPageCrawler] Page #2: xxxxx
2022-06-06 15:18:44.2321 INFO [UniversalDownloaderPlatform.DefaultImplementations.WebDownloader] New request: xxxxx
2022-06-06 15:18:44.2321 INFO [UniversalDownloaderPlatform.DefaultImplementations.WebDownloader] Session ID exists before the request
2022-06-06 15:18:45.0792 INFO [UniversalDownloaderPlatform.DefaultImplementations.WebDownloader] Session ID exists after requesting headers
2022-06-06 15:18:45.0792 INFO [UniversalDownloaderPlatform.DefaultImplementations.WebDownloader] Session ID exists after requesting content
...
...
2022-06-06 15:18:45.7023 DEBUG [PatreonDownloader.Implementation.PatreonPageCrawler] Page #3: xxxxx
2022-06-06 15:18:46.7088 INFO [UniversalDownloaderPlatform.DefaultImplementations.WebDownloader] New request: xxxxx
2022-06-06 15:18:46.7088 INFO [UniversalDownloaderPlatform.DefaultImplementations.WebDownloader] Session ID exists before the request
2022-06-06 15:18:47.5318 DEBUG [PatreonDownloader.Implementation.PatreonPageCrawler] Parsing data entries...
```
Something is destroying the `session_id` value. I don't think it is happening based on the server response; if they were destroying the session server-side, I shouldn't be able to patch in the existing value and have my request complete (which totally works).
I'm just spit-balling here, so excuse me if this is way off the mark...
I'm seeing cookies set on both .patreon.com and the subdomain www.patreon.com, and I'm not familiar with the CookieContainer class, so are we sure `cookies = _httpClientHandler.CookieContainer.GetCookies(new Uri("https://patreon.com"));` is actually retrieving all cookies?
There also appears to be a longstanding .NET bug about retrieving cookies for .domain that sounds like it might be relevant.
In my logging, I see cookies for the domain `.patreon.com` as well as `patreon.com`. These are found by requesting, explicitly, the cookies for `https://patreon.com`, so your theory is a good guess @TheQwerty, but I don't (currently) think that's what is going on here.
I don't think any relevant cookies are being set on the www subdomain. The `session_id` cookie is set on .patreon.com, and the API itself lives on the root domain as well, so all cookies retrieved for the root domain should apply to API requests.
To me this sounds like some kind of bug somewhere in .NET's HTTP request pipeline, though I don't yet see how it could remove the cookie: https://github.com/dotnet/runtime/blob/6a984143635bde23e728abaaccbde52f5ea8fa3e/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/Http2Stream.cs#L889
I'm thinking of implementing my own cookie management class instead of relying on HttpClient's built-in cookie management, but that will take some time to implement properly.
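A minimal sketch of what app-owned cookie management could look like (a toy Python version that ignores domain/path/expiry matching; it just shows the idea of never letting the HTTP stack silently drop a cookie):

```python
# Keep cookies in application code and build the Cookie header manually,
# so the HTTP client's own jar can never delete one behind our back.

class ManualCookieStore:
    def __init__(self, initial=None):
        self._cookies = dict(initial or {})

    def ingest(self, set_cookie_headers):
        """Apply Set-Cookie headers, but only ever add or replace values.
        Deliberately never delete, so a stray expiring Set-Cookie cannot
        log us out mid-crawl."""
        for header in set_cookie_headers:
            name, _, rest = header.partition("=")
            self._cookies[name.strip()] = rest.split(";", 1)[0]

    def header(self):
        """The Cookie header value to attach to the next request."""
        return "; ".join(f"{k}={v}" for k, v in self._cookies.items())

store = ManualCookieStore({"session_id": "abc123"})
store.ingest(["datadome=token; Max-Age=31536000; Path=/"])
assert "session_id=abc123" in store.header()
```

A real implementation would still need proper domain/path scoping and expiry handling; the point is only that the policy for dropping cookies moves into code the app controls.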
Hey, I've been getting an error recently; any idea how to fix it?
```
2022-05-14 13:52:50.0560 DEBUG [PatreonDownloader.Implementation.PatreonPageCrawler] Page #4: https://www.patreon.com/api/posts?include=user%2Cattachments%2Ccampaign%2Cpoll.choices%2Cpoll.current_user_responses.user%2Cpoll.current_user_responses.choice%2Cpoll.current_user_responses.poll%2Caccess_rules.tier.null%2Cimages.null%2Caudio.null&fields%5Bpost%5D=change_visibility_at%2Ccomment_count%2Ccontent%2Ccurrent_user_can_delete%2Ccurrent_user_can_view%2Ccurrent_user_has_liked%2Cembed%2Cimage%2Cis_paid%2Clike_count%2Cmin_cents_pledged_to_view%2Cpost_file%2Cpost_metadata%2Cpublished_at%2Cpatron_count%2Cpatreon_url%2Cpost_type%2Cpledge_url%2Cthumbnail_url%2Cteaser_text%2Ctitle%2Cupgrade_url%2Curl%2Cwas_posted_by_campaign_owner&fields%5Buser%5D=image_url%2Cfull_name%2Curl&fields%5Bcampaign%5D=show_audio_post_download_links%2Cavatar_photo_url%2Cearnings_visibility%2Cis_nsfw%2Cis_monthly%2Cname%2Curl&fields%5Baccess_rule%5D=access_rule_type%2Camount_cents&fields%5Bmedia%5D=id%2Cimage_urls%2Cdownload_url%2Cmetadata%2Cfile_name&sort=-published_at&filter%5Bis_draft%5D=false&filter%5Bcontains_exclusive_posts%5D=true&json-api-use-default-includes=false&json-api-version=1.0&filter%5Bcampaign_id%5D=3133042&page%5Bcursor%5D=01SUSjbQm6uGXMGHMnHbaLxrQ_
2022-05-14 13:52:50.3300 FATAL [PatreonDownloader.App.Program] Fatal error, application will be closed: UniversalDownloaderPlatform.Common.Exceptions.DownloadException: Error status code returned: BadRequest
   at UniversalDownloaderPlatform.DefaultImplementations.WebDownloader.DownloadStringInternal(String url, Int32 retry, Int32 retryTooManyRequests) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.DefaultImplementations\WebDownloader.cs:line 323
   at UniversalDownloaderPlatform.DefaultImplementations.WebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.DefaultImplementations\WebDownloader.cs:line 288
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 55
   at PatreonDownloader.Implementation.PatreonWebDownloader.DownloadString(String url) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonWebDownloader.cs:line 73
   at PatreonDownloader.Implementation.PatreonPageCrawler.Crawl(ICrawlTargetInfo crawlTargetInfo, String downloadDirectory) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.Implementation\PatreonPageCrawler.cs:line 84
   at UniversalDownloaderPlatform.Engine.UniversalDownloader.Download(String url, String downloadDirectory, IUniversalDownloaderPlatformSettings settings) in F:\Sources\BigProjects\PatreonDownloader\submodules\UniversalDownloaderPlatform\UniversalDownloaderPlatform.Engine\UniversalDownloader.cs:line 198
   at PatreonDownloader.App.Program.RunPatreonDownloader(CommandLineOptions commandLineOptions) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 143
   at PatreonDownloader.App.Program.Main(String[] args) in F:\Sources\BigProjects\PatreonDownloader\PatreonDownloader.App\Program.cs:line 69
```