AlexCSDev / PatreonDownloader

Powerful tool for downloading content posted by creators on patreon.com. Supports content hosted on patreon itself as well as external sites (additional plugins might be required).
MIT License

Error while downloading https://www.patreon.com/file?h=37188912&i=7097946: [45879210] Unable to retrieve name for external entry of type ExternalUrl: https://www.patreon.com/file?h=37188912&i=7097946 #48

Open meliamne opened 3 years ago

meliamne commented 3 years ago

I'm getting a bunch of these instead of the zip files I hoped for. A normal download via the browser works.

Error while downloading https://www.patreon.com/file?h=37188912&i=7097946: [45879210] Unable to retrieve name for external entry of type ExternalUrl: https://www.patreon.com/file?h=37188912&i=7097946

AlexCSDev commented 3 years ago

Please run the app with the "--json" command-line option and send the generated .json files (you can find them in the download folder) to alexcsdev@protonmail.com

zeldatp151 commented 3 years ago

I wanted to add that I am having this same issue.

AlexCSDev commented 3 years ago

> I wanted to add that I am having this same issue.

Please send the json dump of the affected creator to the email above. I will check it when I have free time.

begna112 commented 3 years ago

I am also getting this error. It seems to happen when the creator adds a link to the content via the Patreon uploader rather than as a proper attachment. I'm not sure whether this is extended functionality Patreon added in the past year or just some creators being weird with how they do it.

In particular, this is for zip files, usually containing a bunch of images, psd files, etc.

I'll email the JSON files to the address indicated above, but it seems to boil down to this:

                "content": "<p><strong>42 region icons in elven style </strong>for marking the important locations on your Elven Kingdom!</p><ul> <li>2 Color variations</li> <li>Towns, villages, castles, keeps and more</li> <li>Standard markers for important quest locations and heroes' party position.</li></ul><p><a href=\"https://www.patreon.com/file?h=35395828&amp;i=9247121\">Elven Region Icons [ALL PATRONS]</a>\u00a0</p>",

                ... 

                "attachments": {
                    "data": []
                },
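
To make the shape of the problem concrete, here is a minimal sketch (in Python, not the project's C#) of pulling such links out of a post's "content" HTML when "attachments" is empty. The class and function names are hypothetical and are not part of PatreonDownloader; the HTML fragment is shortened from the dump above.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # HTMLParser has already unescaped &amp; to & here.
                    self.links.append(value)


def patreon_file_links(content_html):
    """Return links in the post body that point at www.patreon.com/file."""
    collector = LinkCollector()
    collector.feed(content_html)
    result = []
    for link in collector.links:
        parts = urlsplit(link)
        if parts.hostname == "www.patreon.com" and parts.path == "/file":
            result.append(link)
    return result


content = (
    '<p><a href="https://www.patreon.com/file?h=35395828&amp;i=9247121">'
    "Elven Region Icons [ALL PATRONS]</a></p>"
)
print(patreon_file_links(content))
# ['https://www.patreon.com/file?h=35395828&i=9247121']
```

Note that the `&amp;` in the JSON dump is HTML escaping; the real URL uses a plain `&`.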
AlexCSDev commented 3 years ago

@begna112 Interesting... Yep, send me the files, I will definitely look into that once I have some free time.

begna112 commented 3 years ago

Sent them along.

For what it's worth, I'm 90% certain the original report here is the same. A bunch of in-line links to zip files for battlemaps, rather than proper attachments. https://www.patreon.com/posts/elven-ruins-45879210

The crawler might need to specifically detect in-post links of this Patreon URL format so they are not treated as "ExternalUrl" types, and then decide how to handle them.

begna112 commented 2 years ago

Any updates on this?

syntholly commented 2 years ago

Just adding to this, having the same problem. I can open the link myself fine and download the file through the browser by just visiting it, but PatreonDownloader doesn't seem to follow through.

AlexCSDev commented 2 years ago

I've been quite busy lately. I'm afraid I can't provide any ETA for this right now.

jhons434 commented 2 years ago

Hi, any updates on this? I've been trying to download battlemaps as mentioned above and any Patreon file links aren't working.

I realize there is no estimation, but just wanted to check in to see if this was still on your radar.

I tried taking a look myself but was having trouble figuring out the project architecture. If I'm understanding correctly though, it may be just as simple as parsing the in-line links of this format through the same process as the attached files, since they seem to use the same url format.

AlexCSDev commented 2 years ago

Unfortunately the harsh truth is that if the issue does not affect me or does not completely break the application the answer is "this will be done when I feel like doing it unless someone is willing to pay for it".

I have spent quite a lot of my free time working on #125, so for now I can't dedicate any more of it to this project unless something breaks completely.

begna112 commented 1 year ago

> Unfortunately the harsh truth is that if the issue does not affect me or does not completely break the application the answer is "this will be done when I feel like doing it unless someone is willing to pay for it".

Pay for it? How and how much? @AlexCSDev

Edit: Actually, trying to determine if this issue still exists.

Edit 2: Yes, it still exists. Example: https://www.patreon.com/posts/souls-plane-83764955

2023-06-08 02:47:09.4627 ERROR Failed to download https://www.patreon.com/file?h=35017899&i=14336710: Error while downloading https://www.patreon.com/file?h=35017899&i=14336710: [83764955] Unable to retrieve name for external entry of type ExternalUrl: https://www.patreon.com/file?h=35017899&i=14336710
begna112 commented 1 year ago

So here is where the error is being thrown: https://github.com/AlexCSDev/PatreonDownloader/blob/aaaaf9291c513912eb46aba9a8b4c6646972401f/PatreonDownloader.Implementation/PatreonCrawledUrlProcessor.cs#L113-L120

Here is where any URL inside the content of the post is being set as an "ExternalUrl": https://github.com/AlexCSDev/PatreonDownloader/blob/aaaaf9291c513912eb46aba9a8b4c6646972401f/PatreonDownloader.Implementation/PatreonPageCrawler.cs#L216-L224

This should probably be altered to detect Patreon links following the https://www.patreon.com/file pattern and set them to PostAttachment, but I'm not certain that this would solve the issue.
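
As an illustration of that classification idea only, here is a Python sketch (the project is C#, so this is not a patch). The `UrlType` enum and `classify_content_link` function are hypothetical stand-ins; the real types in PatreonPageCrawler.cs may look different. It assumes a file link always carries both the `h` and `i` query parameters, as in the URLs seen in this thread.

```python
from enum import Enum
from urllib.parse import parse_qs, urlsplit


class UrlType(Enum):
    # Names borrowed from the thread; the project's actual enum may differ.
    POST_ATTACHMENT = "PostAttachment"
    EXTERNAL_URL = "ExternalUrl"


def classify_content_link(url):
    """Treat https://www.patreon.com/file?h=...&i=... as an attachment."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    is_file_link = (
        parts.hostname in ("www.patreon.com", "patreon.com")
        and parts.path == "/file"
        and "h" in query
        and "i" in query
    )
    return UrlType.POST_ATTACHMENT if is_file_link else UrlType.EXTERNAL_URL


print(classify_content_link("https://www.patreon.com/file?h=35017899&i=14336710"))
print(classify_content_link("https://www.patreon.com/posts/souls-plane-83764955"))
```

Parsing the URL rather than matching a substring avoids misclassifying external sites that merely contain "patreon.com/file" somewhere in their path or query.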

This isn't set to allow redirects:
https://github.com/AlexCSDev/PatreonDownloader/blob/aaaaf9291c513912eb46aba9a8b4c6646972401f/PatreonDownloader.Implementation/PatreonRemoteFilenameRetriever.cs#L26
https://github.com/AlexCSDev/PatreonDownloader/blob/aaaaf9291c513912eb46aba9a8b4c6646972401f/PatreonDownloader.Implementation/PatreonRemoteFilenameRetriever.cs#L49-L68

I think this is most likely where the bug lies. The https://www.patreon.com/file links are a 302 redirect to a URL like https://c10.patreonusercontent.com/4/patreon-media/p/post/27184633/3d5db0f1843844a883ad68643ed924b2/eyJhIjoxLCJwIjoxfQ%3D%3D/1?token-time=1686528000&token-hash=.

The HttpClient receives a proper response, so it isn't throwing an HttpRequestException, but the response also doesn't have a Content-Disposition header, so the filename variable is never populated with anything other than null.

If the HttpClient were to follow the redirect, it would have a Content-Disposition and would be able to retrieve the filename.

I think this could be potentially solved by allowing the redirect like this: https://briancaos.wordpress.com/2021/09/06/httpclient-follow-302-redirects-with-net-core/
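
The logic described above (follow the 302, then read Content-Disposition from the final response) can be sketched like this. It is written in Python for illustration, not in the project's C#, and the `fetch` callable, `resolve_filename`, and the fake URLs and filename in the demo are all hypothetical. `fetch(url)` is assumed to return a status code and a dict of headers with lowercase keys, which keeps the sketch runnable without network access.

```python
from email.message import Message
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def filename_from_content_disposition(value):
    """Extract the filename parameter from a Content-Disposition value."""
    msg = Message()
    msg["Content-Disposition"] = value
    return msg.get_filename()


def resolve_filename(url, fetch, max_redirects=5):
    """Follow redirects via fetch(url) and return the final filename or None."""
    for _ in range(max_redirects + 1):
        status, headers = fetch(url)
        if status in REDIRECT_CODES and "location" in headers:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, headers["location"])
            continue
        disposition = headers.get("content-disposition")
        return filename_from_content_disposition(disposition) if disposition else None
    raise RuntimeError("too many redirects")


def fake_fetch(url):
    # Stand-in for a real HTTP client: the first hop redirects, the second
    # carries the Content-Disposition header.
    if url.startswith("https://www.patreon.com/file"):
        return 302, {"location": "https://c10.patreonusercontent.com/example"}
    return 200, {"content-disposition": 'attachment; filename="maps.zip"'}


print(resolve_filename("https://www.patreon.com/file?h=1&i=2", fake_fetch))
# maps.zip
```

Without the redirect step, the first response has no Content-Disposition header and the function returns None, which matches the null filename described above.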

The only thing I'm not really sure of is why this regex isn't matching on the URL: https://github.com/AlexCSDev/PatreonDownloader/blob/aaaaf9291c513912eb46aba9a8b4c6646972401f/PatreonDownloader.Implementation/PatreonRemoteFilenameRetriever.cs#L24

If it did match, it would at least get assigned a filename based on the URL here: https://github.com/AlexCSDev/PatreonDownloader/blob/aaaaf9291c513912eb46aba9a8b4c6646972401f/PatreonDownloader.Implementation/PatreonRemoteFilenameRetriever.cs#L75

Edit: figured out that the regex is trying to detect a filename pattern in the url, not a valid url. So, that makes sense, though it's misleading with the comment saying it's an "invalid url"
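
To show why a filename-style pattern finds nothing in these links, here is a small Python sketch. The pattern below is a hypothetical approximation, not the regex from PatreonRemoteFilenameRetriever.cs, and example.com is a placeholder host.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit

# Hypothetical rule: the last path segment must look like "name.ext".
FILENAME_PATTERN = re.compile(r"^[^/]+\.[A-Za-z0-9]{1,5}$")


def filename_from_url(url):
    """Return the last path segment if it looks like a filename, else None."""
    last_segment = posixpath.basename(unquote(urlsplit(url).path))
    return last_segment if FILENAME_PATTERN.match(last_segment) else None


print(filename_from_url("https://example.com/maps/elven-ruins.zip?x=1"))
# elven-ruins.zip
print(filename_from_url("https://www.patreon.com/file?h=35017899&i=14336710"))
# None
```

The path of a file link is just `/file`, with the identifying data in the query string, so there is no extension for such a pattern to match.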

begna112 commented 1 year ago

I tried to implement this myself but am getting a 403 error from Cloudflare. Maybe that was the root problem all along? I added the redirect and a debug line that outputs the request and the response as strings. I think the request is missing cookies and needs to use the IWebDownloader cookies that are used elsewhere. I don't understand .NET well enough to know how to get the cookies you're saving elsewhere into this client.

2023-06-08 06:01:13.9540 DEBUG [PatreonDownloader.Implementation.PatreonRemoteFilenameRetriever] Method: GET, RequestUri: 'https://www.patreon.com/file?h=35017899&i=14336710', Version: 1.1, Content: <null>, Headers:
{
}
2023-06-08 05:27:52.2296 DEBUG [PatreonDownloader.Implementation.PatreonRemoteFilenameRetriever] StatusCode: 403, ReasonPhrase: 'Forbidden', Version: 1.1, Content: System.Net.Http.HttpConnectionResponseContent, Headers:
{
  Date: Thu, 08 Jun 2023 10:27:52 GMT
  Transfer-Encoding: chunked
  Connection: keep-alive
  CF-Ray: 7d4071db98ec2cb0-DFW
  CF-Cache-Status: DYNAMIC
  Cache-Control: private
  Set-Cookie: a_csrf=WjQaO_rQQbeZjmgcSlV4Eiez2A8uYxNKrcFBpeXr0mE; Domain=patreon.com; Expires=Thu, 08-Jun-2023 11:27:52 GMT; Max-Age=3600; Secure; HttpOnly; Path=/
  Set-Cookie: patreon_locale_code=en-US; Domain=patreon.com; Expires=Wed, 03-Jun-2043 10:27:52 GMT; Max-Age=630720000; Secure; Path=/
  Set-Cookie: patreon_location_country_code=US; Domain=patreon.com; Expires=Wed, 03-Jun-2043 10:27:52 GMT; Max-Age=630720000; Secure; Path=/
  Set-Cookie: patreon_device_id=889737a2-f4c7-4cbc-979a-e044ae0d07e8; Domain=patreon.com; Expires=Thu, 01-Aug-2040 00:00:00 GMT; Max-Age=630720000; Path=/
  Set-Cookie: patreon_location_country_code=US; Domain=patreon.com; Expires=Thu, 01-Aug-2040 00:00:00 GMT; Max-Age=630720000; Path=/
  Set-Cookie: patreon_locale_code=undefined; Domain=patreon.com; Expires=Thu, 01-Aug-2040 00:00:00 GMT; Max-Age=630720000; Path=/
  Set-Cookie: __cf_bm=CcfmoX8LOfw24DqxuaAH7bVvRy0NnkhnlUlYDexoK8Y-1686220072-0-AWPN37kHbX1/PyyMbLufWbudZ68cLXrttr3B5Abw/499Qw2C8wwEZ4DvggLeSuQbAMtdR2dLLw8uyb2rMix4xXwdDqIJlNrIPwc8z7trXwE7; path=/; expires=Thu, 08-Jun-23 10:57:52 GMT; domain=.patreon.com; HttpOnly; Secure
  Strict-Transport-Security: max-age=2592000
  Referrer-Policy: origin,strict-origin-when-cross-origin
  X-Content-Type-Options: nosniff
  X-Frame-Options: SAMEORIGIN
  x-patreon-sha: ab2f439ffbd53097445c4ae0fd8f2ac3e4ccaee6
  x-patreon-uuid: 72c3cb25-18b5-5248-8a77-3da75c089e1a
  X-XSS-Protection: 1; mode=block
  Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=ReQXP6fwUVt6q6Ap82mKAp0zEn0KG3MMoVAvnJYwigPr2s9QjaLkU0R4rIc30Po1M0BsIzxe3Kwcdv666me8H%2FjtHmooA25rin0HLAs%2BT2MKKNYrzBt6J8x0zoatiZRZVg%3D%3D"}],"group":"cf-nel","max_age":604800}
  NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
  Server: cloudflare
  Content-Type: text/html; charset=utf-8
  Content-Language: en-US
}
2023-06-08 05:27:52.2296 ERROR [UniversalDownloaderPlatform.Engine.DownloadManager] Error while downloading https://www.patreon.com/file?h=35017899&i=14336710: [83764955] Unable to retrieve name for external entry of type ExternalUrl: https://www.patreon.com/file?h=35017899&i=14336710
2023-06-08 05:27:52.2296 ERROR [PatreonDownloader.App.Program] Failed to download https://www.patreon.com/file?h=35017899&i=14336710: Error while downloading https://www.patreon.com/file?h=35017899&i=14336710: [83764955] Unable to retrieve name for external entry of type ExternalUrl: https://www.patreon.com/file?h=35017899&i=14336710

An expected response should look something like this:

Request URL: https://www.patreon.com/file?h=35017899&i=14336710
Request Method: GET
Status Code: 302
Remote Address: 104.16.7.49:443
Referrer Policy: strict-origin-when-cross-origin
Cache-Control: private
Cf-Cache-Status: DYNAMIC
Cf-Ray: 7d405ca40d2ee7cf-DFW
Content-Language: en-US
Content-Type: text/html; charset=utf-8
Date: Thu, 08 Jun 2023 10:13:23 GMT
Location: https://c10.patreonusercontent.com/4/patreon-media/p/post/35017899/ccf1e1c2b1164c07b3ab347dbcc6596c/eyJhIjoxLCJwIjoxfQ%3D%3D/1?token-time=1686528000&token-hash=16wLo1nHMI5dkNuXI6SrylYXSiumTq6lt10XuJBfZ_I%3D
Nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Referrer-Policy: origin,strict-origin-when-cross-origin
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=azzB4wZoQTCtmVdgukCWEX0T9vWxyOicBRQBValw8h%2FMXeJ%2Bbu3hiE%2F0e044S%2FwjXpbWDfj3loTczb2lh853aXbkeT9MR1edmsRfRRBGMvTLLrNaaKaU%2BBaY60Dku27oyw%3D%3D"}],"group":"cf-nel","max_age":604800}
Server: cloudflare
Set-Cookie: AWSALBTG=bzdf7KmRnw6p+uOnZEsPjuj4WChaERMK6eidGYPxCRQ4PwA5rEzKsOJCEJ9VT96zJEUqYT0XjsZl7SlxkledGguHEzbCNXd9G+2V5bECfuYD9mHOFa5F9SZFxHhJYRJZYGLG1jRfeZpYV9iE0kt0V4psnxltaHoKSqGiI/CIUEN8soJn2RZXA5EOVUnVc74Z8uO8IPdrjLEm6H1tQtoiUYSsxGK7HrPcWAZhKT795RgrAu26+qVtFsvU5DP70rQI5BAqAeg=; Expires=Thu, 15 Jun 2023 10:13:23 GMT; Path=/
Set-Cookie: AWSALBTGCORS=bzdf7KmRnw6p+uOnZEsPjuj4WChaERMK6eidGYPxCRQ4PwA5rEzKsOJCEJ9VT96zJEUqYT0XjsZl7SlxkledGguHEzbCNXd9G+2V5bECfuYD9mHOFa5F9SZFxHhJYRJZYGLG1jRfeZpYV9iE0kt0V4psnxltaHoKSqGiI/CIUEN8soJn2RZXA5EOVUnVc74Z8uO8IPdrjLEm6H1tQtoiUYSsxGK7HrPcWAZhKT795RgrAu26+qVtFsvU5DP70rQI5BAqAeg=; Expires=Thu, 15 Jun 2023 10:13:23 GMT; Path=/; SameSite=None; Secure
Set-Cookie: patreon_locale_code=en-US; Domain=patreon.com; Expires=Wed, 03-Jun-2043 10:13:23 GMT; Max-Age=630720000; Secure; Path=/
Set-Cookie: patreon_location_country_code=US; Domain=patreon.com; Expires=Wed, 03-Jun-2043 10:13:23 GMT; Max-Age=630720000; Secure; Path=/
Strict-Transport-Security: max-age=2592000
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: sameorigin
X-Patreon-Sha: ab2f439ffbd53097445c4ae0fd8f2ac3e4ccaee6
X-Patreon-Uuid: 7adf1842-eb32-527b-a645-50c9678a3895
X-Xss-Protection: 1; mode=block
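
One detail worth noting in the Location header above is that the redirect target lives on a different host than the original link. Here is a short Python sketch for inspecting it. The function names are hypothetical, and reading `token-time` as a Unix timestamp in seconds is an assumption, not something Patreon documents in this thread.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

original = "https://www.patreon.com/file?h=35017899&i=14336710"
location = (
    "https://c10.patreonusercontent.com/4/patreon-media/p/post/35017899/"
    "ccf1e1c2b1164c07b3ab347dbcc6596c/eyJhIjoxLCJwIjoxfQ%3D%3D/1"
    "?token-time=1686528000"
    "&token-hash=16wLo1nHMI5dkNuXI6SrylYXSiumTq6lt10XuJBfZ_I%3D"
)


def is_cross_host(source_url, target_url):
    """True when the redirect leaves the host of the original request."""
    return urlsplit(source_url).hostname != urlsplit(target_url).hostname


def token_time(target_url):
    """Interpret token-time as Unix seconds (an assumption) and return UTC."""
    values = parse_qs(urlsplit(target_url).query).get("token-time")
    if not values:
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


print(is_cross_host(original, location))  # True
print(token_time(location).isoformat())   # 2023-06-12T00:00:00+00:00
```

If that reading is right, the signed URL is time-limited, so a client would need to fetch the redirect target promptly rather than store it.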

I noticed that the request headers (from my browser) include cookies:

:Authority: www.patreon.com
:Method: GET
:Path: /file?h=35017899&i=14336710
:Scheme: https
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: __cf_bm=wPLhS6I9EmMSDDTYVSqRdXYpz.T3.C9HzVF1zYwP1Ik-1686218911-0-ATMa3ycREa6K9rsFAmWDRDkIXcIqF6U5VDqUuNuNqz6Jb/4PEGQ9cJanuHwVgbXsykRE7W+L; patreon_device_id=cd9eb84d-8c95-47ab-8e79-71d; patreon_location_country_code=US; patreon_locale_code=en-US; _ALGOLIA=anonymous-85a84d3e-ca44-458e-980a-d3a; a_csrf=dsKcV3FQJLkuW0ZLXnD20FmiEdh5-alUE; session_id=wb6w3_4fmVwDsETyDWFSGtmbyqX0; _swb_consent_=eyJlbnZpcm9ubWVudENvZGUiOiJwcm9kdWN0aW9uIiwiaWRlbnRpdGllcyI6eyJwYXRyZW9uYWNjdGlkIjoiMzkzMDE4NSIsInBhdHJlb25kZXZpY2VpZCI6ImNkOWViODRkLThjOTUtNDdhYi04ZTc5LTcxMTM2NmE0YjY2ZCJ9LCJqdXJpc2RpY3Rpb25Db2RlIjoidXNnZW5lcmFsIiwicHJvcGVydHlDb2RlIjoicGF0cmVvbiIsInB1cnBvc2VzIjp7ImFuYWx5dGljc2JpemVuaGFuY2UiOnsiYWxsb3dlZCI6InRydWUiLCJsZWdhbEJhc2lzQ29kZSI6ImRpc2Nsb3N1cxhd3MiOnsiYWxsb3dlZCI6InRydWUiLCJsZWdhbEJhc2lzQ29kZSI6ImRpc2Nsb3N1cmUifSwic3Vic2NyaWJlZHN2Y3MiOnsiYWxsb3dlZCI6InRydWUiLCJsZWdhbEJhc2lzQ29kZSI6ImRpc2Nsb3N1cmUifSwic3VydmV5b3V0cmVhY2giOnsiYWxsb3dlZCI6InRydWUiLCJsZWdhbEJhc2lzQ29kZSI6ImNvbnNlbnRfb3B0b3V0In0sInRhcmdldGVkYWR2ZXJ0aXNpbmciOnsiYWxsb3dlZCI6InRydWUiLCJsZWdhbEJhc2lzQ29kZSI6ImRpc2Nsb3N1cmUifX0sImNvbGxlY3RlZEF0IjoxNjg2MjE4OTIzfQ%3D%3D; AWSALBTG=8Z7SPHtypB1WzoroJ8+kkQdOPRT1YjwfE3E5GEXU+i9jeeV0gFX+cn5B/2nlWybN9jfgHH6YBXNUIJh2QRoz55UkNLZQbheu4O6hyBSjprx5yJva02kYFml3KJT4TvFsb+GAyFSUzQdCiK76mT3pWV1ziqcqT6fT0xgYC7ZjnVjl5HBWTFhgob8GXXdjkMEEM67OcpLOPhG0GvRDr2huFmPjp5w0tp9IUSmXCDX3E5GEXU+i9jeeV0gFX+cn5B/2nlWybN9jfgHH6YBXNUIJh2QRoz55UkNLZQbheu4O6hyBSjprx5yJva02kYFml3KJT4TvFsb+GAyFSUzQdCiK76mT3pWV1ziqcqT6fT0xgYC7ZjnVjl5HBWTFhgob8GXXdjkMEEM67OcpLOPhG0GvRDr2huFmPjp5w0tp9IUSmXCDXD3e1Xvmj2QwraaQqhbUeVVkRAMC4YUNE=
Referer: https://www.patreon.com/posts/souls-plane-83764955
Sec-Ch-Device-Memory: 8
Sec-Ch-Ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"
Sec-Ch-Ua-Arch: "x86"
Sec-Ch-Ua-Full-Version-List: "Not.A/Brand";v="8.0.0.0", "Chromium";v="114.0.5735.118", "Google Chrome";v="114.0.5735.118"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Model: ""
Sec-Ch-Ua-Platform: "Windows"
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
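
The difference between the two requests (an empty header set versus a browser request carrying cookies) can be sketched as follows, in Python rather than the project's C#. The `build_authenticated_request` helper is hypothetical, the cookie values are placeholders, and which cookies Patreon or Cloudflare actually require is not known from this thread; `session_id` and `a_csrf` are simply names that appear in the browser capture above.

```python
from urllib.request import Request


def build_authenticated_request(url, cookies, user_agent, referer=None):
    """Build a GET request that carries saved cookies and browser-like headers."""
    headers = {
        # A Cookie header is a "; "-separated list of name=value pairs.
        "Cookie": "; ".join(f"{name}={value}" for name, value in cookies.items()),
        "User-Agent": user_agent,
    }
    if referer:
        headers["Referer"] = referer
    return Request(url, headers=headers, method="GET")


saved_cookies = {"session_id": "PLACEHOLDER", "a_csrf": "PLACEHOLDER"}
request = build_authenticated_request(
    "https://www.patreon.com/file?h=35017899&i=14336710",
    saved_cookies,
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    referer="https://www.patreon.com/posts/souls-plane-83764955",
)
print(request.get_header("Cookie"))
# session_id=PLACEHOLDER; a_csrf=PLACEHOLDER
```

In the real application the cookie values would have to come from the same store the existing downloader already uses, which is the part still unresolved above.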