luckybear992 opened this issue 1 year ago
I also cannot download anything from Imgur, regardless of file type, even though the links work as intended in the browser.
Except that, for me, the error is (for every download):
Site Imgur failed to download submission xxxxxx: server responded with 404 to https://api.imgur.com/3/image/yyyyyyy
I can confirm as well: I can't download anything from Imgur either.
Yes, and navigating to the link in a browser reveals the reason: error | "Authentication required". It would seem the API has been gated. The error reported by BDFR is a 404, even though the actual error is a 401. This might be a separate bug in the code, unrelated to this issue.
The reason you're getting a 401 from that link is the same one I mentioned in #828: you're missing the auth headers needed to access that API link.
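For context, Imgur's API v3 documents anonymous (non-OAuth) access as sending an `Authorization: Client-ID <your client id>` header; a plain browser visit sends no such header, hence the 401. A minimal sketch of building such a request with Python's standard library, assuming you have registered your own client ID (`build_image_request` is a hypothetical helper and the values passed to it are placeholders; this is not how BDFR itself structures its requests):

```python
from urllib.request import Request

API_BASE = "https://api.imgur.com/3/image/"


def build_image_request(image_id: str, client_id: str) -> Request:
    # Anonymous Imgur API access expects "Authorization: Client-ID <id>".
    # The request is only constructed here; pass it to urlopen() to send it.
    return Request(
        API_BASE + image_id,
        headers={"Authorization": f"Client-ID {client_id}"},
    )


# Placeholder values, not a real image ID or client ID.
req = build_image_request("yyyyyyy", "CLIENT_ID")
```

Without that header the API answers with the "Authentication required" error seen in the browser.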
As for the rest of the issue at hand: a lot of content is being removed from Imgur right now. It seems images are removed from the API first, and the direct file links sometimes keep working for a while afterwards. You can work around this for direct links with an edit to download_factory, but I would not advise it long term: any dead link will just pick up the "removed" placeholder image and treat the download as successful. Also, any malformed links provided by the Reddit API can end up downloading the HTML of the 404 page, because the downloader will not see the redirect and will think it's getting the right file. That is the main reason the switch to the API was made in the first place.
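Both failure modes described above could, in principle, be caught after the fact by looking at where the request ended up and what came back. The sketch below is a hypothetical check, not part of BDFR. It assumes Imgur redirects removed images to a placeholder whose path ends in `removed.png` (verify this against a known-removed link before relying on it), and it treats any `text/html` response as a failed media download:

```python
from urllib.parse import urlsplit

# Assumption: removed Imgur images redirect to a placeholder named "removed.png".
PLACEHOLDER_SUFFIX = "/removed.png"


def looks_like_failed_download(final_url: str, content_type: str) -> bool:
    """Return True if a finished download is probably not the real media file."""
    path = urlsplit(final_url).path
    # A redirect to the placeholder means the original image is gone.
    if path.endswith(PLACEHOLDER_SUFFIX):
        return True
    # An HTML body (e.g. a 404 page) is never a valid image or video.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime == "text/html"
```

With the `requests` library, `response.url` holds the final URL after redirects and `response.headers.get("Content-Type", "")` the content type, so both inputs are available once a download finishes.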
If you are willing to run with those caveats, or are willing to double-check them all, here is the patch.
Change this:
if re.match(r"(i\.|m\.|o\.)?imgur", sanitised_url):
    return Imgur
elif re.match(r"(i\.|thumbs\d{1,2}\.|v\d\.)?(redgifs|gifdeliverynetwork)", sanitised_url):
    return Redgifs
elif re.match(r"(thumbs\.|giant\.)?gfycat\.", sanitised_url):
    return Gfycat
elif re.match(r".*/.*\.[a-zA-Z34]{3,4}(\?[\w;&=]*)?$", sanitised_url) and not DownloadFactory.is_web_resource(
    sanitised_url
):
    return Direct
to this:
if re.match(r"(i\.|thumbs\d{1,2}\.|v\d\.)?(redgifs|gifdeliverynetwork)", sanitised_url):
    return Redgifs
elif re.match(r"(thumbs\.|giant\.)?gfycat\.", sanitised_url):
    return Gfycat
elif re.match(r".*/.*\.[a-zA-Z34]{3,4}(\?[\w;&=]*)?$", sanitised_url) and not DownloadFactory.is_web_resource(
    sanitised_url
):
    return Direct
elif re.match(r"(i\.|m\.|o\.)?imgur", sanitised_url):
    return Imgur
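To see why moving the Imgur branch to the end matters, here is a stripped-down, hypothetical dispatcher that keeps only the Imgur and Direct regexes from the patch. Redgifs, Gfycat and `is_web_resource` are left out, and `simple_sanitise` only approximates BDFR's own URL sanitising. Because `re.match` anchors at the start of the string, an `i.imgur.com/....jpg` URL matches both patterns, and whichever is checked first wins:

```python
import re
from urllib.parse import urlsplit

# The two regexes copied from the patch above.
IMGUR = re.compile(r"(i\.|m\.|o\.)?imgur")
DIRECT = re.compile(r".*/.*\.[a-zA-Z34]{3,4}(\?[\w;&=]*)?$")


def simple_sanitise(url: str) -> str:
    # Hypothetical stand-in for BDFR's sanitising: host + path, no scheme or "www.".
    parts = urlsplit(url)
    return re.sub(r"^www\.", "", parts.netloc) + parts.path


def pick_downloader(url: str, direct_first: bool = True) -> str:
    s = simple_sanitise(url)
    checks = [("Imgur", IMGUR), ("Direct", DIRECT)]
    if direct_first:
        # Patched order: try the direct-file pattern before the Imgur one.
        checks.reverse()
    for name, pattern in checks:
        if pattern.match(s):
            return name
    return "Unknown"
```

Album and gallery links such as `imgur.com/a/...` have no file extension, so they still fall through to the Imgur branch; only links that already point at a file are rerouted to the Direct downloader.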
With that change, gifv links will be downloaded as .gifv files. If you would like them downloaded as mp4 instead, you can insert the following two lines into the downloader at line 96:
try:
    if submission.url.endswith(".gifv"):
        submission.url = submission.url.replace(".gifv", ".mp4")
    downloader_class = DownloadFactory.pull_lever(submission.url)
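The patch uses `str.replace`, which rewrites every occurrence of `.gifv` in the URL, not just the extension. That is harmless for normal Imgur links, but a suffix-only version is straightforward. A minimal sketch (`gifv_to_mp4` is a hypothetical helper name) that relies on the same assumption as the patch, namely that Imgur serves an .mp4 at the same path as the .gifv:

```python
GIFV = ".gifv"


def gifv_to_mp4(url: str) -> str:
    # Rewrite only a trailing ".gifv"; any other URL is returned unchanged.
    if url.endswith(GIFV):
        return url[: -len(GIFV)] + ".mp4"
    return url
```

Used in place of the two inserted lines, it would read `submission.url = gifv_to_mp4(submission.url)`.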
These edits are provided as-is and I won't be providing additional support for them.
Oh, I understand now. Some of the submissions were very recent, so I hadn't considered that they could already have been removed.
or are willing to double-check them all here is
@OMEGARAZER
Is there a way to figure out which files need to be double-checked? And then a way to save the corresponding file to the right location, with the right name and all?
@AlexTu2
bdfr has a --no-dupes option that avoids downloading the same image/video twice by comparing hashes. Since the "removed" image is identical every time, that option catches it: you'll only get one copy, and bdfr will skip all other posts whose images were removed by Imgur.
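The same idea can be applied to files you have already downloaded. A minimal sketch, assuming MD5 is the hash in play (the 32-hex-digit value in the log below suggests it, but check BDFR's source to be sure); `find_duplicate_groups` is a hypothetical helper, not a BDFR function:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Hash the file in chunks so large videos are not read into memory at once.
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root: str) -> dict[str, list[Path]]:
    # Map each hash to every file under root that has it; keep only repeated hashes.
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[md5_of(path)].append(path)
    return {h: sorted(paths) for h, paths in groups.items() if len(paths) > 1}
```

A large group sharing one hash is a strong hint that those files are all the same placeholder, and their filenames point back to the submissions that need re-checking.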
I'm currently re-downloading my saved posts with this fix and the --no-dupes option; the log displays "Resource hash d835884373f4d6c8f24742ceabe74946 from submission
Plus, the placeholder images are all exactly the same (absurdly small) size, so it's easy to use a tool like find to locate them all.
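The thread does not give the placeholder's exact size, so take it from one file you know is a placeholder. With find, an exact byte size is written as `-size Nc` (the `c` suffix means bytes). A minimal Python equivalent (`files_of_size` is a hypothetical helper):

```python
from pathlib import Path


def files_of_size(root: str, size_bytes: int) -> list[Path]:
    """Return every file under root whose size is exactly size_bytes."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_size == size_bytes
    )
```

For example, `files_of_size("downloads", Path(known_placeholder).stat().st_size)` lists every file matching a known placeholder's size. Here `known_placeholder` is whatever path you identified by hand.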
Description
Imgur links keep giving a 404 error even though they work in my browser. An Imgur link such as https://i.imgur.com/xxxxxx.gifv opens in my browser, while https://i.imgur.com/xxxxxx WITHOUT the gifv extension loads a 404 page. The two 404 links in the log I provided work fine in my browser when I use the i.imgur link ending with the .gifv extension.
Command
Environment
Logs