HoverHell / RedditImageGrab

Downloads images from sub-reddits of reddit.com.
GNU General Public License v3.0

[Question] What sites can this download from? #56

Closed jtara1 closed 8 years ago

jtara1 commented 8 years ago

Hello, I'm working on a Python 3 fork of RedditImageGrab.

Since subreddits are built from user contributions, a reddit submission can link anywhere. In practice, though, most links are hosted on Imgur, Reddit Image, or Gfycat.

There appear to be classes and functions specifically for DeviantArt, but I recall that at some point my fork failed to download any images from DeviantArt.

Gfycat & Reddit Images seem to work as expected.

I integrated another GitHub fork of mine (jtara1/imgur-downloader) to handle all Imgur images and galleries, so Imgur works well.

I haven't tested tumblr or pixiv hosted media yet, but I'd like to add support for them too.

Second Question:

Is this unit test reliable for RedditImageGrab? I haven't played with it yet, and have been too lazy to port it over to my fork.

/RedditImageGrab/redditdownload/tests/test-redditdownload.py

rachmadaniHaryono commented 8 years ago

Hi,

I haven't checked yet, but from what I remember it supports Gfycat, Imgur, DeviantArt, and direct image links.

I have Tumblr and (new) DeviantArt parsers, so I can add them here.

For pixiv I have a fork of some python pixiv project. If it is working, maybe I can add it here.

For the second question: do you mean this? https://github.com/rachmadaniHaryono/RedditImageGrab/blob/fix-continue/tests/test_redditdownload.py

Not so reliable. It needs a better implementation and a better testing method, but it is better than nothing. Also, because of this legacy code, I'm still testing it on py27.

I also want this program either to support Python 3 or to be ported to it. If you want to be the maintainer of a Python 3 port of this program, I can help you. If you want, I can raise a new issue so @HoverHell can add a link to your fork in the Readme.

HoverHell commented 8 years ago

Actually, there is/was a scrap_wrongies.py script which, for all links of unknown type, downloads the page and all linked media and maybe recurses one link deep (like wget -l 2). That helps with all the sites that aren't on the supported list. Of course, the downside is that lots of extraneous images get downloaded, so post-filtering is required.
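To illustrate the idea (this is not the code of scrap_wrongies.py, just a minimal sketch under assumptions): collect media URLs from a page and from the pages it links to, one link deep. The names `LinkCollector`, `collect_media`, `MEDIA_EXTENSIONS`, and the `fetch` callable are hypothetical; `fetch` stands in for whatever HTTP client is used, which also keeps the sketch testable offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed list of extensions treated as media; adjust as needed.
MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webm", ".mp4")


class LinkCollector(HTMLParser):
    """Collect media URLs and ordinary page links from one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.media = []
        self.pages = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.media.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            url = urljoin(self.base_url, attrs["href"])
            if urlparse(url).path.lower().endswith(MEDIA_EXTENSIONS):
                self.media.append(url)
            else:
                self.pages.append(url)


def collect_media(url, fetch, depth=1):
    """Return media URLs found on `url` and on pages up to `depth` links away.

    `fetch` is any callable mapping a URL to HTML text (hypothetical here;
    in practice it could wrap urllib.request.urlopen).
    """
    collector = LinkCollector(url)
    collector.feed(fetch(url))
    found = list(collector.media)
    if depth > 0:
        for page in collector.pages:
            found.extend(collect_media(page, fetch, depth - 1))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))
```

As noted above, everything reachable gets collected, so a post-filtering step (by size, dimensions, or domain) would still be needed afterwards.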

Anyway, going the way of youtube-dl or similar – a plugin-based media-by-link downloader system – would still be useful; but quite a lot of work.

HoverHell commented 8 years ago

> Python 3 fork

Better yet, support both (either 2to3 or single-codebase).

Although I'd agree that requiring py3 isn't that bad an idea anymore.

jtara1 commented 8 years ago

About a year ago I made a script that reads the size in bytes of each HTML tag's href link and then downloads the largest one. I'll play with this script and see if I can use it as a generic image downloader, or as a last resort after checking the Content-Type of the HTML page or the domain name of the URL.
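Roughly, the approach could look like the sketch below. It is not the original script; `HrefCollector`, `content_length`, and `largest_link` are hypothetical names, and it assumes the server reports a `Content-Length` header in response to a HEAD request (many do not, in which case the size is treated as 0).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class HrefCollector(HTMLParser):
    """Collect the absolute URL of every href attribute in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def content_length(url, timeout=10):
    """Ask the server for a resource's size via HEAD; return 0 if unknown."""
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return int(response.headers.get("Content-Length") or 0)
    except (OSError, ValueError):
        # Network errors, unsupported URL schemes, or a malformed header.
        return 0


def largest_link(page_url, html, size_of=content_length):
    """Return the href target with the most bytes, or None if there are none."""
    collector = HrefCollector(page_url)
    collector.feed(html)
    if not collector.links:
        return None
    return max(collector.links, key=size_of)
```

Passing `size_of` as a parameter is only a design convenience: it lets the selection logic be tried against a fixed table of sizes without touching the network.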

I'll read through scrap_wrongies.py

I'm reading through the manual pages of wget; I had never used this utility until today.


For now I'll fix a few bugs on my fork of RedditImageGrab, then try downloading from these sites. I'll take a look at the work of @rachmadaniHaryono for more help with downloading.

@HoverHell you're welcome to link my fork in the readme. When I feel my fork is in an acceptable state, I'll submit a pull request to a branch separate from your master branch.

Aside from being a Python 3 port, there are several significant changes in how it functions. There is more info on that in my readme.

https://github.com/jtara1/RedditImageGrab

rachmadaniHaryono commented 8 years ago

> Anyway, going the way of youtube-dl or similar – a plugin-based media-by-link downloader system – would still be useful; but quite a lot of work.

Another thing we can do is provide a JSON or YAML settings file where the user can define a custom command for a specific kind of link, which the program then runs using subprocess.
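A minimal sketch of what that could look like, assuming a JSON settings file that maps a domain to a command template. The settings layout, the `{url}` placeholder, and the function names are all hypothetical, not an existing feature of RedditImageGrab.

```python
import json
import subprocess
from urllib.parse import urlparse

# Hypothetical settings content: domain -> command template (argv list).
EXAMPLE_SETTINGS = """
{
  "commands": {
    "example.com": ["echo", "custom-downloader", "{url}"]
  }
}
"""


def build_command(settings, url):
    """Return the argv list configured for the URL's domain, or None."""
    host = urlparse(url).hostname or ""
    for domain, template in settings.get("commands", {}).items():
        # Match the domain itself and any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return [part.replace("{url}", url) for part in template]
    return None


def run_custom_command(settings, url):
    """Run the configured command for `url`; return False if none matches."""
    command = build_command(settings, url)
    if command is None:
        return False
    # Passing a list (without shell=True) keeps the URL from being
    # interpreted by a shell.
    subprocess.run(command, check=True)
    return True
```

Usage would be along the lines of `run_custom_command(json.loads(EXAMPLE_SETTINGS), link)`, falling back to the built-in downloaders when it returns `False`.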

We can also add other settings, such as skipping certain websites or storing default input flags, so the user can just run the program without writing the flags every time. Think of it like a pytest ini file.
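One way this might work, sketched with the standard library only: read defaults from an ini file and let command-line flags override them. The section name `redditdownload`, the option names `--reddit`, `--dir`, and `--skip-domains`, and the `skip_domains` key are assumptions made for illustration, not the program's actual interface.

```python
import argparse
import configparser

# Hypothetical ini content, in the spirit of a pytest ini file.
EXAMPLE_INI = """
[redditdownload]
reddit = wallpapers
dir = ./downloads
skip_domains = example.com, example.org
"""


def load_defaults(ini_text, section="redditdownload"):
    """Parse ini text into a dict of defaults; empty if the section is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return {}
    defaults = dict(parser[section])
    # Turn the comma-separated skip list into a list of domain names.
    raw = defaults.pop("skip_domains", "")
    defaults["skip_domains"] = [d.strip() for d in raw.split(",") if d.strip()]
    return defaults


def parse_args(argv, defaults):
    """Parse command-line flags, falling back to the ini defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--reddit")
    parser.add_argument("--dir")
    parser.add_argument("--skip-domains", nargs="*", dest="skip_domains")
    parser.set_defaults(**defaults)
    return parser.parse_args(argv)
```

With this arrangement, running the program with no flags uses the values from the ini file, while any flag given explicitly on the command line takes precedence.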

> Better yet, support both (either 2to3 or single-codebase).

I'm starting to set up tox to make that possible, but this program needs tests first.

> For now I'll fix a few bugs on my fork of RedditImageGrab, then try downloading from these sites. I'll take a look at the work of @rachmadaniHaryono for more help with downloading.

Just skip my costum-build branch. It is not maintained and is mostly my testing ground, so I can't guarantee anything about it.

rachmadaniHaryono commented 8 years ago

> Actually, there is/was a scrap_wrongies.py script

I know this, but I don't know how to use it.