riptl closed this issue 5 years ago
I made a gist that grabs directly from the CDN https://gist.github.com/nektro/3a4c25eb66cb0abf24b84c0239acddbb
Example: https://alpha.wallhaven.cc/wallpaper/193 https://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-193.jpg
That's already what we do. The problem is something else; please look at the issue and the code.
@nektro Sorry, I forgot to mention: this is strictly about the HTML page containing the metadata! Thanks for looking into this.
Cookies from Chrome work; the --cookie flag is a workaround: https://github.com/CorentinB/WallhavenScraper/commit/e2215b6f2b9a713f5949a99f0e213bacf29ea0c8
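For anyone reading along, here is a rough sketch of what a --cookie style workaround can look like in Go. It is not the code from the linked commit: the helper name newRequestWithCookie is hypothetical, and the only assumption is that the flag carries the raw Cookie header value copied from a logged-in browser session.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
)

// newRequestWithCookie is a hypothetical helper (not taken from the repo).
// It builds a GET request and, when raw is non-empty, sends raw verbatim
// as the Cookie request header.
func newRequestWithCookie(url, raw string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if raw != "" {
		req.Header.Set("Cookie", raw)
	}
	return req, nil
}

func main() {
	// Go's flag package accepts both -cookie and --cookie.
	cookie := flag.String("cookie", "", "raw Cookie header value copied from the browser")
	flag.Parse()

	req, err := newRequestWithCookie("https://alpha.wallhaven.cc/wallpaper/193", *cookie)
	if err != nil {
		panic(err)
	}
	// Only print the header here; sending the request is left out of the sketch.
	fmt.Printf("Cookie header: %q\n", req.Header.Get("Cookie"))
}
```

The cookie string is passed through unchanged, so whatever the browser sent is exactly what the scraper sends.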
Workaround working ...
NSFW image pages require a login to view on WallHaven. We tried implementing a login using http.CookieJar as well as serializing cookies by hand to no avail.
This is an example of a protected URL: https://alpha.wallhaven.cc/wallpaper/193
Note that the actual image file is available! What we need is the HTML page with the metadata.
All that's needed to view the image is the correct cookie header, consisting of multiple cookies including a session token. The cookies are set in each server response.
The -u and -p flags are used for authentication. The master branch uses http.CookieJar for cookies. The cookie-test branch reads the cookies externally and serializes them back together before requests.
If you can get login + viewing NSFW to work on either of the branches, let us know asap and please file a Pull Request or contact https://the-eye.eu
This is an urgent issue: in under 5 hours, WallHaven will completely switch their site structure, which will make crawling much harder.
Any pull requests that bring us closer to fixing this are highly welcome!