findix / ArtStationDownloader

ArtStation Downloader is a lightweight tool to help you download images and videos from https://www.artstation.com/
MIT License

You are blocked by artstation #8

Closed eposicee closed 2 years ago

eposicee commented 5 years ago

I'm outside the Great Firewall, and I can access ArtStation normally.

AVRILLAVlGNE commented 5 years ago

[Error] [403 Forbidden] You are blocked by artstation. Same here. Is this a bug?

DarkRoastOtter commented 5 years ago

Unfortunately, the creator is aware of the issue, but there doesn't appear to be a workaround at the moment. It would make sense for a VPN to work, since they likely block by IP, but it seems that once the application failed the checks, those users were blocked from accessing the site in the way the application needs.

findix commented 5 years ago

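Please see https://github.com/findix/ArtStationDownloader/issues/6#issuecomment-485434161 ArtStation recently added an anti-scraping mechanism: if you make too many requests, you are redirected to a CAPTCHA page. I found a Node library (cloudscraper) that seems to be able to get past this CAPTCHA, but I haven't found an effective solution on the Python side yet. When I have time, I can look more closely at how that Node library does it.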

kent-lee commented 5 years ago

I found two solutions for the bot detection: (1) use Selenium to get the projects.json page; (2) forget about the JSON page and go to https://{artist_id}.artstation.com, parsing the plain HTML to get what you need. Shameless plug: for more details on the solutions, have a look at my GitHub, under the section Challenges, point 1.
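
A minimal sketch of option (2), using only the standard library's html.parser. The sample markup, the `example` subdomain, and the "/projects/" link pattern are assumptions made for illustration; the real page structure is not described in this thread and would need to be inspected first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute targets of all <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def project_links(html, base_url, marker="/projects/"):
    """Return de-duplicated links containing `marker`, in page order.

    `marker` is a hypothetical pattern, not a confirmed ArtStation URL scheme.
    """
    parser = LinkCollector(base_url)
    parser.feed(html)
    result = []
    for link in parser.links:
        if marker in link and link not in result:
            result.append(link)
    return result


# Hypothetical sample markup standing in for a downloaded artist page.
SAMPLE_HTML = """
<a href="/projects/abc123">One</a>
<a href="/about">About</a>
<a href="/projects/abc123">One again</a>
<a href="https://example.artstation.com/projects/xyz789">Two</a>
"""

links = project_links(SAMPLE_HTML, "https://example.artstation.com")
print(links)
```

Fetching the page itself is left out on purpose, since that is the step the bot detection affects.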

udterry commented 5 years ago

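The script from the commenter above works for downloading. There is still a small problem: not all of the images get downloaded.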

udterry commented 5 years ago

I found a problem with the artist "Bageumi .". It seems that the directory can't be created:

return {*[f.partition(separator)[0] for f in os.listdir(dir_path)]}

FileNotFoundError: [WinError 3] The system cannot find the path specified: 'E:\artstation\Bageumi .'

findix commented 5 years ago

Hi @Kent-Lee, thank you so much for your valuable hint. I am also quite surprised that only a browser gets through.

ChromeDriver is obviously too heavy. Parsing the HTML directly seems like a solution, but I'm worried that the CAPTCHA can't be worked around that way either.

kent-lee commented 5 years ago

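@udterry Thanks for your feedback. I have fixed both of the problems you reported.

The system cannot find the path specified

This happens because Windows File Explorer automatically strips a trailing period from folder names, so the path I expected differed from the actual path. I never ran into this when I wrote the DeviantArt scraper, so I hadn't noticed it. Fix: name the folder directly after the artist's ID, which avoids this kind of problem.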
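
A small sketch of the idea, not the actual code from the script. `stripped_like_windows` only approximates the behaviour described above (trailing periods and spaces being dropped), and the ID `bageumi` is a hypothetical value chosen for illustration.

```python
import os
import tempfile


def stripped_like_windows(name):
    """Approximate the described behaviour: trailing dots and spaces are dropped."""
    return name.rstrip(". ")


def artist_dir(root, artist_id):
    """Create (if needed) and return a folder named after the artist ID."""
    path = os.path.join(root, artist_id)
    os.makedirs(path, exist_ok=True)
    return path


display_name = "Bageumi ."
artist_id = "bageumi"  # hypothetical ID, for illustration only

# The display name changes when stripped, so a path built from it can
# point at a folder that does not exist under that exact name.
name_is_stable = stripped_like_windows(display_name) == display_name
id_is_stable = stripped_like_windows(artist_id) == artist_id

with tempfile.TemporaryDirectory() as root:
    path = artist_dir(root, artist_id)
    entries = os.listdir(path)  # works: the folder exists under this exact name
```

An alternative would be to sanitize the display name instead, but using the ID sidesteps every other character that may be awkward in a folder name as well.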

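Not all images are downloaded

This happens because some files share the same name, so files downloaded earlier were overwritten. Fix: every image has a unique ID (e.g. artwork["assets"][0]["id"]), so appending this ID to each image's file name is enough.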
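
A minimal sketch of that renaming, assuming each asset carries its original file name under a `name` key. That key and the sample data are hypothetical; only the `id` field is mentioned above.

```python
import os


def unique_file_name(original_name, asset_id):
    """Insert the asset ID before the extension so equal names no longer collide."""
    stem, ext = os.path.splitext(original_name)
    return f"{stem}_{asset_id}{ext}"


# Hypothetical sample: two assets that share the same file name.
artwork = {
    "assets": [
        {"id": 101, "name": "cover.jpg"},
        {"id": 102, "name": "cover.jpg"},
    ]
}

names = [unique_file_name(a["name"], a["id"]) for a in artwork["assets"]]
print(names)
```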

kent-lee commented 5 years ago

@findix

If you switch proxies / IP addresses from time to time and also limit the number of requests per second, the chance of getting caught should be low.
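
A rough sketch of both ideas using only the standard library. The proxy addresses are placeholders, and switching after a fixed number of requests is a simplification of "from time to time"; the interval and counts are arbitrary values for illustration.

```python
import itertools
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
                now = time.monotonic()
        self._last = now


def rotating_proxies(proxies, requests_per_proxy):
    """Yield each proxy `requests_per_proxy` times, cycling forever."""
    for proxy in itertools.cycle(proxies):
        for _ in range(requests_per_proxy):
            yield proxy


# Placeholder proxy addresses, not real servers.
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]

throttle = Throttle(min_interval=0.05)  # at most about 20 requests per second
proxy_iter = rotating_proxies(PROXIES, requests_per_proxy=2)

used = []
start = time.monotonic()
for _ in range(5):
    throttle.wait()                 # rate limit
    used.append(next(proxy_iter))   # proxy that this request would use
elapsed = time.monotonic() - start
```

With the requests library, the chosen proxy would typically be passed as `proxies={"http": proxy, "https": proxy}`; the actual request is omitted here to keep the sketch free of network access.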

TA2WK commented 5 years ago

Kent-Lee, your script works like a charm on macOS. Well done, bro...

DD-minelong commented 4 years ago

I tested it: videos still can't be downloaded, but images download fine.

findix commented 2 years ago

It's fixed. Try the latest release: https://github.com/findix/ArtStationDownloader/releases