bebound / pixivd

Pixiv Downloader - Batch download pictures from Pixiv
MIT License
138 stars, 18 forks

the net has changed #4

Cyux07 closed this issue 8 years ago

Cyux07 commented 8 years ago

Clicking 'login' now sends a request to 'https://accounts.pixiv.net/api/login?lang=zh', and it fails with a 400 error. Could you release an updated version, or give me some guidance? Thanks!

bebound commented 8 years ago

Hmm, I can't reproduce this error. What were you doing when you ran into it?

Cyux07 commented 8 years ago

I used Chrome's F12 console to find this URL. Then I packed the headers ('Host', 'Referer', 'Origin', 'User-Agent') and the data ('pixiv_id', 'password', 'post_key', 'source'), created a requests.session(), and posted it. What's more, I can't even open the URL myself by clicking it. Isn't that weird?

bebound commented 8 years ago

So you're just asking a general question, there is no problem with this project, right?

This code works on my computer, hope this helps.

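import requests

s = requests.Session()

# Form fields sent to the login endpoint; replace the 'xx' placeholders.
data = {'pixiv_id': 'xx',
        'password': 'xx',
        'captcha': '',
        'g_recaptcha_response': '',
        'post_key': 'xx',
        'source': 'pc'}

# Log in, then fetch the home page with the same session.
s.post('https://accounts.pixiv.net/api/login?lang=zh', data=data)
r = s.get('https://www.pixiv.net')
print(r.text)

Cyux07 commented 8 years ago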

Er, yes, you can visit Pixiv whether or not you are logged in; you just face a lot of restrictions if you have not logged in successfully. The statement s.post('https://accounts.pixiv.net/api/login?lang=zh', data=data) returns a 400 response.

bebound commented 8 years ago

It should return 200, I've tested it on my computer.

Maybe your post_key is incorrect? You need to extract it from the login page's HTML source code.
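As a rough illustration of that extraction step, here is a minimal sketch that pulls the key out of page HTML and fails loudly when it is missing. The helper name `extract_post_key` and the sample markup are made up for this example; the pattern assumes the hidden field looks like `name="post_key" value="..."`, which the real login page may not guarantee.

```python
import re

# Assumed shape of the hidden form field; the real markup may differ or change.
POST_KEY_RE = re.compile(r'name="post_key"\s+value="(\w+)"')


def extract_post_key(html: str) -> str:
    """Return the post_key value found in the given login page HTML."""
    match = POST_KEY_RE.search(html)
    if match is None:
        # Failing here is clearer than posting a wrong key and getting a 400.
        raise ValueError("post_key not found in login page HTML")
    return match.group(1)


# Invented sample markup, for illustration only.
sample_html = '<input type="hidden" name="post_key" value="abc123def">'
print(extract_post_key(sample_html))  # abc123def
```

In practice the HTML would come from a GET on the login page made with the same session that later posts the login form, so the key and the session cookies match.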

Cyux07 commented 8 years ago

self.login_header = {
    'Host': 'accounts.pixiv.net',
    'Origin': 'https://accounts.pixiv.net',
    'Referer': 'https://accounts.pixiv.net/login?lang=zh&source=pc&view_type=page&ref=wwwtop_accounts_index',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'Upgrade-Insecure-Requests': '1',
}
s = requests.session()
login_data = {"pixiv_id": pixivId,
              "password": password,
              'captcha': '',
              'g_recaptcha_response': '',
              "post_key": postKey,
              "source": source}
r = s.post(self.login_url, data=login_data, headers=self.login_header)
print(r)

The console shows <Response [400]>.

bebound commented 8 years ago

Your code also returns <Response [200]>

If the requests session is already logged in and you post the login information again, you'll get a 400. I guess this is the cause of the problem.

Cyux07 commented 8 years ago

But... how? The session is a new instance every time I restart the program.

bebound commented 8 years ago

I can't understand what you're saying; feel free to write in Chinese. Both code samples work properly, and I don't know how you are getting the 400 error.

Cyux07 commented 8 years ago

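OK... what I mean is that every run of this program creates a new session instance, so it cannot already be logged in. Besides, I have never managed to log in successfully before. Have you tried checking specific elements of the page to determine whether it is in a logged-in state? (For example, on www.pixiv.net/search.php?word=overwatch the text at the top (the meta description) differs between logged-in and logged-out visits, and when logged out you cannot see bookmark counts and only get 10 pages.)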
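The meta description check suggested above could be sketched roughly as follows, using only the standard library. The helper names, the sample HTML, and the "guest" description string are all invented for illustration; what the real pages contain would have to be observed first, and the comparison is only a heuristic.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaDescriptionParser(HTMLParser):
    """Collects the content of the first <meta name="description"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "description":
            self.description = attr_map.get("content")


def meta_description(html: str) -> Optional[str]:
    """Return the page's meta description, or None if there is none."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description


def looks_logged_in(html: str, logged_out_description: str) -> bool:
    """Heuristic: the page has a description that differs from the guest one."""
    description = meta_description(html)
    return description is not None and description != logged_out_description


# Invented strings, for illustration only.
guest_page = '<head><meta name="description" content="guest view"></head>'
print(looks_logged_in(guest_page, "guest view"))  # False
```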

bebound commented 8 years ago

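Have you run the sample code I gave you? Its output is the HTML source of the Pixiv home page in the logged-in state.

The first s.post returns 200, and posting again returns 400. That is the only cause of a 400 I can think of...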

Cyux07 commented 8 years ago

I tried it, and it shows the sign-up page (the same as visiting the main site when not logged in). Could I see your complete code?

bebound commented 8 years ago

import re
import requests

s = requests.Session()

# Fetch the login page first and read the post_key out of its HTML.
r = s.get('https://accounts.pixiv.net/login?lang=zh&source=pc&view_type=page&ref=wwwtop_accounts_index')
post_key = re.search(r'name="post_key" value="(\w+)"', r.text).group(1)

data = {'pixiv_id': 'xxx',
        'password': 'xxx',
        'captcha': '',
        'g_recaptcha_response': '',
        'post_key': post_key,
        'source': 'pc'}

# Post the login form with the same session, then fetch the home page.
s.post('https://accounts.pixiv.net/api/login?lang=zh', data=data)
r = s.get('https://www.pixiv.net')
print(r.text)

Cyux07 commented 8 years ago

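What exactly is post_key for? Until now I had just stored a fixed one, the same way I did with the cookies. Thanks so much for this way of fetching the key. After that I tried removing the headers and cookies, and then it worked. (!!!) Why does it work better without them??? Don't most sites rely on headers to block crawlers?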

bebound commented 8 years ago

Maybe your post_key is incorrect? You need to extract it from the login page's HTML source code.

post_key is used to prevent CSRF.

Take a look at this when you have time: https://github.com/FredWe/How-To-Ask-Questions-The-Smart-Way/blob/master/README-zh_CN.md
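To make the CSRF remark a bit more concrete, here is a toy, in-memory sketch of how a per-session token can work in general. It is not Pixiv's implementation: the class `TinyCsrfServer`, its method names, and the session ids are invented, and the 400 status is used only to mirror what was reported in this thread.

```python
import hmac
import secrets


class TinyCsrfServer:
    """Toy server that issues one token per session id and checks it on POST."""

    def __init__(self) -> None:
        self._tokens = {}

    def render_login_form(self, session_id: str) -> str:
        # A fresh random token is tied to the session and embedded in the form.
        token = secrets.token_hex(16)
        self._tokens[session_id] = token
        return f'<input type="hidden" name="post_key" value="{token}">'

    def handle_login(self, session_id: str, post_key: str) -> int:
        # Reject the POST unless the submitted key matches this session's token.
        expected = self._tokens.get(session_id)
        if expected is None or not hmac.compare_digest(
            expected.encode(), post_key.encode()
        ):
            return 400
        return 200


server = TinyCsrfServer()
form = server.render_login_form("session-a")
fresh_key = form.split('value="')[1].split('"')[0]
print(server.handle_login("session-a", fresh_key))  # 200
print(server.handle_login("session-b", fresh_key))  # 400
```

Under this model, a key copied once from the browser and then hard-coded belongs to a different session than the one the script creates, which would be consistent with the rejected logins described earlier.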

Cyux07 commented 8 years ago

Lesson learned! It looks like I still need to study how cookies work. May good luck always be with you : )