binux / pyspider

A Powerful Spider (Web Crawler) System in Python.
http://docs.pyspider.org/
Apache License 2.0

add empty-valued header #961

Open · curme opened this issue 3 years ago

curme commented 3 years ago

Expected behavior

I want to add a header with an empty value. For example, my request requires the header 'Authorization: ', i.e. an 'Authorization' header whose value is ''.

    url = 'xxx.xxxx.com'
    headers = {'Content-Type': 'application/json', 'Authorization': ''}
    self.crawl(url, callback=self.xxx_handle, method='POST', headers=headers)

Actual behavior

However, the 'Authorization' header appears to be dropped from the request because its value is empty.

This might be caused by libcurl: I found that the Linux 'curl' command behaves the same way:

    curl --request GET 'https://github.com/binux/pyspider' --header 'Accept: */*' --header 'Authorization: ' -v

In curl's verbose log, the 'Authorization' header is missing:

    > GET /binux/pyspider HTTP/1.1
    > Host: github.com
    > User-Agent: curl/7.61.0
    > Accept: */*
    >
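For comparison, here is a minimal sketch that sends the same empty 'Authorization' header without libcurl, using only Python's standard library (http.client as the client, http.server as a local recording server on 127.0.0.1). It only illustrates that an empty-valued header can be transmitted and received as ''; it is not a pyspider fix. The names RecordingHandler and send_empty_authorization are hypothetical, chosen for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Filled in by the handler with what the server actually received.
received = {}


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that records the Authorization header it sees."""

    def do_GET(self):
        # Header lookup on the parsed message is case-insensitive.
        received["present"] = "Authorization" in self.headers
        received["value"] = self.headers.get("Authorization")
        self.send_response(204)  # No Content: nothing else to send
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def send_empty_authorization():
    """Send one GET with 'Authorization: ' to a local server; return what it saw."""
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0: pick a free port
    port = server.server_address[1]
    thread = threading.Thread(target=server.handle_request)  # serve exactly one request
    thread.start()

    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    # http.client writes this header as the line "Authorization: " (empty value).
    conn.request("GET", "/", headers={"Accept": "*/*", "Authorization": ""})
    conn.getresponse().read()
    conn.close()

    thread.join()
    server.server_close()
    return received


if __name__ == "__main__":
    print(send_empty_authorization())  # {'present': True, 'value': ''}
```

Because this path never touches libcurl, the header arrives with an empty value, which is the behavior the issue expects from self.crawl.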

How can I add an empty-valued header to the request? Any suggestions? Thanks a lot!

curme commented 3 years ago

I found that curl already supports sending an empty header if the colon is replaced with a semicolon. For the case I provided above:

    curl --request GET 'https://github.com/binux/pyspider' --header 'Accept: */*' --header 'Authorization;' -v
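libcurl's CURLOPT_HTTPHEADER option documents the same rule: a header line of the form "Name:" with nothing after the colon disables that header, while "Name;" sends it with an empty value. A minimal sketch of turning a header dict into such lines might look like this; to_curl_header_lines is a hypothetical helper, not part of pyspider or pycurl.

```python
from typing import Dict, List


def to_curl_header_lines(headers: Dict[str, str]) -> List[str]:
    """Hypothetical helper: build header lines in libcurl's CURLOPT_HTTPHEADER format.

    "Name: value" is sent as-is. An empty value is written as "Name;",
    because libcurl treats "Name:" (empty after the colon) as "remove this header".
    """
    lines = []
    for name, value in headers.items():
        if value == "":
            lines.append(f"{name};")  # libcurl sends this as "Name:" with no value
        else:
            lines.append(f"{name}: {value}")
    return lines


if __name__ == "__main__":
    print(to_curl_header_lines({"Content-Type": "application/json", "Authorization": ""}))
    # ['Content-Type: application/json', 'Authorization;']
```

With pycurl, such a list would be passed via curl.setopt(pycurl.HTTPHEADER, lines). This only helps where you control the code that builds libcurl's header list; how that would be wired into pyspider's fetcher is not shown here and would be an assumption.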

Log:

    > GET /binux/pyspider HTTP/1.1
    > Host: github.com
    > User-Agent: curl/7.61.0
    > Accept: */*
    > Authorization:
    >

This means libcurl does let users send an empty header in a request (by the way, I suppose pyspider also relies on libcurl). However, I still haven't found how to do this in pyspider. I need your help.