Open LichMscy opened 7 years ago
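Actually, the CAPTCHA step can be skipped. Inspired by the author's approach, you can first log in through the CAPTCHA-free entry on weibo.com (in Weibo's account security settings, marking your location as a frequent login location exempts you from the CAPTCHA), then open weibo.cn directly in the same simulated PhantomJS session. weibo.cn will already be logged in, so at that point you only need to read the cookies.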
I have implemented this myself; the code is below, for reference only:

    from selenium import webdriver  # webdriver.PhantomJS requires Selenium 3.x; it was removed in Selenium 4


    def init_phantomjs_driver():
        headers = {
            # Cookie issued by weibo.com to a visitor who is not logged in
            'Cookie': 'YF-Ugrow-G0=b02489d329584fca03ad6347fc915997; SUB=_2AkMvgPj2dcPxrAFYnPgWyGvkZYpH-jycVZEAAn7uJhMyOhgv7nBSqSVOKynW2PbhU4768kfRGZgNPwXeRA..; SUBP=0033WrSXqPxfM72wWs9jqgMF55529P9D9WWEFXHsNpvgJdQjr1GM.e765JpVF020SKM7e0571hMc',
        }
        # Attach each custom header to every page PhantomJS requests
        for key, value in headers.items():
            webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.customHeaders.{}'.format(key)] = value
        useragent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.110 Safari/537.36'
        webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.settings.userAgent'] = useragent
        # Placeholder: replace with the local path to the phantomjs executable
        driver = webdriver.PhantomJS(executable_path='xxxxxxxphantomjs路径xxxxxxx')
        driver.set_window_size(1366, 768)
        return driver

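
    import logging
    import time

    # The original post showed only the function body. The wrapper name
    # get_weibo_cn_cookies and its `account` parameter are assumed here;
    # weibo_auto_handle is the poster's own module containing init_phantomjs_driver.
    def get_weibo_cn_cookies(account):
        browser = weibo_auto_handle.init_phantomjs_driver()
        browser.get("http://weibo.com")
        time.sleep(3)
        failure = 0
        # The title below is the weibo.com page title while logged out; retry the login at most 5 times
        while "微博-随时随地发现新鲜事" == browser.title and failure < 5:
            failure += 1
            username = browser.find_element_by_name("username")
            pwd = browser.find_element_by_name("password")
            login_submit = browser.find_element_by_class_name('W_btn_a')
            username.clear()
            username.send_keys(account['usn'])
            pwd.clear()
            pwd.send_keys(account['pwd'])
            login_submit.click()
            time.sleep(5)
            # if browser.find_element_by_class_name('verify').is_displayed():
            #     logging.error("Verify code is needed! (Account: %s)" % account)
        # "我的首页" ("My Home") in the title means the weibo.com login succeeded
        if "我的首页 微博-随时随地发现新鲜事" in browser.title:
            browser.get('http://weibo.cn/')
        cookie = dict()
        # weibo.cn shows the same "My Home" title when the session carried over
        if "我的首页" in browser.title:
            for elem in browser.get_cookies():
                cookie[elem["name"]] = elem["value"]
            # p2 = persist_iics.Persist()
            # p2.save_account_cookies(accounts[0][0], cookie, datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
            logging.error('Account cookies updated! (Account_id: %s)' % account['usn'])
        return cookie
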
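As a rough follow-up to the snippet above, here is a minimal sketch of how the collected cookies might be turned into a `Cookie` request header for a crawler that talks to weibo.cn. The helper names `cookies_to_dict` and `build_cookie_header` and the sample cookie values are made up for illustration and are not part of the code above; only the `name`/`value` keys mirror what `browser.get_cookies()` returns.

```python
def cookies_to_dict(selenium_cookies):
    """Collapse Selenium-style cookie records into a name -> value dict."""
    return {c["name"]: c["value"] for c in selenium_cookies}


def build_cookie_header(cookie):
    """Join a name -> value dict into a single Cookie header string."""
    return "; ".join("{}={}".format(name, value) for name, value in cookie.items())


# Made-up sample shaped like the list returned by browser.get_cookies()
sample = [
    {"name": "SUB", "value": "abc", "domain": ".weibo.cn"},
    {"name": "SUBP", "value": "xyz", "domain": ".weibo.cn"},
]

cookie = cookies_to_dict(sample)
headers = {"Cookie": build_cookie_header(cookie)}
print(headers["Cookie"])
```

The resulting `headers` dict could then be passed to whatever HTTP client the crawler uses, so PhantomJS is only needed for the login step and not for every page fetch.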
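Yes, nice idea. It works for small jobs. For large-scale crawling, though, you need to log in with many accounts, and setting each one up by hand is not feasible. Weibo also rate-limits by IP, so fast crawling has to go through proxies, and this approach does not apply there either.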