Whether or not I use a proxy, I always get a 403. How should I handle this?

hjllsd commented 1 year ago

I pulled the scraping code out into a standalone script. Looking forward to your reply.

```python
import base64
import time

import requests

DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36",
    "authority": "soutubot.moe",
    "origin": "https://soutubot.moe",
    "referer": "https://soutubot.moe/",
    "x-requested-with": "XMLHttpRequest",
    "Cookie": "_ga=GA1.1.336138872.1686596474; cf_clearance=SbyyT8TSeUmgTprCJuwQnwwX.6vGV8ZBZoJmusXPCsQ-1686602349-0-160; _ga_JB369TC9SF=GS1.1.1686602155.2.1.1686602361.0.0.0",
}
search_bot = "https://soutubot.moe/api/search"

if __name__ == "__main__":
    proxies = {"http": "http://127.0.0.1:7890", "https": "http://127.0.0.1:7890"}
    # x-api-key: timestamp squared plus User-Agent length squared,
    # base64-encoded, reversed, with the "=" padding stripped.
    Q = str(int(pow(time.time(), 2)) + int(pow(len(DEFAULT_HEADERS["User-Agent"]), 2)))
    encoded_data = base64.b64encode(Q.encode()).decode()[::-1].replace("=", "")
    DEFAULT_HEADERS.update({"x-api-key": encoded_data})
    data = {"factor": 1.2}
    # Raw string so the backslashes in the Windows path are not treated as escapes.
    with open(r"E:\code\npm_code\node1\test\a.jpg", "rb") as fr:
        # Drop the proxies argument to test without the proxy.
        resp = requests.post(search_bot, headers=DEFAULT_HEADERS, files={"file": fr}, data=data, proxies=proxies)
    print(resp.status_code)
```
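To rule out the `x-api-key` value as the cause of the 403, the derivation in the script above can be isolated and checked with fixed inputs. This is only a sketch: the function names `make_api_key` and `decode_api_key` are hypothetical, and whether the site still accepts this scheme is not confirmed anywhere in this thread.

```python
import base64


def make_api_key(now: float, user_agent: str) -> str:
    """Mirror the x-api-key derivation used in the script above."""
    # Timestamp squared plus User-Agent length squared, as a decimal string.
    q = str(int(pow(now, 2)) + int(pow(len(user_agent), 2)))
    # Base64-encode, reverse the string, then strip the "=" padding.
    return base64.b64encode(q.encode()).decode()[::-1].replace("=", "")


def decode_api_key(key: str) -> str:
    """Invert make_api_key to recover the decimal string (debugging aid)."""
    b64 = key[::-1]
    b64 += "=" * (-len(b64) % 4)  # restore the stripped padding
    return base64.b64decode(b64).decode()
```

Decoding a key captured from the browser's network tab and comparing it with the value the script produces at roughly the same time would show whether the two still agree.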

crosage commented 1 year ago

@hjllsd The site has tightened its protection again, so you now have to get past Cloudflare. When I have time I'll write a Playwright-based bypass.
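One way to confirm that the 403 comes from Cloudflare rather than from the site's own API is to inspect the response. The sketch below is a heuristic, not a definitive check: the helper name is hypothetical, `cf-mitigated: challenge` is the header Cloudflare documents for challenged requests, and the body markers are assumptions about what the challenge page usually contains.

```python
from typing import Mapping


def looks_like_cloudflare_challenge(
    status: int, headers: Mapping[str, str], body: str = ""
) -> bool:
    """Heuristically decide whether a response is a Cloudflare challenge."""
    if status not in (403, 503):
        return False
    # Header names are case-insensitive, so normalise them first.
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    if lowered.get("cf-mitigated") == "challenge":
        return True
    served_by_cf = "cloudflare" in lowered.get("server", "")
    # Assumed markers of the interstitial challenge page.
    markers = ("just a moment", "challenge-platform", "cf-chl")
    text = body.lower()
    return served_by_cf and any(m in text for m in markers)
```

With `requests`, this would be called as `looks_like_cloudflare_challenge(resp.status_code, resp.headers, resp.text)`.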

crosage commented 1 year ago

Or take that model of his and train your own (https://github.com/lolishinshi/imsearch)

hjllsd commented 1 year ago

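> Or take that model of his and train your own (https://github.com/lolishinshi/imsearch)

First of all, thank you for your reply. I tried to bypass it myself but failed; if I succeed, I'll submit a PR.

1. This link is empty.
2. If you push an update, you can close the issue or reply to me.

Thanks again for taking the time to reply.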

4o3F commented 1 year ago

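Cloudflare's verification is based on several factors, including cookies and the browser fingerprint, so a brand-new browser launched by Playwright will most likely fail detection. Rather than doing that, it would be better to use your everyday browser with the DevTools protocol enabled. If possible, please send a PR so we can take a look at your approach.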
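A minimal sketch of that idea, assuming Playwright for Python is installed and the everyday Chromium-based browser was started with `--remote-debugging-port=9222`. The helper names, the port, and the idea of reusing the cookies in a `Cookie` header are illustrative assumptions, not something confirmed to work against this site; the clearance cookie likely also requires a matching User-Agent and IP address.

```python
from typing import Iterable, Mapping


def cookie_header(cookies: Iterable[Mapping[str, object]]) -> str:
    """Join cookie dicts (as returned by Playwright) into a Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def cookies_from_running_browser(cdp_url: str, site_url: str) -> str:
    """Attach to an already running browser over CDP and read its cookies."""
    # Imported lazily so cookie_header can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_url)
        # The first context is the browser's existing default profile.
        context = browser.contexts[0]
        return cookie_header(context.cookies([site_url]))
```

Calling `cookies_from_running_browser("http://127.0.0.1:9222", "https://soutubot.moe")` after solving the challenge by hand in that browser would return a string usable as the `Cookie` header.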

crosage / nonebot-plugin-spiders

Whether or not I use a proxy, I always get a 403. How should I handle this? #1