ShilongLee / Crawler

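Open-source API server for crawlers targeting Douyin (latest a_bogus version), Kuaishou, Bilibili, Xiaohongshu, Taobao, JD, Weibo, and other platforms. One-command quick deployment with Docker.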

Fetching Douyin posts fails, and the Kuaishou live stream source cannot be retrieved #66

Open vnxfsc opened 2 hours ago

vnxfsc commented 2 hours ago

The request to /douyin/user succeeds and returns JSON, but the aweme_list value in it is empty.
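To narrow this down, it may help to confirm where aweme_list sits in the returned JSON and whether it is really an empty list rather than missing. Below is a minimal sketch; the helper name `find_key` and the sample payload (including `code`, `data`, and `has_more`) are made up for illustration and are not taken from the actual /douyin/user response.

```python
def find_key(node, key):
    """Yield every value stored under `key` anywhere in a nested JSON-like structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending so nested occurrences are found too.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Hypothetical payload shape, for illustration only.
sample = {"code": 0, "data": {"aweme_list": [], "has_more": 0}}

found = list(find_key(sample, "aweme_list"))
if not found:
    print("aweme_list key is missing from the response")
elif all(len(value) == 0 for value in found):
    print("aweme_list is present but empty")
else:
    print("aweme_list contains", sum(len(value) for value in found), "items")
```

With a real response, `sample` would be replaced by `response.json()`.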

vnxfsc commented 2 hours ago

import re

import requests
from bs4 import BeautifulSoup

HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
    'cookie': 'did=web_61439c813150c63c93461d81e3954b13; didv=1724721850178; clientid=3; did=web_61439c813150c63c93461d81e3954b13; client_key=65890b29; kpn=GAME_ZONE; kuaishou_live.live.bfb1s=ac5f27b3b62895859c4c1622f49856a4',
}
URL_TEMPLATE = "https://live.kuaishou.com/u/{}"


def fetch_page_content(url, headers):
    # Download the live-room page and raise on HTTP errors.
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.text


def get_script_content(soup):
    # Find the <script> tag whose text assigns window.__INITIAL_STATE__.
    script_tag = soup.find('script', string=re.compile(r'window\.__INITIAL_STATE__'))
    return script_tag.string if script_tag else None


def extract_initial_state(script_content):
    # Capture the object literal assigned to window.__INITIAL_STATE__.
    match = re.search(r'window\.__INITIAL_STATE__\s*=\s*({.*?});', script_content)
    # The page escapes "/" as the literal text \u002F, so unescape it.
    return match.group(1).replace('\\u002F', '/') if match else None


def extract_matches(initial_state):
    # Pull the first play URL, author name, and user id out of the state text.
    patterns = {
        "playUrls": r'"id":0,"url":"(.*?)"',
        "author": r'"id":"[a-zA-Z0-9]+","name":"(.*?)"',
        "id": r'"id":"([a-zA-Z0-9]+)"',
    }
    results = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, initial_state, re.DOTALL)
        results[key] = match.group(1) if match else None
    return results


def fetch_kuaishou_live_info(room_id):
    # Return (play_url, name, user_id), or (None, None, None) if parsing fails.
    url = URL_TEMPLATE.format(room_id)
    page_content = fetch_page_content(url, HEADERS)
    soup = BeautifulSoup(page_content, 'html.parser')
    script_content = get_script_content(soup)
    if not script_content:
        return None, None, None
    initial_state = extract_initial_state(script_content)
    if initial_state:
        matches = extract_matches(initial_state)
        return matches.get("playUrls"), matches.get("author"), matches.get("id")
    return None, None, None

This is my rather clumsy way of getting the Kuaishou live stream source.
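For anyone who wants to try the extraction idea without hitting the network, here is a small offline sketch. It uses patterns with the `\s*` and `.*?` quantifiers written out in full. `SAMPLE_SCRIPT` is an invented fragment, and its field layout is only an assumption about what the real Kuaishou page contains; `parse_sample` is a hypothetical helper name.

```python
import re

# Invented script fragment; the real page layout may differ.
SAMPLE_SCRIPT = (
    'window.__INITIAL_STATE__ = {"liveroom":{"author":{"id":"abc123","name":"demo"},'
    '"playUrls":[{"id":0,"url":"https:\\u002F\\u002Fexample.com\\u002Flive.flv"}]}};'
)

STATE_RE = re.compile(r'window\.__INITIAL_STATE__\s*=\s*(\{.*?\});', re.DOTALL)
PLAY_URL_RE = re.compile(r'"id":0,"url":"(.*?)"')
AUTHOR_RE = re.compile(r'"id":"[a-zA-Z0-9]+","name":"(.*?)"')


def parse_sample(script_text):
    """Return the first play URL and author name found in the state text, or None."""
    state_match = STATE_RE.search(script_text)
    if not state_match:
        return None
    # Slashes arrive as the literal text \u002F, so turn them back into "/".
    state = state_match.group(1).replace('\\u002F', '/')
    play = PLAY_URL_RE.search(state)
    author = AUTHOR_RE.search(state)
    return {
        "play_url": play.group(1) if play else None,
        "author": author.group(1) if author else None,
    }


result = parse_sample(SAMPLE_SCRIPT)
print(result)
```

Regex matching on the raw text is fragile if the page changes its key order. If the state turns out to be valid JSON, `json.loads` on the captured text might be more robust, but that is untested here.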