Closed unichaos77 closed 1 month ago
Using the ak.stock_research_report_em interface to query research reports for an individual stock, with this code:
```python
stock_research_report_em_df = ak.stock_research_report_em(symbol="000001")
print(stock_research_report_em_df)
```
it raises the following error:
ProxyError: HTTPSConnectionPool(host='reportapi.eastmoney.com', port=443): Max retries exceeded with url: /report/list?industryCode=%2A&pageSize=5000&industry=%2A&rating=%2A&ratingChange=%2A&beginTime=2000-01-01&endTime=2025-01-01&pageNo=1&fields=&qType=0&orgCode=&code=000001&rcode=&p=1&pageNum=1&pageNumber=1&_=1692533168153 (Caused by ProxyError('Unable to connect to proxy', FileNotFoundError(2, 'No such file or directory')))
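A ProxyError wrapping a `FileNotFoundError` usually means `requests` (which akshare uses for HTTP) is picking up a proxy setting from the environment that points at something unreachable. A minimal sketch of a workaround, assuming the stale proxy comes from environment variables (these names are the standard ones `requests` honors):

```python
import os

# Remove any proxy settings inherited from the environment so that
# subsequent requests connect to reportapi.eastmoney.com directly.
for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy",
            "https_proxy", "ALL_PROXY", "all_proxy"):
    os.environ.pop(var, None)

print(all(v not in os.environ for v in ("HTTP_PROXY", "https_proxy")))  # True
```

If the proxy is configured elsewhere (e.g. system-wide on Windows), this alone may not be enough, but it is a cheap first check before blaming the interface itself.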
I also tried the target address directly (https://data.eastmoney.com/report/stock.jshtml) and found that plain requests calls could not get a correct response either.
So I switched to Selenium, opened https://data.eastmoney.com/report/, and successfully obtained the report links. The code is:
```python
# Wrap the research-report fetching into a function. Parameters: stock code,
# start date, end date. Returns a list of (url, date) pairs.
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


def get_pdf_links(stock_code, start_date, end_date):
    url = f"https://data.eastmoney.com/report/{stock_code}.html"
    # Path to the ChromeDriver that Selenium should use
    service = Service(executable_path='./chromedriver-win64/chromedriver.exe')
    chrome_options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(service=service, options=chrome_options)
    # Open the target page
    driver.get(url)

    results = []

    def collect_page(page_source):
        # Parse the current page and collect (url, date) for every report row
        soup = BeautifulSoup(page_source, 'html.parser')
        for row in soup.find_all('tr'):
            # Find the report link
            report_link = row.find(
                'a', href=lambda href: href and href.startswith('/report/info/'))
            if not report_link:
                continue
            # Build the full URL
            full_url = f"https://data.eastmoney.com{report_link['href']}"
            # Find the date cell (a yyyy-mm-dd string)
            date_cell = row.find(
                'td', string=lambda text: text and len(text) == 10
                and text[4] == '-' and text[7] == '-')
            if date_cell:
                results.append((full_url, date_cell.text))

    collect_page(driver.page_source)

    # Click the "下一页" (next page) button until it no longer exists
    while True:
        try:
            next_page = driver.find_element(By.XPATH, "//a[text()='下一页']")
            next_page.click()
            time.sleep(3)
            collect_page(driver.page_source)
        except NoSuchElementException:
            break

    driver.quit()
    # Keep only reports whose date falls within [start_date, end_date]
    return [(u, d) for (u, d) in results if start_date <= d <= end_date]
```
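The final filtering step relies on yyyy-mm-dd strings comparing lexicographically in the same order as chronologically, so no datetime parsing is needed. A minimal check of that assumption:

```python
def in_range(date: str, start: str, end: str) -> bool:
    # yyyy-mm-dd strings sort lexicographically in date order,
    # so plain string comparison is enough here.
    return start <= date <= end

print(in_range("2024-06-15", "2024-01-01", "2024-12-31"))  # True
print(in_range("2025-01-02", "2024-01-01", "2024-12-31"))  # False
```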
The interface now tests fine.
Thanks for the follow-up! I tested the interface today and it does work now. However, the returned data only contains basic fields such as the report title; could the report's web page address be added as well?
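If the interface exposed each report's `infoCode` (an assumption on my part; the links scraped by the Selenium code above all follow the `/report/info/<id>.html` pattern), the page address could be reconstructed client-side with a small hypothetical helper:

```python
def report_page_url(info_code: str) -> str:
    # Hypothetical helper: builds the detail-page URL from an infoCode,
    # matching the /report/info/<id>.html pattern seen when scraping.
    return f"https://data.eastmoney.com/report/info/{info_code}.html"

print(report_page_url("AP202401011234"))
# https://data.eastmoney.com/report/info/AP202401011234.html
```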