akfamily / akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
https://akshare.akfamily.xyz
MIT License

ak.stock_research_report_em #5192

Closed unichaos77 closed 1 week ago

unichaos77 commented 1 week ago

Using the ak.stock_research_report_em interface to query research reports for an individual stock, with the following code:

stock_research_report_em_df = ak.stock_research_report_em(symbol="000001")
print(stock_research_report_em_df)

Error: ProxyError: HTTPSConnectionPool(host='reportapi.eastmoney.com', port=443): Max retries exceeded with url: /report/list?industryCode=%2A&pageSize=5000&industry=%2A&rating=%2A&ratingChange=%2A&beginTime=2000-01-01&endTime=2025-01-01&pageNo=1&fields=&qType=0&orgCode=&code=000001&rcode=&p=1&pageNum=1&pageNumber=1&_=1692533168153 (Caused by ProxyError('Unable to connect to proxy', FileNotFoundError(2, 'No such file or directory')))
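
The ProxyError (a FileNotFoundError while connecting to the proxy) points at a stale proxy setting in the local environment rather than at akshare itself. A minimal sketch of clearing the inherited proxy variables before retrying, assuming the configured proxy is indeed unwanted:

import os

# Assumption: the proxy recorded in the environment no longer exists,
# which matches the FileNotFoundError in the traceback above
for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
    os.environ.pop(var, None)

import akshare as ak

stock_research_report_em_df = ak.stock_research_report_em(symbol="000001")
print(stock_research_report_em_df)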

I tried the target address directly, https://data.eastmoney.com/report/stock.jshtml, and found that plain requests could not return a correct result either.
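
For context, this is roughly what such a direct attempt looks like with requests, with the endpoint and parameters copied from the error message above (the browser-like User-Agent header is an assumption, not something the site documents):

import requests

# Endpoint and a trimmed-down subset of the query parameters taken from
# the ProxyError URL above
url = "https://reportapi.eastmoney.com/report/list"
params = {
    "industryCode": "*", "industry": "*", "rating": "*", "ratingChange": "*",
    "beginTime": "2000-01-01", "endTime": "2025-01-01",
    "pageSize": 50, "pageNo": 1, "qType": 0, "code": "000001",
}
headers = {"User-Agent": "Mozilla/5.0"}  # assumption: mimic a browser
response = requests.get(url, params=params, headers=headers, timeout=10)
print(response.status_code, response.text[:200])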

So I switched to Selenium instead: opening https://data.eastmoney.com/report/ successfully retrieved the research report links. The code is:

Wrap the report-fetching logic in a function: the parameters are the stock code, start date, and end date, and it returns a list of (link, date) tuples.

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


def get_pdf_links(stock_code, start_date, end_date):
    url = f"https://data.eastmoney.com/report/{stock_code}.html"
    # Set the ChromeDriver path used by Selenium
    service = Service(executable_path='./chromedriver-win64/chromedriver.exe')
    # Create a Chrome options instance
    chrome_options = webdriver.ChromeOptions()
    # Initialize the WebDriver
    driver = webdriver.Chrome(service=service, options=chrome_options)
    # Open the target page
    driver.get(url)

    # List for collecting (url, date) results
    results = []

    def collect_rows(page_source):
        # Parse the HTML and walk every report row in the table
        soup = BeautifulSoup(page_source, 'html.parser')
        for row in soup.find_all('tr'):
            # Find the report detail link
            report_link = row.find('a', href=lambda href: href and href.startswith('/report/info/'))
            if not report_link:
                continue
            # Build the full URL
            full_url = f"https://data.eastmoney.com{report_link['href']}"
            # Find the date cell (a YYYY-MM-DD string)
            date_cell = row.find('td', string=lambda text: text and len(text) == 10 and text[4] == '-' and text[7] == '-')
            if date_cell:
                results.append((full_url, date_cell.text))

    # Collect the first page
    collect_rows(driver.page_source)

    # If there is a "next page" button, click it and collect that page too;
    # if there is none, skip
    try:
        next_page = driver.find_element(By.XPATH, "//a[text()='下一页']")
        next_page.click()
        time.sleep(3)
        collect_rows(driver.page_source)
    except NoSuchElementException:
        pass

    # Keep only the results whose date falls inside [start_date, end_date]
    filtered_results = [(u, d) for (u, d) in results if start_date <= d <= end_date]

    driver.quit()
    return filtered_results
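
For reference, a minimal call, assuming chromedriver sits at the relative path used above; the date range is hypothetical, passed as YYYY-MM-DD strings to match the string comparison in the filter:

links = get_pdf_links("000001", "2024-01-01", "2025-01-01")  # hypothetical range
for report_url, report_date in links:
    print(report_date, report_url)
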
albertandking commented 1 week ago


The interface tests normally now.

unichaos77 commented 1 week ago

Thanks for the feedback! I tested the interface today and it does work now. However, the returned data only contains basic information such as the report title; could the report's web page address be added as well?
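
For illustration only: the detail pages scraped above follow the pattern https://data.eastmoney.com/report/info/<id>.html, so if the returned DataFrame exposed (or were extended with) a report identifier, the page address could be derived from it. The column name "infoCode" below is purely hypothetical:

import akshare as ak

df = ak.stock_research_report_em(symbol="000001")
# Hypothetical column name; adjust to whatever identifier the interface returns
if "infoCode" in df.columns:
    df["report_url"] = (
        "https://data.eastmoney.com/report/info/" + df["infoCode"].astype(str) + ".html"
    )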