Open TeresaYang00 opened 4 days ago
take 1 hr — Checked today that the crawler is running correctly. The current version uses the 2023 listed-company (TWSE) path; also adjusted the crawler path for OTC (TPEx) companies.
Current progress on the 2023 reports: the crawler ran on 6/25 and stopped at 20:16; on the morning of 6/26 the problem was debugged. Stock code 2534 was skipped entirely and will be downloaded manually later. By 18:41 on 6/26 the crawler had reached stock code 8438.
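Since code 2534 was skipped and has to be patched by hand, one option (a sketch, not part of the current script; `failed_codes`, `record_failure`, and `save_failed` are hypothetical names) is to collect every code whose download raised an exception and write them out at the end for a manual follow-up pass:

```python
# Sketch: remember company codes whose report download failed,
# so they can be retried or downloaded manually later.
failed_codes = []

def record_failure(code, error):
    """Record a company code whose report download failed."""
    failed_codes.append((code, str(error)))

def save_failed(path='failed_codes.csv'):
    """Write the failed codes to a CSV for a manual follow-up pass."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write('code,error\n')
        for code, err in failed_codes:
            f.write(f'{code},{err}\n')

# Example: log the code that was skipped, then dump the list.
record_failure('2534', 'element not found')
save_failed()
```

Calling `record_failure(code, e)` inside the `except` branch of the crawler loop would feed this list automatically.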
Code adjusted for the 2023 OTC (TPEx) companies:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time
import pandas as pd

download_dir = "/Users/teresayang/Desktop/上市上櫃公司財報"

chrome_options = Options()
chrome_options.add_experimental_option(
    'prefs',
    {'download.prompt_for_download': False,
     'plugins.always_open_pdf_externally': True,
     'download.default_directory': download_dir})

driver = webdriver.Chrome(options=chrome_options)

file_path = '/Users/teresayang/Desktop/TW_company_code.csv'
df = pd.read_csv(file_path, encoding='unicode_escape')
company_code = df.iloc[:, 0]

base_url = 'https://mops.twse.com.tw/mops/web/t51sb01'

for code in company_code:
    driver.get(f'{base_url}?co_id={code}&year=112&mtype=A')
    xpath_list = driver.find_elements(
        By.XPATH, '/html/body/center/form/table[2]/tbody/tr/td[8]/a')
    # Remember the original window handle
    original_window = driver.current_window_handle
    # Click each financial-report link in turn
    for element in xpath_list:
        element.click()
        time.sleep(3)
        # Switch to the newly opened window
        driver.switch_to.window(driver.window_handles[-1])
        try:
            # Trigger the actual PDF download in the new window
            true_link = driver.find_element(By.XPATH, '/html/body/center/a')
            true_link.click()
            time.sleep(25)
            print(f'Downloaded {code}')
        except Exception as e:
            print(f'Error: {e}')
        # Close the new window
        driver.close()
        # Switch back to the original window
        driver.switch_to.window(original_window)

driver.quit()
print("Done")
```
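The fixed `time.sleep(25)` either wastes time on small PDFs or cuts a slow download off. A sketch of an alternative, assuming Chrome's behaviour of writing in-progress downloads as `.crdownload` files (the `timeout`/`poll` values are illustrative):

```python
import os
import time

def wait_for_downloads(directory, timeout=60, poll=1):
    """Poll `directory` until no Chrome .crdownload files remain.

    Returns True once all downloads have finished, False on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        # Chrome renames the .crdownload file when the download completes.
        if not any(name.endswith('.crdownload') for name in os.listdir(directory)):
            return True
        time.sleep(poll)
    return False
```

The `time.sleep(25)` after `true_link.click()` could then become `wait_for_downloads(download_dir)`.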
MyGroup Review: Python PDF crawler to download every company's 「財務報告書(電子書)」 (Financial Report e-book) from MOPS
Version: 2023
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time
import pandas as pd

download_dir = "/Users/teresayang/Desktop/上市上櫃公司財報"

chrome_options = Options()
chrome_options.add_experimental_option(
    'prefs',
    {'download.prompt_for_download': False,
     'plugins.always_open_pdf_externally': True,
     'download.default_directory': download_dir})

driver = webdriver.Chrome(options=chrome_options)
```
`/html/body/center/form/table[2]/tbody/tr[2]/td[8]/a` is the XPath of the first report link.
`/html/body/center/form/table[2]/tbody/tr[11]/td[8]/a` is the XPath of the last report link.
`/html/body/center/form/table[2]/tbody/tr/td[8]/a` (no row index on `tr`) matches every report link in the table.
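The indexed-vs-unindexed behaviour can be checked offline with a minimal stand-in table (a sketch using Python's built-in ElementTree, whose limited XPath also supports positional predicates; the three-row table below is invented for illustration):

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for the MOPS results table: three rows, one link each.
html = """
<table><tbody>
  <tr><td><a href="r1.pdf">r1</a></td></tr>
  <tr><td><a href="r2.pdf">r2</a></td></tr>
  <tr><td><a href="r3.pdf">r3</a></td></tr>
</tbody></table>
"""
root = ET.fromstring(html)

all_links = root.findall('.//tr/td/a')    # no row index: every report link
second = root.findall('.//tr[2]/td/a')    # tr[2]: only the second row's link

print(len(all_links))         # 3
print(second[0].get('href'))  # r2.pdf
```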
Read the company-code CSV from the desktop:
```python
file_path = '/Users/teresayang/Desktop/TW_company_code.csv'
df = pd.read_csv(file_path, encoding='unicode_escape')
```
Only the first column (the stock codes) is needed:
```python
company_code = df.iloc[:, 0]
```
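One caveat worth checking (an assumption about the CSV, with an invented inline sample standing in for `TW_company_code.csv`): if pandas infers the code column as integers, any code with a leading zero would be mangled; reading it as strings avoids that:

```python
import io
import pandas as pd

# Hypothetical sample standing in for TW_company_code.csv.
sample = io.StringIO("code,name\n0050,ETF50\n2330,TSMC\n")

df = pd.read_csv(sample, dtype=str)  # dtype=str keeps '0050' intact
company_code = df.iloc[:, 0]

print(list(company_code))  # ['0050', '2330']
```

Without `dtype=str`, pandas would read `0050` as the integer `50` and the crawler would request the wrong company.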
`year=112` is the ROC (Minguo) calendar year for 2023.
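The ROC calendar is offset from the Gregorian year by 1911, so the conversion is just:

```python
def to_roc_year(gregorian_year):
    """Convert a Gregorian year to the ROC (Minguo) year used by MOPS."""
    return gregorian_year - 1911

print(to_roc_year(2023))  # 112
```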
```python
for code in company_code:
    driver.get(f'https://doc.twse.com.tw/server-java/t57sb01?step=1&co_id={code}&year=112&mtype=A')
```
Clean up:
```python
driver.quit()
print("done")
```