Seb943 / scrapeOP

A python package for scraping oddsportal.com
215 stars 76 forks

Not possible to use #12

Open daniperaleda opened 11 months ago

daniperaleda commented 11 months ago

Hi:

I have tried to use the code, but I don't know if it is still working for you.

I am having trouble getting started with it.

The main error I get is `TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'`.

It seems to be an issue in the code itself, but after a Google search I have not been able to fix it.

Thanks in advance

meraline commented 9 months ago

Yes, it throws an error for me too:

```
(TorchEnv) PS C:\Users\Анатолий\Documents\GitHub> & C:/Users/Анатолий/source/repos/PyTorchtest/PyTorchtest/TorchEnv/Scripts/python.exe c:/Users/Анатолий/Documents/GitHub/scrapeOP/FinalScraper.py
C:\Users\Анатолий\Documents\GitHub
Data will be saved in the following directory: C:\Users\Анатолий\Documents\GitHub
Please indicate the format of tournament (3 sets or 5 sets) :
Please indicate the surface :
We start to scrape the following tournament : charleston-challenger-men
Traceback (most recent call last):
  File "c:\Users\Анатолий\Documents\GitHub\scrapeOP\FinalScraper.py", line 14, in <module>
    scrape_oddsportal_current_season(sport = 'tennis', country = 'usa', league = 'charleston-challenger-men', season = '2023', max_page = 25)
  File "c:\Users\Анатолий\Documents\GitHub\scrapeOP\functions.py", line 1363, in scrape_oddsportal_current_season
    df = scrape_current_tournament_typeB(Surface = surface, bestof = bestof, tournament = league, \
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Анатолий\Documents\GitHub\scrapeOP\functions.py", line 556, in scrape_current_tournament_typeB
    driver = webdriver.Chrome(executable_path = DRIVER_LOCATION)
```

draccc commented 9 months ago

I edited this part of the code and the error stopped.

```python
def scrape_current_tournament_typeC(sport, tournament, country, SEASON, max_page = 25):
    global driver
    ############### NOW WE SEEK TO SCRAPE THE ODDS AND MATCH INFO ###############
    DATA_ALL = []
    for page in range(1, max_page + 1):
        print('We start to scrape the page n°{}'.format(page))
        try:
            driver.quit()  # close all windows
        except:
            pass
        driver = webdriver.Chrome()
        data = scrape_page_typeC(page, sport, country, tournament, SEASON)
        DATA_ALL = DATA_ALL + [y for y in data if y != None]
        driver.close()
    data_df = pd.DataFrame(DATA_ALL)
    try:
        data_df.columns = ['TeamsRaw', 'Bookmaker', 'OddHome', 'OddDraw', 'OddAway', 'DateRaw', 'ScoreRaw']
    except:
        print('Function crashed, probable reason : no games scraped (empty season)')
        return(1)
    ##################### FINALLY WE CLEAN THE DATA AND SAVE IT #####################
    '''Now we simply need to split team names, transform date, split score'''
    # (0) Filter out None rows
    data_df = data_df[~data_df['Bookmaker'].isnull()].dropna().reset_index()
    data_df["TO_KEEP"] = 1
    for i in range(len(data_df["TO_KEEP"])):
        if len(re.split(':', data_df["ScoreRaw"][i])) < 2:
            data_df["TO_KEEP"].iloc[i] = 0
    data_df = data_df[data_df["TO_KEEP"] == 1]
    # (a) Split team names
    data_df["Home_id"] = [re.split(' - ', y)[0] for y in data_df["TeamsRaw"]]
    data_df["Away_id"] = [re.split(' - ', y)[1] for y in data_df["TeamsRaw"]]
    # (b) Transform date
    data_df["Date"] = [re.split(', ', y)[1] for y in data_df["DateRaw"]]
    # (c) Split score
    data_df["Score_home"] = [re.split(':', y)[0][-2:] for y in data_df["ScoreRaw"]]
    data_df["Score_away"] = [re.split(':', y)[1][:2] for y in data_df["ScoreRaw"]]
    # (e) Set season column
    data_df["Season"] = SEASON
    # Finally we save results
    if not os.path.exists('./{}_FULL'.format(tournament)):
        os.makedirs('./{}_FULL'.format(tournament))
    if not os.path.exists('./{}'.format(tournament)):
        os.makedirs('./{}'.format(tournament))
    data_df.to_csv('./{}_FULL/{}_{}_FULL.csv'.format(tournament, tournament, SEASON), sep=';', encoding='utf-8', index=False)
    data_df[['Home_id', 'Away_id', 'Bookmaker', 'OddHome', 'OddDraw', 'OddAway', 'Date', 'Score_home', 'Score_away', 'Season']].to_csv(
        './{}/{}_{}.csv'.format(tournament, tournament, SEASON), sep=';', encoding='utf-8', index=False)
    return(data_df)
```
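Incidentally, the row-by-row `TO_KEEP` loop in the function above is slow and its chained `.iloc` assignment triggers pandas' `SettingWithCopyWarning`; the same filter can be done with a single vectorized mask. A sketch on made-up `ScoreRaw` values (not the scraper's real output):

```python
import pandas as pd

# Hypothetical sample standing in for the scraped ScoreRaw column
data_df = pd.DataFrame({"ScoreRaw": ["2:1", "postponed", "0:3"]})

# Keep only rows whose score contains a ':' separator, i.e. rows where
# re.split(':', s) would yield at least two parts
data_df = data_df[data_df["ScoreRaw"].str.contains(":", regex=False)]
data_df = data_df.reset_index(drop=True)

print(data_df["ScoreRaw"].tolist())  # ['2:1', '0:3']
```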

I also got an error in this part, so I just tried to comment it out.

```python
def reject_ads(switch_to_decimal = True):
    # Reject ads
    # ffi2('//*[@id="onetrust-reject-all-handler"]')
    # if switch_to_decimal:
    #     # Change odds to decimal format
    #     driver.find_element("xpath", '//*[@id="user-header-oddsformat-expander"]').click()
    #     driver.find_element("xpath", '//*[@id="user-header-oddsformat"]/li[1]/a/span').click()
    pass  # body fully commented out, so a placeholder is needed
```

There are calls to `reject_ads` in some of the other functions, which I also just commented out.
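Instead of commenting out every `reject_ads` call, another option is to make the helper itself fail-soft: wrap each click in try/except so a missing cookie banner or changed page layout no longer crashes the scrape. A sketch using the XPaths from the original function (here `driver` is passed in explicitly, unlike the repo's global-driver style):

```python
def reject_ads(driver, switch_to_decimal=True):
    """Best-effort banner dismissal: ignore elements that are not there."""
    try:
        # Cookie-consent "reject all" button
        driver.find_element("xpath", '//*[@id="onetrust-reject-all-handler"]').click()
    except Exception:
        pass  # banner absent or layout changed: keep scraping
    if switch_to_decimal:
        try:
            # Switch the odds display to decimal format
            driver.find_element("xpath", '//*[@id="user-header-oddsformat-expander"]').click()
            driver.find_element("xpath", '//*[@id="user-header-oddsformat"]/li[1]/a/span').click()
        except Exception:
            pass
```

This keeps the decimal-odds switch working when the page cooperates, while degrading gracefully when it does not.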

Let me know if it helps you :)