BurntSushi / nflgame

An API to retrieve and read NFL Game Center JSON data. It can work with real-time data, which can be used for fantasy football.
http://pdoc.burntsushi.net/nflgame
The Unlicense

Python 3 compatibility updates for 2019 Season #375

Closed ghost closed 5 years ago

ghost commented 5 years ago

Y'all are some lazy mfs....

C:\Temp\Python.virtualenvs\actuary\Scripts\python.exe C:/Users/actuary/gitrepos/pyDataModel/nflgame-update-players.py
Loading games for PRE 2019 week 2
Downloading team rosters...
32/32 complete. (100.00%)
Done!
Fetching GSIS identifiers for players not in nflgame...
634/1799 complete. (35.24%)

mickeelm commented 5 years ago

This project is unmaintained, please see link in readme for the active fork.

derek-adair commented 5 years ago

You are more than welcome to try out the python 3 branch of the maintained version.

pip install nflgame-redux==2.0.1a1
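If it installs cleanly, a quick smoke test is the usage example from the nflgame README, ported to print() (assuming the py3 branch keeps the same entry points):

import nflgame

# Top five rushers from week 1 of the 2018 regular season.
games = nflgame.games(2018, week=1)
players = nflgame.combine_game_stats(games)
for p in players.rushing().sort('rushing_yds').limit(5):
    print('%s: %d carries for %d yards and %d TDs'
          % (p, p.rushing_att, p.rushing_yds, p.rushing_tds))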

derek-adair commented 5 years ago

Also hilarious to call @BurntSushi lazy when you've CLEARLY not taken a look at the README here.

ghost commented 5 years ago

@derek-adair lol I’m just playin, burntsushi’s the goat for this. Let me know if you’d like help with nfldb to py3.

In the meantime, here’s a snippet to scrape all historical nfl spreads (lines and O/U’s) for the gamblin folk:

import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

pd.set_option('display.expand_frame_repr', False)

seasons = ['2018-2019', '2017-2018', '2016-2017', '2015-2016', '2014-2015',
           '2013-2014', '2012-2013', '2011-2012', '2010-2011', '2009-2010',
           '2008-2009', '2007-2008', '2006-2007', '2005-2006', '2004-2005']

team_dict = {
    'Arizona': 'Arizona Cardinals',
    'Atlanta': 'Atlanta Falcons',
    'Baltimore': 'Baltimore Ravens',
    'Buffalo': 'Buffalo Bills',
    'Carolina': 'Carolina Panthers',
    'Chicago': 'Chicago Bears',
    'Cincinnati': 'Cincinnati Bengals',
    'Cleveland': 'Cleveland Browns',
    'Dallas': 'Dallas Cowboys',
    'Denver': 'Denver Broncos',
    'Detroit': 'Detroit Lions',
    'Green Bay': 'Green Bay Packers',
    'Houston': 'Houston Texans',
    'Indianapolis': 'Indianapolis Colts',
    'Jacksonville': 'Jacksonville Jaguars',
    'Kansas City': 'Kansas City Chiefs',
    'L.A. Chargers': 'Los Angeles Chargers',
    'L.A. Rams': 'Los Angeles Rams',
    'Miami': 'Miami Dolphins',
    'Minnesota': 'Minnesota Vikings',
    'N.Y. Giants': 'New York Giants',
    'N.Y. Jets': 'New York Jets',
    'New England': 'New England Patriots',
    'New Orleans': 'New Orleans Saints',
    'Oakland': 'Oakland Raiders',
    'Philadelphia': 'Philadelphia Eagles',
    'Pittsburgh': 'Pittsburgh Steelers',
    'San Francisco': 'San Francisco 49ers',
    'Seattle': 'Seattle Seahawks',
    'Tampa Bay': 'Tampa Bay Buccaneers',
    'Tennessee': 'Tennessee Titans',
    'Washington': 'Washington Redskins',
}

def parse_date(text):
    pattern = r"\b\d+/\d+/\d+"
    return re.findall(pattern, text)[0]

def parse_home_away(text):
    # Covers prefixes away games with '@'
    return 'Away' if text[0] == '@' else 'Home'

def parse_opponent(text):
    # strip the '@' marker and map the short name to the full team name
    return team_dict[re.sub('@', '', text).lstrip()]

def CoversHistoricalLineScraper():
    nfl_league_data = list()
    for season in seasons:
        nfl_season_data = list()
        for team_index in range(1, 33):
            url = 'http://www.covers.com/pageLoader/pageLoader.aspx?page=/data' \
                  '/nfl/teams/pastresults/' + season + '/team' + \
                  str(team_index) + '.html'
            r = requests.get(url)
            soup = BeautifulSoup(r.text, 'lxml')

            name = soup.find_all('div', {'class': 'teamname'})
            team_name = " ".join(re.sub(r'\s+', ' ', name[0].text).split())
            print(season, "|", team_name)

            tables = soup.find_all('table')
            table_regular_season = tables[1] if len(tables) == 3 else tables[0]
            # playoff table is located but not used below
            table_playoffs = tables[2] if len(tables) == 3 else tables[1]

            games = table_regular_season.findChildren('tr')
            data_headers = ['Date',
                            'Opponent',
                            'Score',
                            'Week',
                            'Line',
                            'OverUnder']
            season_data = list()
            for i in range(1, len(games)):
                data_values = list()
                for game_data in games[i].find_all('td'):
                    value = " ".join(
                        re.sub(r'\s+', ' ', game_data.text).split()).lstrip(" ")
                    data_values.append(value)
                season_data.append(data_values)

            team_data = pd.DataFrame(season_data, columns=data_headers)
            team_data = team_data[team_data['Date'] != 'BYE']

            team_data['Date'] = team_data['Date'].apply(parse_date)
            team_data[['OverUnder', 'PointTotal']] = team_data[
                'OverUnder'].str.split(expand=True)
            team_data[['LineResult', 'Line']] = team_data['Line'].str.split(
                expand=True)
            team_data['Line'] = team_data['Line'].str.replace('PK', '0')
            team_data['Week'] = team_data['Week'].str.replace(
                r'Week\s', '', regex=True).astype('int')
            team_data['HomeAway'] = team_data['Opponent'].apply(parse_home_away)
            team_data['Opponent'] = team_data['Opponent'].apply(parse_opponent)
            team_data['Season'] = season.split('-')[0]
            team_data['TeamName'] = team_name

            column_list = ['Date',
                           'Season',
                           'Week',
                           'TeamName',
                           'Opponent',
                           'HomeAway',
                           'OverUnder',
                           'PointTotal',
                           'Line',
                           'LineResult']

            team_data = team_data[column_list]
            team_data = team_data.sort_values(by='Week').reset_index(drop=True)
            nfl_season_data.append(team_data)
        nfl_season_data = pd.concat(nfl_season_data)
        nfl_league_data.append(nfl_season_data)
    nfl_league_data = pd.concat(nfl_league_data)
    nfl_league_data.to_csv(r'C:\Users\bd391nr\gitrepos\PyModelDesign\data'
                           r'\spreads\nfl_spreads.csv', header=True,
                           index=False)
    return nfl_league_data

nfl_spreads = CoversHistoricalLineScraper()

derek-adair commented 5 years ago

Was hoping that was the case... there are some people who would have meant that :)

derek-adair commented 5 years ago

nfldb -> py3 is on the radar. However, I'm currently considering swapping out the entire thing w/ something more "modern". The idea would be to remove the json.tar.gz storage in nflgame and replace it with some sort of document storage database. I think it'd be REALLY FUCKING COOL to switch over to something with full-text search (like elasticsearch). This is a daunting task that will take me a while to get to.
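Something like this is the shape I have in mind (purely a sketch with the elasticsearch-py client; the index layout and field names here are invented):

from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch()

# Hypothetical layout: one document per game, keyed by GSIS id,
# instead of one big json.tar.gz archive.
game = {'home': 'NE', 'away': 'KC', 'season': 2019,
        'plays': [{'desc': 'T.Brady pass short right ...'}]}
es.index(index='nflgame', id='2019090500', body=game)
es.indices.refresh(index='nflgame')

# Full-text search over every play description in every game.
hits = es.search(index='nflgame',
                 body={'query': {'match': {'plays.desc': 'pass short'}}})
print(hits['hits']['total'])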

This is also coming from a purely ignorant place; I have no idea if this would be worth doing.

However, in the meantime I think it'd be worthwhile to convert nfldb to python3. I don't personally use nfldb so I've not bothered to fork it yet. If you are serious about this, let's get it going.

derek-adair commented 5 years ago

Feel free to submit pulls to https://github.com/derek-adair/nfldb - I will set up a pypi index and we can get it live.

ghost commented 5 years ago

> Feel free to submit pulls to https://github.com/derek-adair/nfldb - I will set up a pypi index and we can get it live.


> nfldb -> py3 is on the radar. However, I'm currently considering swapping out the entire thing w/ something more "modern". The idea would be to remove the json.tar.gz storage in nflgame and replace it with some sort of document storage database. I think it'd be REALLY FUCKING COOL to switch over to something with full-text search (like elasticsearch). This is a daunting task that will take me a while to get to.
>
> This is also coming from a purely ignorant place; I have no idea if this would be worth doing.
>
> However, in the meantime I think it'd be worthwhile to convert nfldb to python3. I don't personally use nfldb so I've not bothered to fork it yet. If you are serious about this, let's get it going.

Regarding the idea of a document storage database: you can convert .zip archives to BLOB data and store them in a sqlite3 database that essentially serves as a data store:

"""
#' Reading and Writing ZipFiles to SQLite3 Database
#' Sqlite3 is the python module that creates a self contained, server-less,
#' zero-configuration, and transactional SQL database. 
#' The database file format is cross-platform and can be freely copied 
#' between 32-bit and 64-bit systems.
#'
"""

# import required packages
import os
import sqlite3
from sqlite3 import Error

# zip archive containing various files (.txt, .json, .xlsx etc.)
zip_file = r"C:\temp\pyzipdb\temp.zip"  

# sqlite database to store various zip_files such as above
db_file = r"C:\temp\pyzipdb\db\temp.db"

def create_database(db_file):
    """ create a database connection to a SQLite database file """
    conn = None
    try:
        conn = sqlite3.connect(db_file)
        print(sqlite3.version)
    except Error as e:
        print(e)
    finally:
        # only close if the connection was actually opened
        if conn:
            conn.close()

def create_connection(db_file):
    """ create a database connection to the SQLite database
        specified by db_file
    :param db_file: database_file
    :return: Connection object or None
    """
    try:
        conn = sqlite3.connect(db_file)
        return conn
    except Error as e:
        print(e)
    return None

def create_table(conn, create_table_sql):
    """ create a table from the create_table_sql statement
    :param conn: Connection object
    :param create_table_sql: CREATE TABLE statement
    :return:
    """
    try:
        cursor = conn.cursor()
        cursor.execute(create_table_sql)
    except Error as e:
        print(e)

def build_database(db_file = r"C:\sqlite\db\temp.db"):
    sql_create_projects_table = """ CREATE TABLE IF NOT EXISTS projects (
                                        id integer PRIMARY KEY,
                                        name text NOT NULL,
                                        begin_date text,
                                        end_date text
                                    ); """

    sql_create_tasks_table = """CREATE TABLE IF NOT EXISTS tasks (
                                    id integer PRIMARY KEY,
                                    name text NOT NULL,
                                    priority integer,
                                    status_id integer NOT NULL,
                                    project_id integer NOT NULL,
                                    begin_date text NOT NULL,
                                    end_date text NOT NULL,
                                    FOREIGN KEY (project_id) REFERENCES 
                                    projects (id)
                                );"""

    sql_create_data_table = """CREATE TABLE IF NOT EXISTS data (
                                    id integer PRIMARY KEY AUTOINCREMENT NOT NULL,
                                    FileName text,
                                    Type text,
                                    File blob
                                );"""

    # create database and establish connection
    create_database(db_file)
    conn = create_connection(db_file)

    if conn is not None:
        # create projects table
        create_table(conn, sql_create_projects_table)
        # create tasks table
        create_table(conn, sql_create_tasks_table)
        # create data table
        create_table(conn, sql_create_data_table)
    else:
        print("Error! Cannot create database connection.")


"""
Let's assume we want to store a zip file inside the database. To do this, 
open the reference to the file and save it as binary data. The data will be 
stored in a field called 'File' as created above.
"""
def read_zipfile(conn, zip_file):
    """ insert ZipFile into SQLite database table """
    with open(zip_file, "rb") as infile:
        blob = infile.read()
        cursor = conn.cursor()
        sql_insert_cmd  = """ INSERT INTO data(FileName,Type,File) 
                                VALUES(?, 'zip', ?); """
        cursor.execute(sql_insert_cmd, (zip_file, sqlite3.Binary(blob)))
        conn.commit()
        print("Record Count:", cursor.lastrowid)
        print("ZipFile successfully read.")

def write_zipfile(conn, zip_file):
    """ recreate zip file from database blob"""
    with open(zip_file, "wb") as outfile:
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM data WHERE id = 1")
        blob = cursor.fetchone()
        outfile.write(blob[3])
        print("ZipFile successfully written.")

def query_sql_table(conn):
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM data")
    rows = cursor.fetchall()
    for row in rows:
        print(row)

"""
Here, we open the zip file and using an INSERT command, put the data into the data table.
Next, we'll perform a SELECT statement to verify that the row exists in the 
table.
"""
"""
To get the Blob data out of the database, use the SELECT SQL statement and 
the open() to create a new zip file. The code above creates a new file called 
'output.zip' and writes the information from the blob field to the file. 
Then the script closes the cursor connection and database connection.
"""

def main():
    # reuse the module-level db_file and zip_file paths defined above
    build_database(db_file)
    conn = create_connection(db_file)
    read_zipfile(conn, zip_file)
    query_sql_table(conn)
    write_zipfile(conn, os.path.join(os.path.dirname(zip_file), "db.zip"))
    conn.close()

if __name__ == "__main__":
    main()

pyzipdb stores a wicked amount of data that can be quickly queried, is completely portable, and is compatible cross-platform.

BurntSushi commented 5 years ago

@derek-adair I would definitely urge you to look more closely at how nfldb works. For this kind of data, a relational database is really the ideal solution, and that's why I chose PostgreSQL. Elasticsearch doesn't really make sense for this kind of data. nfldb in particular makes use of SQL to make a lot of query operations very fast. If anything, I'd probably expect the next iteration here to drop nflgame completely and move toward solidifying nfldb.
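To make that concrete, here's roughly the query API that sits on top of that SQL (sketching from memory of the nfldb README, so treat the details as approximate): the top five passers of a season, aggregated in a single round trip to PostgreSQL rather than in Python.

import nfldb

# Uses the connection settings from nfldb's config file.
db = nfldb.connect()

# Top five passers of the 2012 regular season; the aggregation
# happens inside PostgreSQL.
q = nfldb.Query(db)
q.game(season_year=2012, season_type='Regular')
for pp in q.sort('passing_yds').limit(5).as_aggregate():
    print(pp.player, pp.passing_yds)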

You're carrying the torch so it's of course your call, but I'm happy to provide more advice if you like.

derek-adair commented 5 years ago

> I would definitely urge you to look more closely at how nfldb works.

Definitely intend to before making any decisions. I just wanted an excuse to implement search via elastic.

> I'd probably expect the next iteration here to drop nflgame completely and move toward solidifying nfldb

Definitely a possibility and something I've thought about. At least several people are using nflgame w/o a database... it's kinda cool to have access to all that data in a python package. Not sure if it being kinda cool warrants it affecting the design!

Combining nflgame+nfldb and having SQLite as an option is very attractive to me. SEEMS like an ORM that supports both SQLite/Postgres would be wise at that point tho. Thoughts @BurntSushi regarding an ORM?

BurntSushi commented 5 years ago

I've never liked ORMs for anything but trivial stuff. They obscure too much. I'd strongly urge you to read some of the SQL used inside nfldb. It is not easily encapsulated into a generic ORM. nfldb specifically leverages SQL to make queries fast. There is some fairly non-trivial stuff happening, like the use of triggers to maintain a materialized view.
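The rough shape of that pattern, reduced to a toy sqlite3 sketch (not the actual nfldb schema): triggers keep a precomputed aggregate table in sync at write time, so reads never pay the aggregation cost.

import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE play (gsis_id TEXT, passing_yds INTEGER);

    -- A plain table standing in for a materialized view.
    CREATE TABLE game_totals (gsis_id TEXT PRIMARY KEY, passing_yds INTEGER);

    CREATE TRIGGER play_insert AFTER INSERT ON play
    BEGIN
        INSERT OR IGNORE INTO game_totals VALUES (NEW.gsis_id, 0);
        UPDATE game_totals
           SET passing_yds = passing_yds + NEW.passing_yds
         WHERE gsis_id = NEW.gsis_id;
    END;
""")
conn.execute("INSERT INTO play VALUES ('2019090500', 12)")
conn.execute("INSERT INTO play VALUES ('2019090500', 30)")
print(conn.execute("SELECT * FROM game_totals").fetchall())
# -> [('2019090500', 42)]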

> Definitely intend to before making any decisions. I just wanted an excuse to implement search via elastic.

Yeah, speaking from someone who deals with both Elasticsearch and PostgreSQL in production environments, avoid Elasticsearch unless you absolutely need both its scale and information retrieval features. I don't think you need either here. Elasticsearch is not an easy thing to maintain.

derek-adair commented 5 years ago

I personally prefer to abstract away as much as possible! But I'm lazy... I can respect some good, well-formed raw SQL; I'm just not proficient anymore after using tools like SQLAlchemy.

A good ORM allows for triggers, e.g. SQLAlchemy's DDL events (adapted from the SQLAlchemy docs):

from sqlalchemy import Column, DDL, Integer, MetaData, String, Table, event

metadata = MetaData()

mytable = Table(
    'mytable', metadata,
    Column('id', Integer, primary_key=True),
    Column('data', String(50))
)

# NB: this trigger body is MySQL-flavored; each backend would need its
# own DDL string registered via execute_if().
trigger = DDL(
    "CREATE TRIGGER dt_ins BEFORE INSERT ON mytable "
    "FOR EACH ROW BEGIN SET NEW.data='ins'; END"
)

event.listen(
    mytable,
    'after_create',
    trigger.execute_if(dialect='postgresql')
)

Also worth working out is how to leverage materialized views. Perhaps I will catalog all of the advanced SQL and see if it fits into SQLAlchemy. I'd hate to degrade performance in any way.

If I can swap to SQLAlchemy and maintain performance, it seems like a win: combining nflgame/nfldb AND supporting both SQLite and Postgres. MySQL would also be on the table at that point.
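With SQLAlchemy, the same table metadata can target any of those backends just by swapping the engine URL (a sketch; the URLs and credentials are placeholders):

from sqlalchemy import create_engine

# Same schema, different storage engines; only the URL changes.
sqlite_engine = create_engine('sqlite:///nfldb.sqlite3')
postgres_engine = create_engine('postgresql://nfldb:secret@localhost/nfldb')

metadata.create_all(sqlite_engine)
metadata.create_all(postgres_engine)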

BurntSushi commented 5 years ago

Good luck then. ORMs have caused me nothing but pain in the past. There's typically too much abstraction. I would recommend spending some time reading nfldb internals.

derek-adair commented 5 years ago

Absolutely would love to not touch anything! That's the laziest of all! I look forward to re-familiarizing myself w/ raw SQL. When I first saw nfldb I was like, "eww, no ORM", but then I looked around and it's very well organized and VERY well done.