Joshua099 / R6Siege_DataAnalysisProject

Data Analysis Project on player statistics from r6 leaderboards
1 stars 0 forks source link

Combine all leaderboard data into one table as well #5

Closed Joshua099 closed 5 years ago

Joshua099 commented 5 years ago

ISSUE: the different leader board tables contain the same amount of columns and data types. BUT they are in a different order.

Joshua099 commented 5 years ago

The issue is that each region's leader board has a separately labeled p_REGION_currentmmr field. In the rainbow six database, this is actually a global mmr. Also, in each region: p_REGION_currentmmr == p_currentmmr.

So to solve this issue, delete each leader board table's p_REGION_currentmmr

Joshua099 commented 5 years ago

UPDATE: Successfully dropped problem columns in a test run.

with engine.connect() as con:
    try:
        sql_delete_query = """ALTER TABLE %s DROP COLUMN %s"""
        con.execute(sql_delete_query % ('r6_leaderboard_database.na_test', 'p_NA_currentmmr'))

        print('delete done')
    except:
        print("Already Deleted")

    con.close()

Ready to drop columns on each leader board table

Joshua099 commented 5 years ago

UPDATE: Dropped all problem columns with code:

with engine.connect() as con:
    try:
        sql_delete_query = """ALTER TABLE %s DROP COLUMN %s"""
        con.execute(sql_delete_query % ('r6_leaderboard_database.na_leaderboard', 'p_NA_currentmmr'))
        con.execute(sql_delete_query % ('r6_leaderboard_database.eu_leaderboard', 'p_EU_currentmmr'))
        con.execute(sql_delete_query % ('r6_leaderboard_database.as_leaderboard', 'p_AS_currentmmr'))

        print('delete done')
    except:
        print("Already Deleted")

    con.close()
Joshua099 commented 5 years ago

UPDATE: New PROBLEM

with engine.connect() as con:

    union_query = """SELECT * FROM %s UNION SELECT * FROM %s UNION SELECT * FROM %s"""

    test = con.execute(union_query % ('r6_leaderboard_database.na_leaderboard',
                                      'r6_leaderboard_database.eu_leaderboard',
                                      'r6_leaderboard_database.as_leaderboard'))

    for row in test:
        print(row)

    test_df = pd.read_sql_query(sql=union_query % ('r6_leaderboard_database.na_leaderboard',
                                             'r6_leaderboard_database.eu_leaderboard',
                                             'r6_leaderboard_database.as_leaderboard'),
                          con=con,
                          index_col=False,
                          chunksize=1000
                          )

    print(list(test_df))
Joshua099 commented 5 years ago

Iterable Dataframe problem SOLVED:

with engine.connect() as con:

    union_query = """SELECT * FROM %s UNION SELECT * FROM %s UNION SELECT * FROM %s"""

    test = con.execute(union_query % ('r6_leaderboard_database.na_leaderboard',
                                      'r6_leaderboard_database.eu_leaderboard',
                                      'r6_leaderboard_database.as_leaderboard'))

    # for row in test:
    #     print(row)
    #
    # test_df = pd.read_sql_query(sql=union_query % ('r6_leaderboard_database.na_leaderboard',
    #                                          'r6_leaderboard_database.eu_leaderboard',
    #                                          'r6_leaderboard_database.as_leaderboard'),
    #                       con=con,
    #                       index_col=False,
    #                       chunksize=1000
    #                       )
    #
    # print(list(test_df))

    df = pd.DataFrame(test.fetchall())
    df.columns = test.keys()

    print(df)
Joshua099 commented 5 years ago

Should create a table in sql server with this code