dataprofessor / streamlit_freecodecamp

Build 12 Data Apps in Python with Streamlit

ArrowTypeError: ("Expected bytes, got a 'int' object", 'Conversion failed for column FG_percent with type object') #8

Open shanley10 opened 3 years ago

shanley10 commented 3 years ago

I am having trouble replicating the code and keep getting this error. The code is below:

import streamlit as st
import pandas as pd
import base64
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

st.title('NBA Player Stats Explorer')

st.markdown("""
This app performs simple webscraping of NBA player stats data!
""")

st.sidebar.header('User Input Features')
selected_year = st.sidebar.selectbox('Year', list(reversed(range(1950,2020))))

# Web scraping of NBA player stats

year = 2020

@st.cache
def load_data(year):
    url = "https://www.basketball-reference.com/leagues/NBA_" + str(year) + "_per_game.html"
    html = pd.read_html(url, header = 0)
    df = html[0]
    raw = df.drop(df[df.Age == 'Age'].index) # Deletes repeating headers in content
    raw = raw.fillna(0)
    playerstats = raw.drop(['Rk'], axis=1)
    playerstats.columns = [i.replace('%', '_percent') for i in playerstats.columns]
    for i in playerstats.filter(regex='percent').columns:
        for i in playerstats.filter(regex='%').columns:
            playerstats[i] = playerstats[i].astype(float)
    return playerstats
playerstats = load_data(selected_year)

# Sidebar - Team selection
sorted_unique_team = sorted(playerstats.Tm.unique())
selected_team = st.sidebar.multiselect('Team', sorted_unique_team, sorted_unique_team)

# Sidebar - Position selection
unique_pos = ['C','PF','SF','PG','SG']
selected_pos = st.sidebar.multiselect('Position', unique_pos, unique_pos)

# Filtering data
df_selected_team = playerstats[(playerstats.Tm.isin(selected_team)) & (playerstats.Pos.isin(selected_pos))]

st.header('Display Player Stats of Selected Team(s)')
st.write('Data Dimension: ' + str(df_selected_team.shape[0]) + ' rows and ' + str(df_selected_team.shape[1]) + ' columns.')
st.dataframe(df_selected_team)

# Download NBA player stats data
# https://discuss.streamlit.io/t/how-to-download-file-in-streamlit/1806
def filedownload(df):
    csv = df.to_csv(index=False)
    b64 = base64.b64encode(csv.encode()).decode() # strings <-> bytes conversions
    href = f'<a href="data:file/csv;base64,{b64}" download="playerstats.csv">Download CSV File</a>'
    return href

st.markdown(filedownload(df_selected_team), unsafe_allow_html=True)

# Heatmap
if st.button('Intercorrelation Heatmap'):
    st.header('Intercorrelation Matrix Heatmap')
    df_selected_team.to_csv('output.csv',index=False)
    df = pd.read_csv('output.csv')

    corr = df.corr()
    mask = np.zeros_like(corr)
    mask[np.triu_indices_from(mask)] = True
    with sns.axes_style("white"):
        f, ax = plt.subplots(figsize=(7, 5))
        ax = sns.heatmap(corr, mask=mask, vmax=1, square=True)
    st.pyplot()

sGsamSg commented 2 years ago

What I've found is that the columns FG%, 3P%, 2P%, eFG% and FT% are not being recognized properly, so converting each of those columns to float should take care of this error. You can run a for loop to do the following for every column that has the % sign:

raw['FG%'] = raw['FG%'].astype(float)

Hope that helps!
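
A minimal sketch of why the conversion helps, assuming the scraped percentage columns arrive as text with some missing values (the sample values below are made up, not taken from the site). After `fillna(0)` the column holds both text and the integer 0, which is the kind of mixed `object` column the error message complains about; `astype(float)` makes every value a float.

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for the scraped table; values are illustrative only.
raw = pd.DataFrame(
    {"Player": ["A", "B", "C"], "FG%": [".451", np.nan, ".389"]},
    dtype=object,
)

raw = raw.fillna(0)  # the missing value becomes 0, the other values stay text

# The column now mixes more than one Python type under the generic object dtype.
mixed_types = len(set(type(value) for value in raw["FG%"])) > 1

raw["FG%"] = raw["FG%"].astype(float)  # every value becomes a float
dtype_after = str(raw["FG%"].dtype)

print(mixed_types, dtype_after)
```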

inzel commented 2 years ago

> What I've found is that the columns FG%, 3P%, 2P%, eFG% and FT% are not being recognized properly, so converting each of those columns to float should take care of this error. You can run a for loop to do the following for every column that has the % sign:
>
> raw['FG%'] = raw['FG%'].astype(float)
>
> Hope that helps!

I am experiencing the same issue. Where exactly are you adding that for loop?

Thanks

inzel commented 2 years ago

I came across another thread that shows this as a bug. You can easily resolve this by adding:

[global]
dataFrameSerialization = "legacy"

to your ~/.streamlit/config.toml file.

Unfortunately, with this setting sorting doesn't seem to work as expected. When I sort by PTS, the highest value does not go to the top or bottom; it ends up somewhere in the middle instead.
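
One possible explanation for the sorting behaviour, offered as an assumption rather than something confirmed in this thread: if the PTS column is still stored as text, it is sorted character by character rather than by numeric value. A small sketch with made-up values:

```python
# Made-up points-per-game values stored as text, as they would be in a string column.
pts_as_text = ["9.8", "27.4", "10.2", "3.1"]

# Text is compared character by character, so the largest number lands in the middle.
sorted_as_text = sorted(pts_as_text)

# Converting to float for the comparison gives the numeric order.
sorted_as_numbers = sorted(pts_as_text, key=float)

print(sorted_as_text)
print(sorted_as_numbers)
```

If that is the cause, converting the numeric columns to float before displaying them should restore the expected order.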

kranthigy commented 2 years ago

Hope this is useful to someone. Here is the complete code with the fixes for the error, using the latest Streamlit version.

❯ streamlit --version
Streamlit, version 1.3.0

import streamlit as st
import pandas as pd
import base64
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

st.title("NBA Player Stats Explorer")

st.markdown(
    """
This app performs simple webscraping of NBA player stats data!
* **Python libraries:** base64, pandas, streamlit
* **Data source:** [Basketball-reference.com](https://www.basketball-reference.com/).
"""
)

st.sidebar.header("User Input Features")
selected_year = st.sidebar.selectbox("Year", list(reversed(range(1950, 2022))))

# Web scraping of NBA player stats
@st.cache
def load_data(year):
    url = f"https://www.basketball-reference.com/leagues/NBA_{year}_per_game.html"
    # "https://www.basketball-reference.com/leagues/NBA_2021_per_game.html"
    html = pd.read_html(url, header=0)
    df = html[0]
    raw = df.drop(df[df.Age == "Age"].index)  # Deletes repeating headers in content

    # Fill missing values, then set the type of each column to str to address issues like below.
    # streamlit.errors.StreamlitAPIException: (
    # "Expected bytes, got a 'int' object", 'Conversion failed for column FG% with type object')
    # fillna runs first: once a column is str, a missing value may already be the text "nan".

    raw = raw.fillna(0)
    raw = raw.astype(str)

    player_stats = raw.drop(["Rk"], axis=1)
    return player_stats

player_stats = load_data(selected_year)

# Sidebar - Team selection
sorted_unique_team = sorted(player_stats.Tm.unique())
selected_team = st.sidebar.multiselect("Team", sorted_unique_team, sorted_unique_team)

# Sidebar - Position selection
unique_pos = ["C", "PF", "SF", "PG", "SG"]
selected_pos = st.sidebar.multiselect("Position", unique_pos, unique_pos)

# Filtering data
df_selected_team = player_stats[
    (player_stats.Tm.isin(selected_team)) & (player_stats.Pos.isin(selected_pos))
]

st.header("Display Player Stats of Selected Team(s)")
st.write(
    "Data Dimension: "
    + str(df_selected_team.shape[0])
    + " rows and "
    + str(df_selected_team.shape[1])
    + " columns."
)
df_selected_team = df_selected_team.astype(str)
st.dataframe(df_selected_team)

# Download NBA player stats data
# https://discuss.streamlit.io/t/how-to-download-file-in-streamlit/1806
def file_download(df):
    csv = df.to_csv(index=False)
    b64 = base64.b64encode(csv.encode()).decode()  # strings <-> bytes conversions
    href = f'<a href="data:file/csv;base64,{b64}" download="playerstats.csv">Download CSV File</a>'
    return href

st.markdown(file_download(df_selected_team), unsafe_allow_html=True)

# Heatmap
if st.button("Intercorrelation Heatmap"):
    st.header("Intercorrelation Matrix Heatmap")
    df_selected_team.to_csv("output.csv", index=False)
    df = pd.read_csv("output.csv")

    corr = df.corr()
    mask = np.zeros_like(corr)
    mask[np.triu_indices_from(mask)] = True
    with sns.axes_style("white"):
        f, ax = plt.subplots(figsize=(7, 5))
        ax = sns.heatmap(corr, mask=mask, vmax=1, square=True)
    st.pyplot(f)
datalifenyc commented 2 years ago

> What I've found is that the columns FG%, 3P%, 2P%, eFG% and FT% are not being recognized properly, so converting each of those columns to float should take care of this error. You can run a for loop to do the following for every column that has the % sign:
>
> raw['FG%'] = raw['FG%'].astype(float)
>
> Hope that helps!

That did the trick!

Here is the for loop version:

raw = raw.fillna(0)
# Convert % columns to float
columns = [
    'FG%', 
    '3P%',
    '2P%',
    'eFG%',
    'FT%'
]
for column in columns:
    raw[column] = raw[column].astype(float)
playerstats = raw.drop(['Rk'], axis = 1)

Note: The lines before and after the loop are included for clarity.
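
For anyone wondering where the loop goes: going by the surrounding lines above, it sits inside `load_data`, after the `fillna(0)` line and before `Rk` is dropped. The sketch below shows that placement in a variant that picks up every column whose name contains `%` instead of listing them by hand. The helper name `clean_per_game` and the sample rows are hypothetical, used only so the snippet runs without scraping the site.

```python
import numpy as np
import pandas as pd


def clean_per_game(df):
    """Hypothetical helper: the cleaning steps of load_data, without the scraping."""
    raw = df.drop(df[df.Age == "Age"].index)  # drop repeated header rows
    raw = raw.fillna(0)
    # Convert every column whose name contains "%" to float.
    for column in [name for name in raw.columns if "%" in name]:
        raw[column] = raw[column].astype(float)
    return raw.drop(["Rk"], axis=1)


# Made-up rows standing in for the table returned by pd.read_html(...)[0].
sample = pd.DataFrame(
    {
        "Rk": ["1", "Rk", "2"],
        "Player": ["A", "Player", "B"],
        "Age": ["25", "Age", "31"],
        "FG%": [".451", "FG%", np.nan],
        "FT%": [np.nan, "FT%", ".802"],
    },
    dtype=object,
)

playerstats = clean_per_game(sample)
print(playerstats.dtypes)
```

Matching on the `%` sign avoids maintaining a hard-coded column list, at the cost of assuming that every such column holds values that can be parsed as numbers.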