Open shanley10 opened 3 years ago
What I've found is that the columns FG%, 3P%, 2P%, eFG% and FT% are not being recognized as numeric, so converting each of them to float should take care of this error. You can run a for loop that does the following for every column that has a % sign:
raw['FG%'] = raw['FG%'].astype(float)
Hope that helps!
I am experiencing the same issue. Where exactly are you adding that for loop?
Thanks
I came across another thread that shows this as a bug. You can easily resolve this by adding:
[global]
dataFrameSerialization = "legacy"
To your ~/.streamlit/config.toml file.
It seems that doing this doesn't allow the sort function to work as expected, unfortunately. When I sort by pts, the highest float does not go to the top or bottom; it ends up somewhere in the middle instead.
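One possible explanation for the odd sort order, assuming the column is still held as strings (object dtype) rather than floats: strings are compared character by character, so "9.5" sorts after "27.3". A minimal sketch with made-up values, not taken from the scraped table:

```python
import pandas as pd

# Hypothetical points-per-game values stored as strings, as they would be
# after an astype(str) workaround.
pts = pd.Series(["9.5", "27.3", "10.1", "3.0"], name="PTS")

# String sort compares character by character, so "3.0" and "9.5" come last.
as_text = pts.sort_values().tolist()

# Converting to float first gives the expected numeric order.
as_numbers = pts.astype(float).sort_values().tolist()

print(as_text)
print(as_numbers)
```

If this is what is happening, converting the numeric columns to float before display should restore the expected ordering.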
Hope this is useful to someone. Here is the complete code with the fixes for the error, running on the latest Streamlit version.
❯ streamlit --version
Streamlit, version 1.3.0
import streamlit as st
import pandas as pd
import base64
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
st.title("NBA Player Stats Explorer")
st.markdown(
"""
This app performs simple webscraping of NBA player stats data!
* **Python libraries:** base64, pandas, streamlit
* **Data source:** [Basketball-reference.com](https://www.basketball-reference.com/).
"""
)
st.sidebar.header("User Input Features")
selected_year = st.sidebar.selectbox("Year", list(reversed(range(1950, 2022))))
# Web scraping of NBA player stats
@st.cache
def load_data(year):
    url = f"https://www.basketball-reference.com/leagues/NBA_{year}_per_game.html"
    # "https://www.basketball-reference.com/leagues/NBA_2021_per_game.html"
    html = pd.read_html(url, header=0)
    df = html[0]
    raw = df.drop(df[df.Age == "Age"].index)  # Deletes repeating headers in content
    # Fill missing values before the str cast; afterwards NaN would already be the string "nan".
    raw = raw.fillna(0)
    # Set the type of each column to str to address issues like below.
    # streamlit.errors.StreamlitAPIException: (
    # "Expected bytes, got a 'int' object", 'Conversion failed for column FG% with type object')
    raw = raw.astype(str)
    player_stats = raw.drop(["Rk"], axis=1)
    return player_stats
player_stats = load_data(selected_year)
# Sidebar - Team selection
sorted_unique_team = sorted(player_stats.Tm.unique())
selected_team = st.sidebar.multiselect("Team", sorted_unique_team, sorted_unique_team)
# Sidebar - Position selection
unique_pos = ["C", "PF", "SF", "PG", "SG"]
selected_pos = st.sidebar.multiselect("Position", unique_pos, unique_pos)
# Filtering data
df_selected_team = player_stats[
    (player_stats.Tm.isin(selected_team)) & (player_stats.Pos.isin(selected_pos))
]
st.header("Display Player Stats of Selected Team(s)")
st.write(
    "Data Dimension: "
    + str(df_selected_team.shape[0])
    + " rows and "
    + str(df_selected_team.shape[1])
    + " columns."
)
df_selected_team = df_selected_team.astype(str)
st.dataframe(df_selected_team)
# Download NBA player stats data
# https://discuss.streamlit.io/t/how-to-download-file-in-streamlit/1806
def file_download(df):
    csv = df.to_csv(index=False)
    b64 = base64.b64encode(csv.encode()).decode()  # strings <-> bytes conversions
    href = f'<a href="data:file/csv;base64,{b64}" download="playerstats.csv">Download CSV File</a>'
    return href
st.markdown(file_download(df_selected_team), unsafe_allow_html=True)
# Heatmap
if st.button("Intercorrelation Heatmap"):
    st.header("Intercorrelation Matrix Heatmap")
    df_selected_team.to_csv("output.csv", index=False)
    df = pd.read_csv("output.csv")
    corr = df.corr()
    mask = np.zeros_like(corr)
    mask[np.triu_indices_from(mask)] = True
    with sns.axes_style("white"):
        f, ax = plt.subplots(figsize=(7, 5))
        ax = sns.heatmap(corr, mask=mask, vmax=1, square=True)
    st.pyplot(f)
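The CSV round trip in the heatmap step appears to be there so that pandas re-infers numeric dtypes after everything was cast to str. A minimal sketch of the same idea done in memory, assuming a string-typed table like the one above; the helper name `numeric_corr` and the sample rows are hypothetical and not part of the original app:

```python
import numpy as np
import pandas as pd


def numeric_corr(df: pd.DataFrame) -> pd.DataFrame:
    """Correlation matrix over the columns that parse as numbers."""
    # Coerce each column; text columns such as player names become all-NaN.
    numeric = df.apply(pd.to_numeric, errors="coerce")
    # Drop the columns where nothing could be parsed as a number.
    numeric = numeric.dropna(axis=1, how="all")
    return numeric.corr()


# Made-up rows standing in for the string-typed player table.
table = pd.DataFrame(
    {
        "Player": ["A", "B", "C", "D"],
        "PTS": ["10.0", "20.0", "30.0", "40.0"],
        "AST": ["1.0", "2.0", "3.0", "4.0"],
    }
)

corr = numeric_corr(table)

# Boolean mask hiding the upper triangle (including the diagonal),
# suitable for passing as mask= to seaborn.heatmap.
mask = np.triu(np.ones_like(corr, dtype=bool))
print(corr)
```

This avoids writing output.csv to disk on every button click, at the cost of silently dropping any column that contains no parseable numbers.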
That did the trick!
Here is the for loop version:
raw = raw.fillna(0)
# Convert % columns to float
columns = [
    'FG%',
    '3P%',
    '2P%',
    'eFG%',
    'FT%'
]
for column in columns:
    raw[column] = raw[column].astype(float)
playerstats = raw.drop(['Rk'], axis = 1)
Note: The lines immediately before and after the loop are included only to show where it goes.
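A slightly more defensive variant of the loop above, sketched on a toy DataFrame rather than the scraped table: it picks up every column whose name ends in `%` instead of listing them by hand, and uses `pd.to_numeric` with `errors="coerce"` so that blanks become NaN instead of raising. The helper name `percent_columns_to_float` and the sample rows are illustrative assumptions, not part of the original app.

```python
import pandas as pd


def percent_columns_to_float(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every column ending in '%' converted to float."""
    out = df.copy()
    for column in out.columns:
        if str(column).endswith("%"):
            # Unparseable entries (e.g. empty strings) become NaN, then 0.
            out[column] = pd.to_numeric(out[column], errors="coerce").fillna(0.0)
    return out


# Toy stand-in for the scraped table; values are made up.
raw = pd.DataFrame(
    {
        "Player": ["A", "B", "C"],
        "FG%": [".512", 0, ".447"],  # mixed str/int, like the failing column
        "FT%": [".800", "", ".750"],
    }
)

converted = percent_columns_to_float(raw)
print(converted.dtypes)
```

Whether to fill unparseable values with 0 or leave them as NaN is a judgment call; 0 matches the `fillna(0)` used earlier in the thread.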
I am having trouble replicating the code and keep getting this error. The code is below:
import streamlit as st
import pandas as pd
import base64
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
st.title('NBA Player Stats Explorer')
st.markdown(""" This app performs simple webscraping of NBA player stats data!
st.sidebar.header('User Input Features')
selected_year = st.sidebar.selectbox('Year', list(reversed(range(1950,2020))))
# Web scraping of NBA player stats
year = 2020
@st.cache
def loaddata(year):
    url = "https://www.basketball-reference.com/leagues/NBA" + str(year) + "_per_game.html"
    html = pd.read_html(url, header = 0)
    df = html[0]
    raw = df.drop(df[df.Age == 'Age'].index) # Deletes repeating headers in content
    raw = raw.fillna(0)
    playerstats = raw.drop(['Rk'], axis=1)
    playerstats.columns = [i.replace('%', '_percent') for i in playerstats.columns]
    for i in playerstats.filter(regex='percent').columns:
        for i in playerstats.filter(regex='%').columns:
            playerstats[i] = playerstats[i].astype(float)
    return playerstats
playerstats = load_data(selected_year)
# Sidebar - Team selection
sorted_unique_team = sorted(playerstats.Tm.unique())
selected_team = st.sidebar.multiselect('Team', sorted_unique_team, sorted_unique_team)
# Sidebar - Position selection
unique_pos = ['C','PF','SF','PG','SG']
selected_pos = st.sidebar.multiselect('Position', unique_pos, unique_pos)
# Filtering data
df_selected_team = playerstats[(playerstats.Tm.isin(selected_team)) & (playerstats.Pos.isin(selected_pos))]
st.header('Display Player Stats of Selected Team(s)')
st.write('Data Dimension: ' + str(df_selected_team.shape[0]) + ' rows and ' + str(df_selected_team.shape[1]) + ' columns.')
st.dataframe(df_selected_team)
# Download NBA player stats data
# https://discuss.streamlit.io/t/how-to-download-file-in-streamlit/1806
def filedownload(df):
    csv = df.to_csv(index=False)
    b64 = base64.b64encode(csv.encode()).decode() # strings <-> bytes conversions
    href = f'Download CSV File'
    return href
st.markdown(filedownload(df_selected_team), unsafe_allow_html=True)
# Heatmap
if st.button('Intercorrelation Heatmap'):
    st.header('Intercorrelation Matrix Heatmap')
    df_selected_team.to_csv('output.csv',index=False)
    df = pd.read_csv('output.csv')