current12 / Stat-222-Project

NLP Features EDA #59

Closed ijyliu closed 2 months ago

ijyliu commented 3 months ago

Correlations

ijyliu commented 3 months ago

Reminder to update this to use the new dataset:

# Limit to items in the finalized dataset
import os
import pandas as pd

# Directory holding the parquet files with NLP features
data_dir = r'../../../Data/All_Data/All_Data_with_NLP_Features'
# List the parquet files in that directory
file_list = [f for f in os.listdir(data_dir) if f.endswith('.parquet')]
# Read in all parquet files and stack them into one DataFrame
df = pd.concat([pd.read_parquet(os.path.join(data_dir, f)) for f in file_list])
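
For the correlations mentioned at the top of this issue, a minimal sketch of one way to compute them once the data is loaded might look like the following. This is only an illustration, not the project's actual EDA code: the helper `numeric_correlations` and the toy column names (`word_count`, `sentiment`, `label`) are assumptions made up for the example, and a small in-memory DataFrame stands in for the real parquet data.

```python
import pandas as pd


def numeric_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Return pairwise Pearson correlations between the numeric columns of df."""
    # Keep only numeric columns so text or label columns do not break corr()
    numeric = df.select_dtypes(include="number")
    return numeric.corr()


# Toy stand-in for the real dataset; these column names are hypothetical
toy = pd.DataFrame({
    "word_count": [100, 200, 300, 400],
    "sentiment": [0.1, 0.2, 0.3, 0.4],
    "label": ["A", "B", "A", "B"],
})

corr = numeric_correlations(toy)
print(corr)
```

With the real data, the same helper could be called on `df` after the concat step, assuming the NLP feature columns are stored as numeric dtypes.
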
ijyliu commented 3 months ago

@seanzhou1207 to clean up plots and save to Output folder

ijyliu commented 3 months ago

(and update to new dataset)

ijyliu commented 2 months ago

The current level of EDA is good, though it will need to be rerun in the future.