ijyliu closed 2 months ago
Reminder: update to use the new dataset
# Limit to items in the finalized dataset
import os
import pandas as pd  # needed for pd.read_parquet / pd.concat below

# Directory containing the parquet files with NLP features
data_dir = r'../../../Data/All_Data/All_Data_with_NLP_Features'
# List the parquet files in that directory
file_list = [f for f in os.listdir(data_dir) if f.endswith('.parquet')]
# Read all parquet files and concatenate them into a single DataFrame
df = pd.concat([pd.read_parquet(os.path.join(data_dir, f)) for f in file_list])
@seanzhou1207 to clean up plots and save them to the Output folder
(and update to the new dataset)
Current level of EDA is good, though it will need to be rerun in the future
Correlations