Closed notofir closed 10 months ago
Suggested columns by @dudubur:
["prediction_summary", "color", "predicted_class"]
Resolved by:
import pickle

# Drop the unused column and store the low-cardinality string columns as categoricals.
df = df.drop(columns=['label'])
df["color"] = df["color"].astype("category")
df["predicted_class"] = df["predicted_class"].astype("category")

# Replace the string keys of each prediction_summary dict with small integers.
unique_strings = sorted({key for row in df['prediction_summary'] for key in row})
string_to_int_mapping = {string: idx for idx, string in enumerate(unique_strings)}
int_to_string_mapping = {idx: string for string, idx in string_to_int_mapping.items()}
df['prediction_summary'] = df['prediction_summary'].apply(lambda d: {string_to_int_mapping[key]: value for key, value in d.items()})

# Save the reverse mapping so the original keys can be restored later.
with open('prediction_summary_key_encoding.pkl', 'wb') as f:
    pickle.dump(int_to_string_mapping, f)
@notofir
Show me what you've got...
memory_usage = df.memory_usage(deep=True) / 1024 / 1024
I'm sorry, but the complex part didn't really work. The problem is that the prediction_summary column holds dicts, so I'm exporting it to a separate df.
df["color"] = df["color"].astype("category")
df["predicted_class"] = df["predicted_class"].astype("category")
new_df = df[['word', 'prediction_summary']].copy()
new_df = new_df.reset_index(drop=True)
new_df.to_pickle("prediction_summary.pkl")
df = df.drop(columns=["label", "prediction_summary"])
df = df.reset_index(drop=True)
df.to_pickle("model_data.pkl")
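For anyone who needs the dict column back next to the rest of the data, here is a minimal sketch of reloading the two files and rejoining them. It assumes both frames keep the same row order and the same reset index, as in the snippet above; the toy data and the temporary directory are placeholders, not the real dataset.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the real data; column names follow this thread.
df = pd.DataFrame({
    "word": ["cat", "dog", "bird"],
    "color": ["red", "blue", "red"],
    "predicted_class": ["a", "b", "a"],
    "prediction_summary": [{"a": 0.9}, {"b": 0.7}, {"a": 0.6, "b": 0.4}],
})

with tempfile.TemporaryDirectory() as tmp:
    summary_path = os.path.join(tmp, "prediction_summary.pkl")
    model_path = os.path.join(tmp, "model_data.pkl")

    # Same split as above: the dict column in one file, everything else in another.
    df[["word", "prediction_summary"]].reset_index(drop=True).to_pickle(summary_path)
    df.drop(columns=["prediction_summary"]).reset_index(drop=True).to_pickle(model_path)

    model_df = pd.read_pickle(model_path)
    summary_df = pd.read_pickle(summary_path)

# Both frames share the same 0..n-1 index, so an index join realigns the rows.
rejoined = model_df.join(summary_df["prediction_summary"])
```

The join relies only on the shared index. If either frame is filtered or re-sorted after the split, joining on an explicit key column would be safer.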
Resolved by #320.
See the size of each column (in MiB) by running:
memory_usage = df.memory_usage(deep=True) / 1024 / 1024
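As a rough illustration of what that measurement shows, the sketch below compares per-column memory before and after the category conversion used in this thread. The data is made up (many rows, three distinct values per column), so the actual savings on the real frame will differ.

```python
import pandas as pd

# Toy data: many rows, few distinct values (an assumed shape, not the real dataset).
df = pd.DataFrame({
    "color": ["red", "green", "blue"] * 10_000,
    "predicted_class": ["cat", "dog", "bird"] * 10_000,
})

# Per-column size in MiB, counting the string payloads (deep=True).
before_mb = df.memory_usage(deep=True) / 1024 / 1024

for column in ["color", "predicted_class"]:
    df[column] = df[column].astype("category")

after_mb = df.memory_usage(deep=True) / 1024 / 1024

# Side-by-side comparison, one row per column (plus the index).
print(pd.DataFrame({"before_mb": before_mb, "after_mb": after_mb}))
```

A categorical column stores each distinct value once plus a small integer code per row, which is why it helps most when a column has few unique values.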