adam-broussard / bestreads


Fix saving of cleaned descriptions #49

Open youngant opened 2 years ago

youngant commented 2 years ago

The "cleaned_descriptions" column read from the cleaned CSV files contains the string representation of the Python list that was saved, not the list itself. CSV may not be the right format for saving a list of lists with arbitrary lengths. We could either pickle the DataFrame or save a CSV with one column per word (descriptions would just have empty entries once they run out of words). Thoughts?
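To make the column-per-word option concrete, here is a minimal sketch using only the standard-library `csv` module instead of pandas. The helper names `write_padded_rows` and `read_padded_rows` and the `word_N` header names are hypothetical, not part of the repo, and the sketch assumes no description contains a genuinely empty word, since empty cells are treated as padding.

```python
import csv
import io


def write_padded_rows(descriptions, handle):
    """Write one word per column, padding short rows with empty cells."""
    width = max((len(words) for words in descriptions), default=0)
    writer = csv.writer(handle)
    # Hypothetical header names: word_0, word_1, ...
    writer.writerow([f"word_{i}" for i in range(width)])
    for words in descriptions:
        writer.writerow(list(words) + [""] * (width - len(words)))


def read_padded_rows(handle):
    """Read the padded layout back, dropping the empty padding cells."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    return [[word for word in row if word != ""] for row in reader]


# Round trip through an in-memory buffer standing in for a file on disk.
descriptions = [["a", "good", "book"], ["short"]]
buffer = io.StringIO()
write_padded_rows(descriptions, buffer)
buffer.seek(0)
restored = read_padded_rows(buffer)
```

The trade-off is that the file gets as wide as the longest description, whereas pickling keeps the lists intact but is no longer human-readable.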

adam-broussard commented 2 years ago

This is a good point. It can be resolved when reading the file, with something like:

import ast
from pandas import read_csv
# Parse each stringified list back into a Python list while reading
processed_data_train = read_csv('./data/processed/goodreads_books_train_processed.csv',
                                converters={'cleaned_descriptions': ast.literal_eval})

Alternatively, we could write our own function to read it in without needing the converter.
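One possible shape for such a function is sketched below. The name `parse_word_list` is hypothetical, and the assumption that every cell should hold a list of strings is inferred from the discussion above rather than taken from the repo. It wraps `ast.literal_eval` and rejects anything that does not parse to that shape.

```python
import ast


def parse_word_list(cell):
    """Parse a cell such as "['a', 'good', 'book']" into a list of str.

    Assumes each cell was written as the repr of a list of strings.
    """
    value = ast.literal_eval(cell)
    if not isinstance(value, list) or not all(isinstance(w, str) for w in value):
        raise ValueError(f"expected a list of strings, got: {cell!r}")
    return value


# Example: a cell as it would come back from the CSV file.
words = parse_word_list("['a', 'good', 'book']")
```

A helper like this could be applied to the column after a plain read, or passed in place of `ast.literal_eval` as the converter. `ast.literal_eval` only accepts Python literals, so a cell containing a function call or other expression raises an error instead of being executed.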

youngant commented 2 years ago

While I don't think this is generally good security practice, I'm fine with it.