Open youngant opened 2 years ago
This is a good point. This can be resolved with something like:
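import ast
from pandas import read_csv
# Parse the stringified lists in 'cleaned_descriptions' back into Python lists on load.
processed_data_train = read_csv('./data/processed/goodreads_books_train_processed.csv',
                                converters={'cleaned_descriptions': ast.literal_eval})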
Alternatively, we could write our own function to read it in without needing the converter.
While I don't think this is generally good security practice, I'm fine with it.
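To make the converter suggestion concrete, here is a minimal round-trip sketch. It assumes a tiny made-up DataFrame and an in-memory buffer in place of the real processed CSV; only the `cleaned_descriptions` column name comes from this thread.

```python
import ast
import io

import pandas as pd

# Hypothetical sample rows; the real data lives in the processed CSV files.
df = pd.DataFrame(
    {"cleaned_descriptions": [["dark", "fantasy"], ["space", "opera", "epic"]]}
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Without a converter, each cell comes back as a plain string
# holding the printed form of the list.
buffer.seek(0)
as_strings = pd.read_csv(buffer)

# With ast.literal_eval as the converter, each cell is parsed back into a list.
buffer.seek(0)
as_lists = pd.read_csv(
    buffer, converters={"cleaned_descriptions": ast.literal_eval}
)
```

`ast.literal_eval` only accepts Python literals (strings, numbers, lists, dicts and so on), so it will not run arbitrary code the way `eval` would.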
The "cleaned_descriptions" column read from the cleaned CSV files contains a string representation of the Python list that was saved. A CSV might not be the correct format for saving a list of lists with arbitrary lengths. We could probably either pickle the DataFrame or save a CSV with a column for each word (where descriptions would just have empty entries when they run out of words). Thoughts?
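For the "one column per word" option, a rough sketch of what the save and load could look like is below. The sample rows are made up, and reading with `dtype=str` and `keep_default_na=False` is my own assumption, chosen so that words such as "NA" or "1984" are not converted on load.

```python
import io

import pandas as pd

# Hypothetical sample rows with different numbers of words.
descriptions = [["dark", "fantasy"], ["space", "opera", "epic"]]

# One column per word position; shorter rows are padded with missing values.
wide = pd.DataFrame(descriptions)

# Missing values are written as empty fields in the CSV.
buffer = io.StringIO()
wide.to_csv(buffer, index=False)

# Read everything back as strings and keep empty fields as "" rather than NaN.
buffer.seek(0)
wide_back = pd.read_csv(buffer, dtype=str, keep_default_na=False)

# Drop the empty padding to recover the original variable-length lists.
restored = [
    [word for word in row if word != ""]
    for row in wide_back.to_numpy().tolist()
]
```

The trade-off is that the file gets as many columns as the longest description, so one very long description makes every row wide.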