@samimak37 -- @sophiazhi, @meesuekim, and I were brainstorming about the best way into the module, and we realized that we shouldn't always require the user to provide a metadata CSV file. A lot of our analysis tools work on a corpus with no metadata.
So, we need a `Corpus.__init__()` that takes as args:
- a path to a directory where the txt files live OR a path to a pickled corpus (one or the other is required)
- optionally, the location of a metadata CSV file
- a bool for whether to pickle the corpus on loading (I'm not sure what the default should be yet: on the one hand these are potentially big and slow loads, so pickling by default makes sense, but on the other hand I really don't like the idea of serializing and writing to disk without actually asking the user)
If the caller does not supply the location of a metadata CSV file, we'll construct a metadata dict ourselves, with only a 'filename' key.
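
Here's a rough sketch of what that constructor could look like, just to make the proposal concrete. Everything named here is a placeholder rather than settled API: the parameter names (`texts_dir`, `pickled_path`, `metadata_csv`, `pickle_on_load`), the `texts` and `metadata` attributes, and the `corpus.pickle` output filename are all assumptions. I've defaulted `pickle_on_load` to `False` only because that question is still open, and I've read "a metadata dict with only a 'filename' key" as one such dict per text file.

```python
import csv
import pickle
from pathlib import Path


class Corpus:
    """Hypothetical sketch of the proposed constructor, not the final API."""

    def __init__(self, texts_dir=None, pickled_path=None,
                 metadata_csv=None, pickle_on_load=False):
        # Require exactly one source: a directory of .txt files or a pickle.
        if (texts_dir is None) == (pickled_path is None):
            raise ValueError("Provide exactly one of texts_dir or pickled_path")

        if pickled_path is not None:
            # Only load pickles you trust: unpickling can run arbitrary code.
            with open(pickled_path, "rb") as f:
                saved = pickle.load(f)
            self.texts = saved["texts"]
            self.metadata = saved["metadata"]
            return

        texts_dir = Path(texts_dir)
        # Map each filename to its contents, in a stable (sorted) order.
        self.texts = {p.name: p.read_text(encoding="utf-8")
                      for p in sorted(texts_dir.glob("*.txt"))}

        if metadata_csv is None:
            # No CSV supplied: build minimal metadata with only a 'filename' key.
            self.metadata = [{"filename": name} for name in self.texts]
        else:
            with open(metadata_csv, newline="", encoding="utf-8") as f:
                self.metadata = list(csv.DictReader(f))

        if pickle_on_load:
            # Assumed location; the real one would need to be agreed on.
            with open(texts_dir / "corpus.pickle", "wb") as f:
                pickle.dump({"texts": self.texts, "metadata": self.metadata}, f)
```

With this shape, `Corpus(texts_dir="some/dir")` works with no metadata at all, and nothing gets written to disk unless the caller opts in.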