Write function to check the basics of the data (splits and amount)

The exact number of events matters less than having enough of them that the holdout set can consist entirely of events the model has never seen. Likewise, event length doesn't matter as long as, for each person, the number of sentences times their average sentence length is generally enough to train a model.
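A minimal sketch of what such a check could look like, not the actual implementation. It assumes each record is a dict with hypothetical keys `event_id`, `speaker`, and `text` (one sentence per record), and the `min_words_per_speaker` threshold is an arbitrary placeholder since the issue does not say how much data is "enough":

```python
from collections import defaultdict


def check_data_basics(train, holdout, min_words_per_speaker=1000):
    """Check event separation between splits and per-speaker training volume.

    Record keys ("event_id", "speaker", "text") and the default threshold
    are assumptions made for illustration only.
    """
    # Events that appear in both splits would leak into the holdout set.
    train_events = {r["event_id"] for r in train}
    holdout_events = {r["event_id"] for r in holdout}
    shared_events = train_events & holdout_events

    # Total words per speaker equals
    # (number of sentences) * (average sentence length in words).
    word_counts = defaultdict(int)
    for r in train:
        word_counts[r["speaker"]] += len(r["text"].split())

    low_volume_speakers = {
        speaker: count
        for speaker, count in word_counts.items()
        if count < min_words_per_speaker
    }

    return {
        "shared_events": shared_events,
        "low_volume_speakers": low_volume_speakers,
        "ok": not shared_events and not low_volume_speakers,
    }
```

Returning the offending events and speakers, rather than a bare boolean, makes it easier to see which part of the data needs attention.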

It may actually be better to use a 20% train set and an 80% holdout set, since a larger holdout set gives stronger evidence that the model performs well on new data.
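One way to produce such a split while keeping holdout events entirely new is to split on event IDs rather than on individual sentences. The sketch below is an assumption about how that might be done; the `event_id` key, the function name, and the `seed` parameter are all hypothetical:

```python
import random


def split_by_event(records, train_fraction=0.2, seed=0):
    """Split records so that no event appears in both train and holdout.

    Assumes each record is a dict with a hypothetical "event_id" key.
    """
    events = sorted({r["event_id"] for r in records})
    if len(events) < 2:
        raise ValueError("need at least two events to keep holdout events unseen")

    # Shuffle event IDs reproducibly, then take the first fraction for training.
    random.Random(seed).shuffle(events)
    n_train = min(len(events) - 1, max(1, round(len(events) * train_fraction)))
    train_events = set(events[:n_train])

    train = [r for r in records if r["event_id"] in train_events]
    holdout = [r for r in records if r["event_id"] not in train_events]
    return train, holdout
```

Note that the fraction here applies to events, not sentences, so the sentence-level proportions will only roughly match 20/80 when events differ in length.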

https://github.com/CouncilDataProject/speakerbox/issues/6

WeberLab-UW / project-tracking

Write function to check the basics of the data (splits and amount) #38