The exact number of events matters less than having enough of them that the holdout set can be made up of entirely new events. Event length doesn't matter either, as long as the number of sentences per person * their average length is generally enough to train a model.
It may actually be better to use a 20% train set and an 80% holdout set, which would check even more strictly that the model performs well on new data.
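A minimal sketch of the event-level split described above, not the project's actual implementation. It assumes each annotated sentence is a dict with hypothetical `event_id`, `speaker`, and `duration` keys, and it splits on whole events so that every holdout event is unseen during training. The second helper sums duration per speaker, which equals the number of sentences times their average length, as a rough check that the small train set still has enough data per person.

```python
import random
from collections import defaultdict


def split_by_event(rows, train_fraction=0.2, seed=0):
    """Split rows so that no event appears in both train and holdout."""
    event_ids = sorted({row["event_id"] for row in rows})
    if len(event_ids) < 2:
        raise ValueError("need at least two events to hold out unseen events")
    random.Random(seed).shuffle(event_ids)
    # Keep at least one event on each side of the split.
    n_train = min(max(1, round(len(event_ids) * train_fraction)), len(event_ids) - 1)
    train_events = set(event_ids[:n_train])
    train = [row for row in rows if row["event_id"] in train_events]
    holdout = [row for row in rows if row["event_id"] not in train_events]
    return train, holdout


def amount_per_speaker(rows):
    """Total duration per speaker (sentence count * average sentence length)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["speaker"]] += row["duration"]
    return dict(totals)
```

Calling `split_by_event(rows)` with the default `train_fraction=0.2` gives the 20% train / 80% holdout split by event count; `amount_per_speaker(train)` can then be compared against whatever minimum is considered enough to train on.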
https://github.com/CouncilDataProject/speakerbox/issues/6