WeberLab-UW / project-tracking

Dummy repo for general project tracking

Write function to check the basics of the data (splits and amount) #38

Closed: evamaxfield closed this issue 2 years ago

evamaxfield commented 2 years ago

We care less about how many events are used than about having enough events that the holdout set can consist entirely of events not seen during training. Similarly, how long events are doesn't matter, as long as each person's number of sentences multiplied by their average sentence length is generally enough to train a model.
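A minimal sketch of what such a check might look like, assuming each sentence is a record with an event id, a speaker label, and a length. The `Sentence` and `SpeakerReport` types, the `check_data_basics` function, and the `min_total_length` / `min_events` thresholds are all hypothetical, and the length unit (seconds of audio, word count, etc.) is an assumption. It flags speakers whose sentence count times average length is too small, and, under the assumption that a speaker must appear in at least two events to show up in both a training event and a new holdout event, speakers seen in too few events.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sentence:
    # Hypothetical record layout; the real dataset's fields may differ.
    event_id: str
    speaker: str
    length: float  # e.g. seconds of audio or number of words (unit is an assumption)


@dataclass
class SpeakerReport:
    n_sentences: int
    avg_length: float
    total_length: float  # n_sentences * avg_length
    n_events: int


def check_data_basics(sentences, min_total_length, min_events=2):
    """Return reports for speakers that look too thin to train and evaluate on.

    A speaker is flagged if n_sentences * avg_length falls below
    min_total_length, or if they appear in fewer than min_events events
    (assumed minimum for appearing in both a train event and a new holdout event).
    """
    counts = defaultdict(int)
    length_sums = defaultdict(float)
    events = defaultdict(set)
    for s in sentences:
        counts[s.speaker] += 1
        length_sums[s.speaker] += s.length
        events[s.speaker].add(s.event_id)

    flagged = {}
    for speaker, n in counts.items():
        avg = length_sums[speaker] / n
        report = SpeakerReport(n, avg, n * avg, len(events[speaker]))
        if report.total_length < min_total_length or report.n_events < min_events:
            flagged[speaker] = report
    return flagged


# Usage with made-up data: "bob" has little data and appears in only one event.
data = [
    Sentence("e1", "alice", 2.0),
    Sentence("e1", "alice", 3.0),
    Sentence("e2", "alice", 5.0),
    Sentence("e1", "bob", 1.0),
]
flagged = check_data_basics(data, min_total_length=5.0)
print(sorted(flagged))  # ['bob']
```

Reporting the per-speaker numbers rather than just a pass/fail makes it easier to decide whether to drop a speaker or collect more data for them.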

It may actually be better to use a 20% train set and an 80% holdout set, so that we can go even further toward ensuring the model performs well on new data.
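One way to sketch this is to split by whole events rather than by individual sentences, so every holdout event is entirely new to the model. The `split_by_event` function and its `event_of`, `train_frac`, and `seed` parameters are hypothetical; `train_frac=0.2` mirrors the 20% train / 80% holdout idea. Because whole events are assigned, the sentence-level proportions are only approximately 20/80.

```python
import random


def split_by_event(items, event_of, train_frac=0.2, seed=0):
    """Split items so that no event appears in both the train and holdout sets.

    event_of: function returning the event id for an item (hypothetical; adapt
    to however events are stored). Proportions are over events, not sentences.
    """
    events = sorted({event_of(item) for item in items})
    if len(events) < 2:
        raise ValueError("need at least 2 events to hold out entirely new events")
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(events)
    # Round to a whole number of events, keeping at least one event on each side.
    n_train = min(max(1, round(len(events) * train_frac)), len(events) - 1)
    train_events = set(events[:n_train])
    train = [item for item in items if event_of(item) in train_events]
    holdout = [item for item in items if event_of(item) not in train_events]
    return train, holdout


# Usage with made-up data: 10 events with 3 sentences each.
items = [(f"e{i}", f"sentence {j}") for i in range(10) for j in range(3)]
train, holdout = split_by_event(items, event_of=lambda item: item[0])
print(len(train), len(holdout))  # 6 24
```

Splitting at the event level is what guarantees the holdout set contains only unseen events; a plain random split over sentences would leak sentences from the same event into both sides.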

https://github.com/CouncilDataProject/speakerbox/issues/6