UTHealth-CCB / clamp-support

Clinical Language Annotation Modeling and Processing toolkit
http://clamp.uth.edu/

Is there a document with the recommended number of training files needed for good corpus training? #56

Open abrac692 opened 4 years ago

abrac692 commented 4 years ago

Is there a document with the recommended number of training files needed for good corpus training?

clampnlp commented 4 years ago

The number of training files depends on how complex the pattern you want to recognize is. For a simple pattern, a few dozen files may be enough. For a complicated one, the training corpus usually needs several hundred files.
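
Since the right number depends on the task, one way to check it empirically is to train on growing subsets of the annotated files and see where the score stops improving. The sketch below is not part of CLAMP: `first_plateau_size` is a hypothetical helper, the `min_gain` threshold is an arbitrary assumption, and the F1 values in the usage lines are made-up placeholders rather than measured results.

```python
from typing import Optional, Sequence


def first_plateau_size(
    sizes: Sequence[int],
    f1_scores: Sequence[float],
    min_gain: float = 0.01,
) -> Optional[int]:
    """Return the smallest training-set size after which adding more
    files improved F1 by less than `min_gain`, or None if the score
    was still improving at every step.

    `sizes` must be in increasing order and aligned with `f1_scores`,
    which are the held-out scores obtained when training on that many
    files (hypothetical helper, not a CLAMP function).
    """
    if len(sizes) != len(f1_scores):
        raise ValueError("sizes and f1_scores must have the same length")
    for i in range(1, len(sizes)):
        gain = f1_scores[i] - f1_scores[i - 1]
        if gain < min_gain:
            # Going from sizes[i-1] to sizes[i] files barely helped.
            return sizes[i - 1]
    return None


# Placeholder numbers for illustration only, not real results.
sizes = [25, 50, 100, 200, 400]
f1_scores = [0.60, 0.72, 0.80, 0.805, 0.806]
print(first_plateau_size(sizes, f1_scores))
```

If the function returns `None`, the score was still rising at the largest size tried, which suggests annotating more files is likely to help.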