TeangaNLP / teanga2

Teanga a dó
Apache License 2.0

Performance issue with large number of docs #31

Open jmccrae opened 2 months ago

jmccrae commented 2 months ago

Repeatedly calling add_doc leads to very poor performance because each call checks that the new document ID is not a duplicate.

This seems to be because Corpus.doc_ids is regenerated every time it is called, so the cost of each duplicate check grows with the number of documents already in the corpus.
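A minimal sketch of the suspected pattern and one possible fix. The classes `SlowCorpus` and `FastCorpus` are hypothetical stand-ins written for illustration; they are not the actual teanga `Corpus` implementation, and the internal storage (`_docs`) is an assumption. The idea is that rebuilding the ID list on every `add_doc` makes each call linear in corpus size, while checking membership against a dict keyed by document ID keeps it constant time on average.

```python
class SlowCorpus:
    """Hypothetical stand-in showing the suspected slow pattern."""

    def __init__(self):
        self._docs = []  # assumed storage: list of (doc_id, text) pairs

    @property
    def doc_ids(self):
        # Rebuilt from scratch on every access: O(n) each time.
        return [doc_id for doc_id, _ in self._docs]

    def add_doc(self, doc_id, text):
        # O(n) rebuild plus O(n) list scan on every insertion.
        if doc_id in self.doc_ids:
            raise ValueError(f"duplicate document ID: {doc_id}")
        self._docs.append((doc_id, text))


class FastCorpus:
    """Hypothetical alternative: IDs are kept in a dict as documents are added."""

    def __init__(self):
        self._docs = {}  # assumed storage: doc_id -> text, insertion-ordered

    @property
    def doc_ids(self):
        # Only built when a caller actually asks for the list.
        return list(self._docs)

    def add_doc(self, doc_id, text):
        # Dict membership test is O(1) on average.
        if doc_id in self._docs:
            raise ValueError(f"duplicate document ID: {doc_id}")
        self._docs[doc_id] = text
```

With this layout, adding n documents costs roughly O(n) in total instead of O(n^2), and `doc_ids` still returns the IDs in insertion order. Whether the same change fits the real `Corpus` depends on how it stores documents, which this sketch does not assume to know.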