Bookworm-project / BookwormDB

Tools for text tokenization and encoding
MIT License

Timing notes #21

Closed bmschmidt closed 10 years ago

bmschmidt commented 12 years ago

Just to get some numbers down: with the LOC corpus (1.7m documents, which is much shorter than it should be), the initial clean scan took about 11 hours.
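For a rough sense of scale, those two figures can be turned into a throughput estimate. This is only back-of-the-envelope arithmetic on the approximate numbers above, assuming the scan ran at a steady rate; the function and variable names are made up for illustration and are not part of BookwormDB.

```python
# Approximate figures from the comment above; names here are illustrative only.
DOCUMENTS = 1_700_000   # "1.7m documents"
HOURS = 11              # "about 11 hours"


def throughput(documents, hours):
    """Return average documents processed per second, assuming a steady rate."""
    seconds = hours * 3600
    return documents / seconds


docs_per_second = throughput(DOCUMENTS, HOURS)
ms_per_document = 1000 / docs_per_second

print(f"{docs_per_second:.1f} documents/second")
print(f"{ms_per_document:.1f} ms per document")
```

That works out to roughly 43 documents per second, or about 23 ms per document, for the initial clean scan.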

bmschmidt commented 10 years ago

This is not an issue.