Bookworm-project / BookwormDB

Tools for text tokenization and encoding
MIT License

Master wordlist produces duplicates. #15

Closed: bmschmidt closed this issue 12 years ago

bmschmidt commented 12 years ago

In the LOC build, I'm getting duplicates of certain words in the master wordlist. I suspect something is going wrong with the merge code in CreateWordlist.py. That code is highly unoptimized, so it may be due for a rewrite in any case, but there should be a simple fix as well.
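For illustration only, here is a minimal sketch of a merge step that cannot emit the same word twice, plus a quick check for duplicates in an existing list. It is not the code in CreateWordlist.py; the function names `merge_wordcounts` and `find_duplicates`, and the assumption that each partial wordlist is a sequence of `(word, count)` pairs, are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

WordCount = Tuple[str, int]


def merge_wordcounts(partials: Iterable[Iterable[WordCount]]) -> List[WordCount]:
    """Merge several partial wordlists into one list with one entry per word.

    Hypothetical helper: assumes each partial list yields (word, count) pairs.
    """
    totals: Counter = Counter()
    for partial in partials:
        for word, count in partial:
            # Keying on the word guarantees a single entry per word,
            # no matter how many partial lists contain it.
            totals[word] += count
    # Most frequent words first; ties broken alphabetically for stable output.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


def find_duplicates(wordlist: Iterable[WordCount]) -> List[str]:
    """Return the words that appear more than once in a wordlist."""
    seen = Counter(word for word, _ in wordlist)
    return sorted(word for word, n in seen.items() if n > 1)


if __name__ == "__main__":
    merged = merge_wordcounts([
        [("the", 3), ("cat", 1)],
        [("the", 2), ("dog", 1)],
    ])
    print(merged)
    print(find_duplicates(merged))
```

Running something like `find_duplicates` on the existing master wordlist might also help narrow down which words are affected before touching the merge code.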