Some articles have two or more authors from the same faculty or department.
This causes columns representing the same document to appear more than once in the term-document matrix.
We should screen for this by removing duplicate columns. It is exceedingly unlikely that two distinct documents would have exactly the same frequency distribution of words out of 10,000 words by chance, so identical columns can safely be treated as the same document.
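
A minimal sketch of this screening step, assuming the term-document matrix is held as a dense NumPy array with terms as rows and documents as columns. The function name drop_duplicate_columns and the toy matrix are hypothetical and only illustrate the idea; a sparse matrix or a different layout would need a different approach.

```python
import numpy as np

def drop_duplicate_columns(tdm):
    """Return (deduplicated matrix, kept column indices), keeping first occurrences.

    Hypothetical helper: assumes a dense array with terms as rows
    and documents as columns.
    """
    tdm = np.asarray(tdm)
    # With axis=1, np.unique treats each column as one item; return_index
    # gives the position of the first occurrence of each distinct column.
    _, first_idx = np.unique(tdm, axis=1, return_index=True)
    # np.unique sorts its output, so sort the indices to restore document order.
    keep = np.sort(first_idx)
    return tdm[:, keep], keep

# Toy matrix (made up for illustration): column 2 repeats column 0,
# as would happen when one article is listed under two authors.
tdm = np.array([
    [2, 0, 2, 1],
    [0, 3, 0, 1],
    [1, 1, 1, 0],
])
deduped, kept = drop_duplicate_columns(tdm)
print(kept)
print(deduped.shape)
```

Returning the kept indices alongside the matrix makes it possible to filter any parallel document metadata (titles, author lists) in the same way.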