dhmit / gender_analysis

A toolkit for analyzing gendered language across sets of documents
BSD 3-Clause "New" or "Revised" License

Corpus init -- ignore CSV files in the ignored files warning #78

Closed (ryaanahmed closed 5 years ago)

ryaanahmed commented 5 years ago

For example, some of our tests produce output like this:

>>> path = TEST_DATA_PATH / 'test_corpus'
>>> c = Corpus(path)
Warning: Some files were not loaded because they are not .txt files. If you would like to analyze the text in these files, convert these files to .txt and re-initiate the corpus.

This warning is confusing in this case: the only file that was skipped is the metadata CSV file, which sits in the corpus directory. I think we should just ignore CSV files for the purposes of this warning.
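
A minimal sketch of how the check might look, assuming the warning is triggered by any non-.txt file found in the corpus directory. The names `IGNORED_SUFFIXES`, `split_corpus_files`, and `should_warn` are hypothetical and are not taken from the actual `Corpus` implementation:

```python
from pathlib import Path

# Assumption: metadata is always stored as .csv, so these files
# should never count toward the "not loaded" warning.
IGNORED_SUFFIXES = {'.csv'}


def split_corpus_files(filenames):
    """Hypothetical helper: return (text_files, skipped_files).

    Files with an ignored suffix (e.g. the metadata CSV) appear in
    neither list, so they cannot trigger the warning.
    """
    text_files, skipped_files = [], []
    for name in filenames:
        suffix = Path(name).suffix.lower()
        if suffix == '.txt':
            text_files.append(name)
        elif suffix not in IGNORED_SUFFIXES:
            skipped_files.append(name)
    return text_files, skipped_files


def should_warn(filenames):
    """Warn only if something other than .txt or .csv was skipped."""
    _, skipped_files = split_corpus_files(filenames)
    return bool(skipped_files)
```

With this approach, a directory containing only .txt files plus the metadata CSV would load silently, while a stray file of another type would still produce the warning.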