Handle unicode str while writing corpus metadata

diging / tethne

Python module for bibliographic network analysis.

http://diging.github.io/tethne/

GNU General Public License v3.0


Handle unicode str while writing corpus metadata #172

Closed: hashknot closed this 7 years ago

hashknot commented 7 years ago

Correctly export corpus metadata containing non-ASCII values to the *_meta.csv file.

Comment out failing LDA model test (TETHNE-147).
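The thread does not show the fix itself. As a rough illustration of the problem it addresses, here is a minimal Python 3 sketch that writes corpus metadata containing non-ASCII values to a `*_meta.csv` file and reads it back. `write_meta_csv`, `read_meta_csv`, and the sample records are hypothetical and are not Tethne's API. If the code runs under Python 2, where the `csv` module does not accept unicode input, each value would first need to be encoded to UTF-8 bytes. In Python 3 it is enough to open the file with `encoding="utf-8"` and `newline=""`.

```python
import csv
import os
import tempfile


def write_meta_csv(path, records, fields):
    """Write one metadata row per record to a UTF-8 encoded CSV file.

    Hypothetical helper for illustration; not part of Tethne.
    """
    # newline="" lets the csv module control line endings itself;
    # encoding="utf-8" lets non-ASCII characters be written without errors.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(fields)
        for record in records:
            row = []
            for field in fields:
                # Missing or None fields become empty cells.
                value = record.get(field)
                row.append("" if value is None else str(value))
            writer.writerow(row)


def read_meta_csv(path):
    """Read the CSV back as a list of rows (each a list of str)."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))


# Usage: a tiny, made-up corpus whose metadata contains non-ASCII values.
records = [
    {"doi": "10.1000/xyz", "title": "Études sur la Théorie", "authors": "Müller, J."},
    {"doi": "10.1000/abc", "title": "Plain ASCII title"},
]
fields = ["doi", "title", "authors"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus_meta.csv")
    write_meta_csv(path, records, fields)
    rows = read_meta_csv(path)

print(rows[1])  # ['10.1000/xyz', 'Études sur la Théorie', 'Müller, J.']
```

Reading the file back gives a small round-trip check. It confirms that accented characters survive, that the comma inside an author name is quoted correctly, and that a missing field becomes an empty cell.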

erickpeirson commented 7 years ago

@hashknot Any reason why you added another dataset to the test suite? We have a few sample datasets, but I worry that the repo is already getting pretty large. Can you build this test without the huge DfR dump?

erickpeirson commented 7 years ago

@hashknot Other than that one formatting issue, and the size of the test dataset, looks great! For the test dataset, I'd suggest just paring down what you have -- reduce to a single document, maybe reduce to only a few dozen words within that document, etc.

hashknot commented 7 years ago

78952ba uses a minimal test WoS dataset instead of the DfR dataset, and fixes the formatting issue in the test file.

erickpeirson commented 7 years ago

@hashknot Awesome! If you're happy with this (looks great to me), go ahead and create a PR from bug/TETHNE-145 into develop.