DistrictDataLabs / baleen

An automated ingestion service for blogs to construct a corpus for NLP research.
MIT License

Issue #87 Make sanitization happen in Post.htmlize() #88

Closed janetriley closed 7 years ago

janetriley commented 7 years ago

coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.2%) to 71.165% when pulling 1507ffcc0a9ce9c6582c79714bb665e3f260b267 on janetriley:feature-87_extract_html_sanitizer into b89caa254643e54881ab0478a9bf6265473e938c on DistrictDataLabs:develop.

coveralls commented 7 years ago

Coverage increased (+0.09%) to 71.027% when pulling bfafd43eb3ad6a6d79a094ca9f6c4c55555b57f2 on janetriley:feature-87_extract_html_sanitizer into b89caa254643e54881ab0478a9bf6265473e938c on DistrictDataLabs:develop.

coveralls commented 7 years ago

Coverage increased (+0.09%) to 71.027% when pulling 00415a39794aa3090317c7772d73b03bb614aa7f on janetriley:feature-87_extract_html_sanitizer into b89caa254643e54881ab0478a9bf6265473e938c on DistrictDataLabs:develop.

janetriley commented 7 years ago

@bbengfort, a couple of questions. First, text_utils.sanitize_html() raises a ValueError rather than an ExportError. Should the export function catch that ValueError and re-raise it as an ExportError?
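To make the first question concrete, here is a minimal sketch of the catch-and-re-raise option. Everything in it is a stand-in: `ExportError` and `sanitize_html()` are redefined locally so the snippet runs on its own, the condition that triggers the `ValueError` is assumed, and `export_post()` is a hypothetical name rather than baleen's actual export function.

```python
class ExportError(Exception):
    """Local stand-in for baleen's ExportError (assumed, not the real class)."""


def sanitize_html(html):
    """Hypothetical stand-in for text_utils.sanitize_html().

    The real trigger for ValueError is not shown in this thread; here we
    simply assume it is raised when there is no HTML to work with.
    """
    if html is None:
        raise ValueError("no HTML content to sanitize")
    return html.strip()


def export_post(html):
    """Hypothetical export step that wraps sanitizer failures.

    Callers only need to handle ExportError; the original ValueError is
    kept as the chained cause for debugging.
    """
    try:
        return sanitize_html(html)
    except ValueError as err:
        raise ExportError("could not sanitize post: {}".format(err)) from err
```

With this shape, code that calls the exporter only has to catch one exception type, and the underlying ValueError remains available through `__cause__`. The `raise ... from` form requires Python 3.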

Second, how do you generate the ID numbers in the opening comments, like the one on line 10 of utils/timez.py?

coveralls commented 7 years ago

Coverage increased (+0.09%) to 71.027% when pulling 6eb8e584524c9407a373a7806f565693d6f3f45d on janetriley:feature-87_extract_html_sanitizer into b89caa254643e54881ab0478a9bf6265473e938c on DistrictDataLabs:develop.