Crash when parsing https://www.centerforsecuritypolicy.org/1994/07/25/hardy-perennial-tim-weiner-senate-allies-conjure-up-false-charges-again-in-hopes-of-making-more-cuts-in-sdi-2/
Traceback (most recent call last):
  File "populate_article_db.py", line 71, in <module>
    main()
  File "populate_article_db.py", line 67, in main
    use_local=args.local)
  File "/Users/jrobinson/Projects/misinformation/misinformation-crawler/misinformation/warc/warc_parser.py", line 131, in process_webpages
    article = extract_article(response, config, entry, self.content_digests, self.node_indexes)
  File "/Users/jrobinson/Projects/misinformation/misinformation-crawler/misinformation/extractors/extract_article.py", line 42, in extract_article
    default_readability_article = simple_json_from_html_string(page_html, content_digests, node_indexes, use_readability=False)
  File "/Users/jrobinson/Projects/misinformation/misinformation-crawler/ReadabiliPy/readabilipy/simple_json.py", line 34, in simple_json_from_html_string
    "content": str(simple_tree_from_html_string(html))
  File "/Users/jrobinson/Projects/misinformation/misinformation-crawler/ReadabiliPy/readabilipy/simple_tree.py", line 27, in simple_tree_from_html_string
    process_special_elements(soup)
  File "/Users/jrobinson/Projects/misinformation/misinformation-crawler/ReadabiliPy/readabilipy/simplifiers/html.py", line 123, in process_special_elements
    element.unwrap()
  File "/Users/jrobinson/.pyenv/versions/misinformation/lib/python3.7/site-packages/bs4/element.py", line 307, in unwrap
    "Cannot replace an element with its contents when that"
ValueError: Cannot replace an element with its contents when thatelement is not part of a tree.
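
The ValueError above can be reproduced in isolation by calling unwrap() on a BeautifulSoup element that has already been detached from its tree. The sketch below is only an illustration of that failure mode and one possible guard. It is not the actual code in process_special_elements, the helper name safe_unwrap is hypothetical, and the HTML is made up rather than taken from the page in question. It assumes the element was detached (for example via extract()) before unwrap() was reached.

```python
from bs4 import BeautifulSoup


def safe_unwrap(element):
    """Hypothetical helper: unwrap only if the element is still attached.

    Returns True if the element was unwrapped, False if it was skipped.
    """
    if element.parent is None:
        # Element is no longer part of a tree; unwrap() would raise ValueError.
        return False
    element.unwrap()
    return True


# Made-up HTML, not the content of the crashing page.
soup = BeautifulSoup("<div><span>kept</span><em>detached</em></div>", "html.parser")
span = soup.span
em = soup.em

# Detach <em> from the tree; its parent becomes None.
em.extract()

# Calling unwrap() directly on the detached element raises the same ValueError.
try:
    em.unwrap()
    raised = False
except ValueError:
    raised = True

skipped = not safe_unwrap(em)   # detached element is skipped
unwrapped = safe_unwrap(span)   # attached element is unwrapped as usual
result = str(soup)
```

Whether skipping detached elements is the right fix depends on why the element was detached in the first place; the guard only avoids the crash.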