datactive / bigbang

Scientific analysis of collaborative communities
http://datactive.github.io/bigbang/
MIT License

Fix Listserv 16.5 scraping error #476

Closed Christovis closed 3 years ago

Christovis commented 3 years ago

This PR implements a simple solution for the bug mentioned in #472. It is simple because the root problem has not been identified yet. Merging it nonetheless is motivated by the fact that it fixes the scraping error, so that bigbang/listserv.py can now scrape the archives without interruption (for 3 days at the time of writing).

The issue is in the function ListservMessageParser._get_header_from_html(): soup.find("b", text=re.compile(r"^\bSubject\b")) sometimes returns None, though more often it does not, and there is no clear explanation for why this happens... yet.
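As a rough illustration of the kind of guard this implies, here is a minimal sketch that checks the result of soup.find() for None before using it. It is not the code from this PR: the helper name find_subject_label and the sample HTML are hypothetical, and string= is used as the newer Beautiful Soup name for the text= argument.

```python
import re
from typing import Optional

from bs4 import BeautifulSoup

# Same pattern as in the PR description.
SUBJECT_LABEL = re.compile(r"^\bSubject\b")


def find_subject_label(html: str) -> Optional[str]:
    """Hypothetical helper: return the text of the <b>Subject...</b> label.

    Returns None when the lookup fails, instead of letting a later
    attribute access on None raise AttributeError.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("b", string=SUBJECT_LABEL)
    if tag is None:
        # The case described above: the label was not found in this page.
        return None
    return tag.get_text(strip=True)


if __name__ == "__main__":
    print(find_subject_label("<p><b>Subject:</b> Hello</p>"))
    print(find_subject_label("<p><b>From:</b> someone</p>"))
```

A caller can then decide what to do with a None result, for example skipping the message or retrying, rather than aborting the whole scrape.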

codecov-commenter commented 3 years ago

Codecov Report

Merging #476 (ffea832) into main (62367fe) will decrease coverage by 0.04%. The diff coverage is 83.33%.

@@            Coverage Diff             @@
##             main     #476      +/-   ##
==========================================
- Coverage   75.13%   75.08%   -0.05%     
==========================================
  Files          16       16              
  Lines        2284     2288       +4     
==========================================
+ Hits         1716     1718       +2     
- Misses        568      570       +2     
Flag Coverage Δ
unittests 75.08% <83.33%> (-0.05%) :arrow_down:

Impacted Files Coverage Δ
bigbang/listserv.py 81.95% <83.33%> (-0.25%) :arrow_down:

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 62367fe...ffea832.