datactive / bigbang

Scientific analysis of collaborative communities
http://datactive.github.io/bigbang/
MIT License

Collectmail for listserv #456

Closed Christovis closed 3 years ago

Christovis commented 3 years ago

This PR addresses #419 and adds the option to scrape Listserv 16.5 archives with the command python bin/collect_mail.py -f your_urls.txt. The scraped lists are saved as .mbox files in CONFIG.mail_path + name, where name identifies the archive (e.g. 3GPP, IEEE, ...). I also changed yaml.load() to yaml.safe_load(), because calling the former without an explicit Loader is deprecated.
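To illustrate the storage step, here is a minimal sketch of writing messages to an .mbox file with Python's standard mailbox module. It is not the code from this PR: the helper name save_archive, its parameters, and the layout of one <name>.mbox file directly under the mail path are assumptions made for the example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def save_archive(mail_path, name, messages):
    """Hypothetical helper: write messages to <mail_path>/<name>.mbox.

    The file layout is an assumption for illustration, not necessarily
    the one used by bigbang.
    """
    os.makedirs(mail_path, exist_ok=True)
    file_path = os.path.join(mail_path, name + ".mbox")
    box = mailbox.mbox(file_path)  # creates the file if it does not exist
    try:
        for msg in messages:
            box.add(msg)
    finally:
        box.close()  # flushes pending writes to disk
    return file_path


if __name__ == "__main__":
    # Build one example message and store it in a temporary directory.
    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["Subject"] = "Meeting notes"
    msg.set_content("See attached minutes.")
    with tempfile.TemporaryDirectory() as tmp:
        print(save_archive(tmp, "3GPP", [msg]))
```

Storing each archive as a single mbox file keeps the scraped data readable by any standard mail tool, including mailbox.mbox itself.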

I have not included new tests in this PR, as both bigbang/mailman.py and bigbang/listserv.py already have test coverage, and testing e.g. mailman.collect_from_file() would require a server connection and slow the test suite down.