datactive / bigbang

Scientific analysis of collaborative communities
http://datactive.github.io/bigbang/
MIT License

Document how to ingest from IETF Mailman 2 #599

Closed: sbenthall closed this issue 12 months ago

sbenthall commented 1 year ago

IETF has switched to Mailman 2 for its mailing list archives.

This makes the URL-based scraper ineffective for mailing lists that remained active after 2022.

We need better documentation on how to use BigBang with the new mailing list archive.

The quickest route is

sbenthall commented 1 year ago

There may be a way to get the mbox directly over HTTP with a valid username and password, which would certainly make it easier to write a collect-mail script.

https://mail.python.org/archives/list/mailman-users@python.org/thread/QYWDGQ4TNO2SJAH4MDDYIOPTKWR74E7B/
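A minimal Python sketch of that approach, using only the standard library. It assumes the server exposes a Mailman 2 style private archive mbox at `<base>/mailman/private/<list>.mbox/<list>.mbox` and accepts the credentials as `username`/`password` form fields; both are assumptions, and neither the IETF endpoint nor its login flow has been confirmed here. `private_mbox_url`, `fetch_private_mbox`, and `parse_mbox_bytes` are hypothetical helper names, not part of BigBang.

```python
import mailbox
import os
import tempfile
import urllib.parse
import urllib.request
from email.message import Message


def private_mbox_url(base_url: str, list_name: str) -> str:
    # ASSUMPTION: Mailman 2 style private archive layout; not confirmed for IETF.
    base = base_url.rstrip("/")
    return f"{base}/mailman/private/{list_name}.mbox/{list_name}.mbox"


def fetch_private_mbox(base_url: str, list_name: str,
                       username: str, password: str, timeout: float = 60.0) -> bytes:
    # ASSUMPTION: the archive accepts credentials as 'username'/'password' form fields.
    form = urllib.parse.urlencode({"username": username, "password": password}).encode()
    # Passing `data` makes urllib send a POST request instead of a GET.
    with urllib.request.urlopen(private_mbox_url(base_url, list_name),
                                data=form, timeout=timeout) as response:
        return response.read()


def parse_mbox_bytes(raw: bytes) -> list[Message]:
    # mailbox.mbox reads from a path, so spool the bytes to a temporary file first.
    fd, path = tempfile.mkstemp(suffix=".mbox")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(raw)
        box = mailbox.mbox(path, create=False)
        try:
            # Messages are fully loaded, so they stay usable after close().
            return list(box)
        finally:
            box.close()
    finally:
        os.remove(path)
```

Parsing is kept separate from downloading so the mbox handling can be checked offline; a collect-mail style script would call `fetch_private_mbox` and then pass the result to `parse_mbox_bytes`.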

sbenthall commented 1 year ago

The glasgow-ipl ietfdata package already includes a Mailman 2 ingestor that pulls data into a MongoDB database.

We might wrap or reuse it: https://github.com/glasgow-ipl/ietfdata/blob/cc78853f3dc8435c1cbe2711e38ead32354eec17/ietfdata/mailarchive2.py
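One possible shape for such a wrapper is a thin export step that writes whatever messages the ingestor yields into a local mbox file for BigBang to pick up. This sketch assumes a local mbox file is an acceptable input for the rest of the BigBang pipeline, and it deliberately does not show how to read messages out of ietfdata or MongoDB, since that API is not described in this thread. `export_to_mbox` is a hypothetical name. Messages already in the file (matched by `Message-ID`) are skipped, so repeated pulls from the ingestor do not create duplicates.

```python
import mailbox
from email.message import Message
from typing import Iterable, Optional


def _message_id(msg: Message) -> Optional[str]:
    # Normalize the Message-ID header to a plain, stripped string (or None).
    value = msg.get("Message-ID")
    return str(value).strip() if value else None


def export_to_mbox(messages: Iterable[Message], path: str) -> int:
    """Append new messages to the mbox at `path` and return how many were written.

    ASSUMPTION: `messages` comes from some external ingestor (e.g. ietfdata);
    how it is produced is out of scope here. Messages without a Message-ID are
    always written. No file locking is done, so a single writer is assumed.
    """
    box = mailbox.mbox(path, create=True)
    try:
        # Collect IDs already exported on earlier runs.
        seen = {mid for mid in (_message_id(m) for m in box) if mid}
        written = 0
        for msg in messages:
            mid = _message_id(msg)
            if mid and mid in seen:
                continue  # duplicate of a message already in the file
            box.add(msg)
            if mid:
                seen.add(mid)
            written += 1
        box.flush()
    finally:
        box.close()
    return written
```

Deduplicating on `Message-ID` keeps the export idempotent, which matters if the ingestor is re-run to pick up new messages.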