Document how to ingest from IETF Mailman 2

datactive / bigbang

Scientific analysis of collaborative communities

http://datactive.github.io/bigbang/

MIT License


Document how to ingest from IETF Mailman 2 #599

Closed sbenthall closed 12 months ago

sbenthall commented 1 year ago

IETF has switched to Mailman 2 for its mailing list archives.

This makes the URL-based scraper ineffective for mailing lists that are active after 2022.

We need better documentation on how to use BigBang with the new mailing list archive.

The quickest route is:

1. Log in to the IETF archives and download the mboxes.
2. Ingest from the mboxes.
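
As a rough illustration of the ingest step, here is a minimal sketch that reads a downloaded mbox file with Python's standard-library `mailbox` module and pulls out a few headers per message. The function name `read_mbox` and the choice of headers are assumptions made for this example; this is not BigBang's own loader, only a way to check that a downloaded file parses before handing it to BigBang.

```python
import mailbox
from email.utils import parsedate_to_datetime


def read_mbox(path):
    """Return one dict of basic headers per message in the mbox at `path`.

    `read_mbox` is a hypothetical helper for illustration only.
    """
    # create=False raises NoSuchMailboxError instead of silently
    # creating an empty file when the path is wrong.
    box = mailbox.mbox(path, create=False)
    rows = []
    try:
        for msg in box:
            raw_date = msg.get("Date")
            try:
                date = parsedate_to_datetime(raw_date) if raw_date else None
            except (TypeError, ValueError):
                # Malformed Date headers are common in old archives.
                date = None
            rows.append(
                {
                    "message_id": msg.get("Message-ID"),
                    "from": msg.get("From"),
                    "subject": msg.get("Subject"),
                    "date": date,
                }
            )
    finally:
        box.close()
    return rows
```

The returned list of dicts can be passed to `pandas.DataFrame(rows)` for a quick look at what the archive contains.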

sbenthall commented 1 year ago

There may be a way to get the mbox directly over HTTP with a valid username/password, which would certainly make it easier to get a collect-mail script working.

https://mail.python.org/archives/list/mailman-users@python.org/thread/QYWDGQ4TNO2SJAH4MDDYIOPTKWR74E7B/
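
If direct HTTP access turns out to work, the download could look roughly like the sketch below, which uses only the standard library. Everything specific here is an assumption: the function names are hypothetical, the export URL has to be supplied by the caller, and it is not confirmed that the IETF archive accepts HTTP Basic credentials (it may require a session login instead).

```python
import base64
import shutil
import urllib.request


def build_mbox_request(url, username, password):
    """Build a GET request carrying HTTP Basic credentials.

    Assumption: the archive accepts Basic auth. This is unverified.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def download_mbox(url, username, password, dest_path):
    """Stream the response body to `dest_path` and return that path."""
    request = build_mbox_request(url, username, password)
    with urllib.request.urlopen(request, timeout=60) as response:
        with open(dest_path, "wb") as out:
            # Copy in chunks so large archives are not held in memory.
            shutil.copyfileobj(response, out)
    return dest_path
```

Credentials are better read from environment variables or a prompt than hard-coded in a collect-mail script.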

sbenthall commented 1 year ago

The glasgow-ipl ietfdata package already has a Mailman 2 ingestor that pulls data into a MongoDB database.

We might wrap/use this: https://github.com/glasgow-ipl/ietfdata/blob/cc78853f3dc8435c1cbe2711e38ead32354eec17/ietfdata/mailarchive2.py