datactive / bigbang

Scientific analysis of collaborative communities
http://datactive.github.io/bigbang/
MIT License

collect_mail.py from IETF collects empty .mail files #558

Open sbenthall opened 2 years ago

sbenthall commented 2 years ago

Something is quite wrong with the IETF data collection process.

$ python bin/collect_mail.py -u https://www.ietf.org/mail-archive/text/dns-security/
['2008-05.mail',
 '2008-06.mail',
 '2008-07.mail',
 '2008-08.mail',
 '2008-09.mail',
 '2008-10.mail',
 '2008-11.mail',
 '2008-12.mail']

So far so good, but the downloaded files are empty. Viewing one in a pager shows only the end-of-file marker:

archives/dns-security/2008-09.mail (END)

So no data is getting collected.
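
A quick way to confirm the symptom is to scan the archive directory for zero-byte .mail files. The sketch below is only an illustration: `find_empty_mail_files` is a hypothetical helper, not part of BigBang, and it assumes the files land in a flat directory such as archives/dns-security/.

```python
from pathlib import Path


def find_empty_mail_files(archive_dir):
    """Return the sorted names of zero-byte .mail files in archive_dir.

    Hypothetical helper for diagnosing this issue; not a BigBang function.
    """
    root = Path(archive_dir)
    return sorted(
        path.name
        for path in root.glob("*.mail")
        if path.stat().st_size == 0
    )


if __name__ == "__main__":
    # Assumed location, taken from the pager output in this report.
    for name in find_empty_mail_files("archives/dns-security"):
        print("empty:", name)
```

If every collected file shows up in this list, the download step ran but wrote no content.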

sbenthall commented 2 years ago

The mail collection script is downloading all the .mail files from this page:

https://www.ietf.org/mail-archive/text/dns-security/

But these .mail files are empty; the data is actually in the .txt files.
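
To illustrate the difference, here is a minimal sketch of picking links by extension from an archive index page. It uses only the Python standard library; `LinkCollector` and `archive_links` are hypothetical names, and this is not how collect_mail.py is actually implemented. It assumes the index page is plain HTML with one anchor tag per file.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def archive_links(index_html, base_url, suffixes=(".txt",)):
    """Return absolute URLs for links whose names end in one of suffixes.

    Hypothetical helper: pass suffixes=(".mail",) to reproduce the
    selection that yields empty files in this report.
    """
    parser = LinkCollector()
    parser.feed(index_html)
    return [
        urljoin(base_url, href)
        for href in parser.hrefs
        if href.endswith(tuple(suffixes))
    ]
```

Called with the HTML of the index page and the list URL as `base_url`, this returns only the .txt links, which is where the message data appears to live for this list.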

sbenthall commented 2 years ago

This is likely because dns-security is a deprecated working group. An example of one of its .txt archive files:

https://www.ietf.org/mail-archive/text/dns-security/dns-security.200003.txt