datactive / bigbang

Scientific analysis of collaborative communities
http://datactive.github.io/bigbang/
MIT License

collect-mail doesn't display output #588

Open micahflee opened 1 year ago

micahflee commented 1 year ago

When you run bigbang collect-mail and pass in a URL, it prints a list of archive files in the terminal and nothing else. Nothing indicates that it is downloading the messages into the archives folder, even though it is. The code appears to contain various logging.info() calls that would report this, but their output never reaches the terminal.
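One likely explanation, assuming bigbang never configures logging for its CLI (not verified against the source here): Python's root logger defaults to the WARNING level, so INFO records are dropped before any handler sees them. The sketch below reproduces that behavior with a throwaway logger; the logger names and the message text are made up for illustration and are not taken from bigbang.

```python
import io
import logging


def captured_output(info_enabled: bool) -> str:
    """Return what a handler receives for a single info-level message."""
    stream = io.StringIO()
    # Hypothetical logger name, unrelated to bigbang's own loggers.
    logger = logging.getLogger(f"collect_mail_demo.{info_enabled}")
    logger.propagate = False  # keep the demo isolated from the root logger
    logger.addHandler(logging.StreamHandler(stream))
    # WARNING mirrors the root logger's default level; INFO is the fix.
    logger.setLevel(logging.INFO if info_enabled else logging.WARNING)
    logger.info("Downloading 2018-06.mail")
    return stream.getvalue()


if __name__ == "__main__":
    print(repr(captured_output(False)))  # message is filtered out
    print(repr(captured_output(True)))   # message reaches the handler
```

In a real command-line entry point, a single call to logging.basicConfig(level=logging.INFO) before any work starts would probably be enough to make the existing logging.info() calls visible.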

For example, here is a run against an IETF list archive:

$ bigbang collect-mail --url https://www.ietf.org/mail-archive/text/tls-reg-review/
['2018-06.mail',
 '2018-07.mail',
 '2018-08.mail',
 '2018-11.mail',
 '2018-12.mail',
 '2019-01.mail',
 '2019-02.mail',
 '2019-03.mail',
 '2019-04.mail',
 '2019-05.mail',
 '2019-06.mail',
 '2019-07.mail',
 '2019-08.mail',
 '2019-09.mail',
 '2019-10.mail',
 '2019-12.mail',
 '2020-01.mail',
 '2020-02.mail',
 '2020-03.mail',
 '2020-04.mail',
 '2020-05.mail',
 '2020-06.mail',
 '2020-07.mail',
 '2020-08.mail',
 '2020-09.mail',
 '2020-10.mail',
 '2020-11.mail',
 '2020-12.mail',
 '2021-01.mail',
 '2021-02.mail',
 '2021-03.mail',
 '2021-04.mail',
 '2021-05.mail',
 '2021-06.mail',
 '2021-07.mail',
 '2021-08.mail',
 '2021-09.mail',
 '2021-10.mail',
 '2021-12.mail',
 '2022-01.mail',
 '2022-02.mail',
 '2022-03.mail']

The script immediately prints all of these .mail filenames and then sits silently for over a minute. During that time it is actually scraping the mailing list. It would be helpful if the output were cleaner and if it showed progress while the script is running.
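As a rough sketch of the kind of progress output I have in mind (this is not bigbang's code; progress_line, collect_with_progress, and the fetch callback are hypothetical names used only for illustration), each archive could be reported as it is downloaded instead of dumping the whole list up front:

```python
import logging
from typing import Callable, List, Sequence

log = logging.getLogger("collect_mail_demo")  # hypothetical logger name


def progress_line(index: int, total: int, name: str) -> str:
    """Format one progress entry, right-aligning the counter."""
    width = len(str(total))
    return f"[{index:>{width}}/{total}] {name}"


def collect_with_progress(
    archive_names: Sequence[str], fetch: Callable[[str], None]
) -> List[str]:
    """Call fetch() for each archive and log one line per download."""
    total = len(archive_names)
    fetched = []
    for index, name in enumerate(archive_names, start=1):
        log.info(progress_line(index, total, name))
        fetch(name)  # stand-in for the real download step
        fetched.append(name)
    log.info("Finished: %d archives downloaded", total)
    return fetched


if __name__ == "__main__":
    # Without this call, the info-level lines above would stay hidden.
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    collect_with_progress(["2018-06.mail", "2018-07.mail"], fetch=lambda n: None)
```

Logging one line per archive keeps the output readable in a plain terminal and in redirected logs; a progress bar would be another option.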