datactive / bigbang

Scientific analysis of collaborative communities
http://datactive.github.io/bigbang/
MIT License

w3c scraping hangs #538

Closed: sbenthall closed this issue 2 years ago

sbenthall commented 2 years ago

Collecting W3C data like this:

$ python bin/collect_mail.py -u http://lists.w3.org/Archives/Public/public-privacy/

hangs with no feedback to the user.

This is problematic because some notebooks in the examples/ directory depend on this data, like:

https://github.com/datactive/bigbang/blob/main/examples/experimental_notebooks/Analyze%20Senders.ipynb
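For illustration, here is a minimal sketch of how a fetch step could report progress and give up after a timeout instead of hanging silently. It is not bigbang's code: `fetch_with_timeout`, the `collect_sketch` logger name, and the 10-second default are all hypothetical, and it only uses the standard library's `urllib.request.urlopen` with its `timeout` argument.

```python
import logging
import socket
import urllib.error
import urllib.request

# Hypothetical logger name; not part of bigbang.
logger = logging.getLogger("collect_sketch")


def fetch_with_timeout(url, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch `url`, logging progress; return None on failure or timeout.

    Hypothetical helper for illustration only. `opener` is injectable so
    the function can be exercised without network access.
    """
    logger.info("Fetching %s (timeout %.1fs)", url, timeout)
    try:
        # The timeout bounds blocking socket operations, so a stalled
        # server raises an exception instead of blocking forever.
        with opener(url, timeout=timeout) as response:
            body = response.read()
    except (urllib.error.URLError, socket.timeout, TimeoutError) as exc:
        logger.error("Giving up on %s: %s", url, exc)
        return None
    logger.info("Fetched %d bytes from %s", len(body), url)
    return body
```

Calling `logging.basicConfig(level=logging.INFO)` before `fetch_with_timeout(...)` would print the progress messages to the terminal, so the user at least sees which URL is being requested and why a run stopped.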

Christovis commented 2 years ago

If @npdoty is OK with the refactored W3C code in PR #534 (which also includes tests for W3C ingress), I can take care of this issue.

Christovis commented 2 years ago

PR #534 is now merged, which means the W3C code got a makeover that resolves this issue. See the ingress tests and docstring for information on how to use it to ingest the mailing lists. Notebooks will be affected by this change and might fail, but that can be addressed in a separate issue.