Closed by nllz 3 years ago
This appears to be an infinite loop problem (a side effect, I suspect, of our not being careful enough about distinguishing a URL to collect archives from and the name of a mailing list). I don't think that CSV generation is necessary (mail archives can more easily be loaded straight from mbox or something similar), but the infinite loop, where it tries to open a file and then collect from the Web, is a blocker.
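To make the URL-versus-list-name distinction concrete, here is a minimal sketch using only the standard library. The helper name `looks_like_url` is hypothetical and is not part of bigbang; the rule it applies (an http or https scheme plus a host) is an assumption about what should count as a collectable archive URL.

```python
from urllib.parse import urlparse


def looks_like_url(target: str) -> bool:
    """Return True if `target` parses as an http(s) URL with a host.

    Hypothetical helper: anything that fails this check would be treated
    as a mailing list name (or a local path), never fetched from the Web.
    """
    parsed = urlparse(target.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    for arg in ("https://www.ietf.org/mail-archive/text/ietf/", "ietf"):
        kind = "URL" if looks_like_url(arg) else "list name"
        print(f"{arg!r} looks like a {kind}")
```

Making this decision once, at the point where arguments are parsed, might avoid the case where the same string is alternately treated as a file to open and as a URL to collect from.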
I think the YamlLoadWarning is unrelated, although it is probably an indicator of an issue that should be fixed (low priority).
@nllz, if you're around, do you have the exact arguments you passed that triggered this infinite recursion? That might speed up our debugging.
The problem is that most notebooks now depend on the CSVs, so we kinda need them :)
I get the error when I collect any mailing list, so both this:
python3 bin/collect_mail.py -f examples/url_collections/mm.ietf.org.txt
and this:
python3 bin/collect_mail.py -u https://www.ietf.org/mail-archive/text/ietf/
produce an error. Same with ICANN mailing lists. All with a fresh install.
After collecting all the mail of one list, the script tries to read the archive again, over and over, without exiting, before finally crashing with the following error: https://gist.github.com/nllz/30c987f17b89ac2afd4380100c9a97f9
It crashes without producing a csv file for the list.
During collection this error occurs: /home/gagarin/Data/bigbang/bigbang/mailman.py:262: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
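The warning text itself points at the usual remedy: pass an explicit loader, or use `yaml.safe_load()`. A minimal sketch follows, assuming the data being read is plain YAML (mappings, lists, scalars) that needs no custom Python tags. The function name `load_yaml_text` and the sample document are made up for illustration and are not taken from mailman.py.

```python
import yaml  # PyYAML


def load_yaml_text(text: str):
    """Parse YAML with the safe loader, which avoids the YAMLLoadWarning.

    yaml.safe_load(text) is equivalent to
    yaml.load(text, Loader=yaml.SafeLoader).
    """
    return yaml.safe_load(text)


if __name__ == "__main__":
    # Made-up sample document, only to show the call.
    sample = "list: ietf\nmonths: [2019-01, 2019-02]\n"
    print(load_yaml_text(sample))
```

If the file really does rely on Python-specific tags, `yaml.load(text, Loader=yaml.FullLoader)` would be the alternative to look at, but that is a guess about the data rather than something the warning confirms.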
The crash also takes down collect_mail.py -f, so it does not progress to the next list in the file.
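One way to keep a single bad list from stopping the whole run is to isolate each list in its own try/except. This is only a sketch: `collect_all` and the `collect_one` callback are hypothetical stand-ins for whatever collect_mail.py actually calls per URL, and the handling of blank and `#` lines is an assumption about the input file format.

```python
import logging
from typing import Callable, Iterable, List, Tuple


def collect_all(
    urls: Iterable[str], collect_one: Callable[[str], None]
) -> Tuple[List[str], List[str]]:
    """Run collect_one on every URL, recording failures instead of aborting."""
    succeeded: List[str] = []
    failed: List[str] = []
    for raw in urls:
        url = raw.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comments (assumed file format)
        try:
            collect_one(url)
        except Exception:
            # RecursionError is a subclass of Exception, so runaway
            # recursion in one list is logged and the loop moves on.
            logging.exception("Collection failed for %s; continuing", url)
            failed.append(url)
        else:
            succeeded.append(url)
    return succeeded, failed
```

This does not fix the underlying loop, it only contains it, and the list of failed URLs can be reported at the end of the run.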