nllz closed 3 years ago
Example of the tzdata problem:
python3 bin/collect_mail.py -u https://ietf.org/mail-archive/text/ietf/
[...]
INFO:root:200 - writing file to /home/ubuntu/bigbang/archives/ietf/2021-02.mail
INFO:root:retrieving https://ietf.org/mail-archive/text/ietf/2021-03.mail
INFO:root:200 - writing file to /home/ubuntu/bigbang/archives/ietf/2021-03.mail
/home/ubuntu/bigbang/bigbang/mailman.py:262: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
provenance = yaml.load(file_handle)
INFO:root:Updated provenance file in /home/ubuntu/bigbang/archives/ietf
INFO:root:Unzipping 0 archive files
INFO:root:Opening 272 archive files
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname WET identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname EDT identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname PST identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname CDT identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname EST identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname JST identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname MDT identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname PDT identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname IST identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname UT identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname CET identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname MET identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname MST identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname CST identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
/home/ubuntu/anaconda3/envs/bigbang/lib/python3.7/site-packages/dateutil/parser/_parser.py:1218: UnknownTimezoneWarning: tzname DST identified but not understood. Pass tzinfos
argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
offset must be a timedelta strictly between -timedelta(hours=24) and timedelta(hours=24).
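The warnings in this log ask for a tzinfos argument, and the last line is the error datetime raises when a UTC offset is 24 hours or more. Here is a minimal sketch of both points, under stated assumptions: the offset table is illustrative and written for this example (it is not BigBang's), the North American reading is assumed for EST/CST/MST/PST, IST is left out because it is ambiguous, DST is left out because it is not a zone, and the helper names `parse_date` and `safe_timezone` are hypothetical.

```python
from datetime import timedelta, timezone

# Illustrative UTC offsets in hours for the abbreviations seen in the log.
# Assumption: North American meanings for EST/CST/MST/PST and their DST forms.
_OFFSET_HOURS = {
    "UT": 0, "WET": 0, "CET": 1, "MET": 1, "JST": 9,
    "EST": -5, "EDT": -4, "CST": -6, "CDT": -5,
    "MST": -7, "MDT": -6, "PST": -8, "PDT": -7,
}

# dateutil accepts a mapping of abbreviation -> offset in seconds as tzinfos.
TZINFOS = {name: hours * 3600 for name, hours in _OFFSET_HOURS.items()}


def parse_date(text):
    """Parse a date string, resolving the abbreviations listed above."""
    from dateutil import parser  # third-party; imported lazily on purpose
    return parser.parse(text, tzinfos=TZINFOS)


def safe_timezone(offset_seconds):
    """Return a timezone for the offset, or None if datetime would reject it.

    datetime.timezone only accepts offsets strictly between -24h and +24h.
    """
    offset = timedelta(seconds=offset_seconds)
    if abs(offset) >= timedelta(hours=24):
        return None
    return timezone(offset)
```

With this table, `parse_date("Mon, 1 Mar 2021 10:00:00 EST")` should return a timezone-aware datetime instead of emitting the warning. Whether a bogus offset in one message's Date header is what triggers the final error here is a guess; `safe_timezone` only shows how such a value could be detected before it raises.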
Related: https://stackoverflow.com/questions/37890123/how-to-trap-an-exception-that-occurs-in-code-underlying-python-for-loop
The key seems to be:
text = data.decode(encoding="utf-8", errors="replace")
Isn't it?
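To make the question concrete, here is a small sketch of what `errors="replace"` does with a byte like 0x97, the one reported in the non-ASCII example. The header bytes are made up for illustration, and the cp1252 guess is an assumption about where such a byte might come from, not something the archive confirms.

```python
# Hypothetical header bytes; 0x97 is valid neither as ASCII nor as UTF-8 here.
raw = b"Subject: EAP draft \x97 comments"

strict_error = None
try:
    raw.decode("ascii")  # what a strict ASCII decode does with this input
except UnicodeDecodeError as err:
    strict_error = err

# The call from the comment: undecodable bytes become U+FFFD instead of raising.
text = raw.decode(encoding="utf-8", errors="replace")
# 'Subject: EAP draft \ufffd comments'

# Assumption: if the sender used Windows-1252, 0x97 is a long dash (U+2014).
guess = raw.decode("cp1252")
# 'Subject: EAP draft \u2014 comments'
```

So the call does avoid the exception, at the cost of losing the original character. It only helps where the decoding happens in code you control.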
Example of a non-ASCII character:
python3 bin/collect_mail.py -u https://ietf.org/mail-archive/text/eap/
[...]
INFO:root:200 - writing file to /home/ubuntu/bigbang/archives/eap/2012-06.mail
INFO:root:retrieving https://ietf.org/mail-archive/text/eap/2012-07.mail
INFO:root:200 - writing file to /home/ubuntu/bigbang/archives/eap/2012-07.mail
/home/ubuntu/bigbang/bigbang/mailman.py:262: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
provenance = yaml.load(file_handle)
INFO:root:Updated provenance file in /home/ubuntu/bigbang/archives/eap
INFO:root:Unzipping 0 archive files
INFO:root:Opening 126 archive files
Traceback (most recent call last):
File "bin/collect_mail.py", line 54, in <module>
main(args)
File "bin/collect_mail.py", line 45, in main
mailman.collect_from_url(args.u, notes=notes)
File "/home/ubuntu/bigbang/bigbang/mailman.py", line 112, in collect_from_url
data = open_list_archives(url)
File "/home/ubuntu/bigbang/bigbang/mailman.py", line 435, in open_list_archives
arch = [list(mailbox.mbox(txt, create=False).values()) for txt in txts]
File "/home/ubuntu/bigbang/bigbang/mailman.py", line 435, in <listcomp>
arch = [list(mailbox.mbox(txt, create=False).values()) for txt in txts]
File "/usr/lib/python3.8/mailbox.py", line 119, in values
return list(self.itervalues())
File "/usr/lib/python3.8/mailbox.py", line 109, in itervalues
value = self[key]
File "/usr/lib/python3.8/mailbox.py", line 73, in __getitem__
return self.get_message(key)
File "/usr/lib/python3.8/mailbox.py", line 781, in get_message
msg.set_from(from_line[5:].decode('ascii'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0x97 in position 25: ordinal not in range(128)
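The traceback shows the failure inside `mailbox.py` itself, where the `From ` separator line is decoded as ASCII, so a `decode(..., errors="replace")` in calling code would not be reached. One possible workaround, sketched here under assumptions, is to fetch messages one key at a time and skip those that fail, instead of calling `.values()` on the whole box. The function name `read_mbox_tolerant` is hypothetical and this is not BigBang's code.

```python
import mailbox


def read_mbox_tolerant(path):
    """Read an mbox file, skipping messages whose separator line cannot be decoded.

    Returns (messages, skipped_keys) so the caller can log what was dropped.
    """
    box = mailbox.mbox(path, create=False)
    messages, skipped = [], []
    try:
        for key in box.iterkeys():
            try:
                # box[key] calls get_message(), the frame that fails in the traceback.
                messages.append(box[key])
            except UnicodeDecodeError:
                skipped.append(key)
    finally:
        box.close()
    return messages, skipped
```

In `open_list_archives`, something like `read_mbox_tolerant(txt)[0]` could stand in for `list(mailbox.mbox(txt, create=False).values())`. The trade-off is that a message with a bad separator line is dropped rather than repaired, so the skipped keys are worth logging.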