csaftoiu / yahoo-groups-backup

A python script to backup the contents of private Yahoo! groups.
The Unlicense

AssertionError in scraper.py, line 137 #38

Closed jkotlinski closed 7 years ago

jkotlinski commented 7 years ago
Failed to process message:
{'authorName': '<David Dineen-Porter>David Dineen-Porter<',
 'canDelete': False,
 'contentTrasformed': False,
 'from': 'David Dineen-Porter',
 'headers': {'inReplyToHeader': 'PDQ5QzBCMTFELjcwODA5MDhAZ21haWwuY29tPg==',
             'messageIdInHeader': 'PDAwZGUwMWM5YTdiOCRmODRiNWY0MCRlOGUyMWRjMCRAY29tPg==',
             'referencesHeader': 'PDQ5QzBCMTFELjcwODA5MDhAZ21haWwuY29tPg=='},
 'messageBody': '<div id="ygrps-yiv-434767221">Blood pours out of the sync '
                'cable and the screen displays a rune, signifying<br/>\n'
                'the return of the lord Loki.<br/>\n'
                '<br/>\n'
                ' <br/>\n'
                '<br/>\n'
                'Hail to Loki, lord of midi sync start/stop messages and the '
                'general DIN5<br/>\n'
                'cable usage!<br/>\n'
                '<br/>\n'
                ' <br/>\n'
                '<br/>\n'
                ' <br/>\n'
                '<br/>\n'
                'From: <a rel="nofollow" target="_blank" '
                'href="mailto:lsdj@yahoogroups.com">lsdj@yahoogroups.com</a> '
                '[mailto:<a rel="nofollow" target="_blank" '
                'href="mailto:lsdj@yahoogroups.com">lsdj@yahoogroups.com</a>] '
                'On Behalf Of jacob<br/>\n'
                'sikker remin<br/>\n'
                'Sent: Wednesday, March 18, 2009 4:30 AM<br/>\n'
                'To: <a rel="nofollow" target="_blank" '
                'href="mailto:lsdj@yahoogroups.com">lsdj@yahoogroups.com</a><br/>\n'
                'Subject: [LSDj!] midi start behaviour<br/>\n'
                '<br/>\n'
                ' <br/>\n'
                '<br/>\n'
                'hi group,<br/>\n'
                'how does LSDJ react to stardard midi signal '
                '&quot;start&quot;?<br/>\n'
                'does it:<br/>\n'
                'A: start from the beginning of the song?<br/>\n'
                'or<br/>\n'
                'B: continue from the position that it is currently in?<br/>\n'
                'best,<br/>\n'
                'jacob<br/>\n'
                '<br/>\n'
                '-- <br/>\n'
                'bleep:<br/>\n'
                'www.campingsex.org<br/>\n'
                'www.mikrogalleriet.net<br/>\n'
                'www.8bitklubben.dk<br/>\n'
                'www.myspace.com/blissfullymediocre<br/>\n'
                '<br/>\n'
                '<br/>\n'
                '<br/>\n'
                '<br/>\n'
                '<br/>\n'
                '[Non-text portions of this message have been removed]</div>',
 'msgId': 10646,
 'msgSnippet': 'Blood pours out of the sync cable and the screen displays a '
               'rune, signifying the return of the lord Loki. Hail to Loki, '
               'lord of midi sync start/stop messages',
 'nextInTime': 10647,
 'nextInTopic': 10651,
 'numMessagesInTopic': 5,
 'postDate': 1237374122,
 'prevInTime': 10645,
 'prevInTopic': 10645,
 'profile': 'slorrin',
 'rawEmail': 'Return-Path: &lt;theddp@...&gt;\r\n'
             'X-Sender: theddp@...\r\n'
             'X-Apparently-To: lsdj@yahoogroups.com\r\n'
             'X-Received: (qmail 53166 invoked from network); 18 Mar 2009 '
             '11:02:12 -0000\r\n'
             'X-Received: from unknown (69.147.108.200)\n'
             '  by m3.grp.sp2.yahoo.com with QMQP; 18 Mar 2009 11:02:12 '
             '-0000\r\n'
             'X-Received: from unknown (HELO mail-qy0-f124.google.com) '
             '(209.85.221.124)\n'
             '  by mta1.grp.re1.yahoo.com with SMTP; 18 Mar 2009 11:02:12 '
             '-0000\r\n'
             'X-Received: by qyk30 with SMTP id 30so650857qyk.32\n'
             '        for &lt;lsdj@yahoogroups.com&gt;; Wed, 18 Mar 2009 '
             '04:02:12 -0700 (PDT)\r\n'
             'X-Received: by 10.224.73.143 with SMTP id '
             'q15mr1756766qaj.189.1237374132045;\n'
             '        Wed, 18 Mar 2009 04:02:12 -0700 (PDT)\r\n'
             'Return-Path: &lt;theddp@...&gt;\r\n'
             'X-Received: from ownerd26a0cf1f ([67.204.19.0])\n'
             '        by mx.google.com with ESMTPS id '
             '9sm46632yxs.6.2009.03.18.04.02.09\n'
             '        (version=SSLv3 cipher=RC4-MD5);\n'
             '        Wed, 18 Mar 2009 04:02:11 -0700 (PDT)\r\n'
             'To: &lt;lsdj@yahoogroups.com&gt;\r\n'
             'References: &lt;49C0B11D.7080908@...&gt;\r\n'
             'In-Reply-To: &lt;49C0B11D.7080908@...&gt;\r\n'
             'Date: Wed, 18 Mar 2009 07:02:02 -0400\r\n'
             'Message-ID: &lt;00de01c9a7b8$f84b5f40$e8e21dc0$@com&gt;\r\n'
             'MIME-Version: 1.0\r\n'
             'X-Mailer: Microsoft Office Outlook 12.0\r\n'
             'Thread-Index: AcmnpSptXwV5NpYhToKL60EpMI+NmQAE7Lig\r\n'
             'Content-Language: en-us\r\n'
             'X-Originating-IP: 209.85.221.124\r\n'
             'X-eGroups-Msg-Info: 1:12:0:0:0\r\n'
             'From: &quot;&lt;David Dineen-Porter&gt;David '
             'Dineen-Porter&lt;/David Dineen-Porter&gt;&quot; '
             '&lt;theddp@...&gt;\r\n'
             'Subject: RE: [LSDj!] midi start behaviour\r\n'
             'X-Yahoo-Group-Post: member; u=299797681; '
             'y=C8TuS-MeN0pr9o920pphQV6-NFVPyG8rbeCxO7G07_MRmw\r\n'
             'X-Yahoo-Profile: slorrin\r\n'
             'Content-Type: text/plain; charset=US-ASCII\r\n'
             'Content-Transfer-Encoding: 7bit\r\n'
             '\r\n'
             'Blood pours out of the sync cable and the screen displays a '
             'rune, signifying\n'
             'the return of the lord Loki.\n'
             '\n'
             ' \n'
             '\n'
             'Hail to Loki, lord of midi sync start/stop messages and the '
             'general DIN5\n'
             'cable usage!\n'
             '\n'
             ' \n'
             '\n'
             ' \n'
             '\n'
             'From: lsdj@yahoogroups.com [mailto:lsdj@yahoogroups.com] On '
             'Behalf Of jacob\n'
             'sikker remin\n'
             'Sent: Wednesday, March 18, 2009 4:30 AM\n'
             'To: lsdj@yahoogroups.com\n'
             'Subject: [LSDj!] midi start behaviour\n'
             '\n'
             ' \n'
             '\n'
             'hi group,\n'
             'how does LSDJ react to stardard midi signal &quot;start&quot;?\n'
             'does it:\n'
             'A: start from the beginning of the song?\n'
             'or\n'
             'B: continue from the position that it is currently in?\n'
             'best,\n'
             'jacob\n'
             '\n'
             '-- \n'
             'bleep:\n'
             'www.campingsex.org\n'
             'www.mikrogalleriet.net\n'
             'www.8bitklubben.dk\n'
             'www.myspace.com/blissfullymediocre\n'
             '\n'
             '\n'
             '\n'
             '\n'
             '\n'
             '[Non-text portions of this message have been removed]\n'
             '\n'
             '\n',
 'replyTo': 'LIST',
 'senderId': '-_sJVcy9coiDYEPkKv6dMNQx_wz9YskY2XDaCOlDS0wjBh9VZdx9-UgiM3VBdKZQ60agrwFWlj7L4LSviBjIePt2pFa4MAaMAa1XMWP15LC24YX9c4wIZwJYhnfSwidZHKXxWaFHl0zZs3pct4mxxbMIEULiRuVxAg',
 'spamInfo': {'isSpam': False, 'reason': '12'},
 'specialLinks': [],
 'subject': 'RE: [LSDj!] midi start behaviour',
 'systemMessage': False,
 'topicId': 10645,
 'userId': 299797681}
Traceback (most recent call last):
  File "yahoo-groups-backup.py", line 129, in <module>
    main()
  File "yahoo-groups-backup.py", line 125, in main
    arguments, cfg_args)
  File "yahoo-groups-backup.py", line 103, in invoke_subcommand
    return module.command(args)
  File "/Users/johank/yahoo-groups-backup/yahoo_groups_backup/subcommands/scrape_messages.py", line 50, in command
    msg = scraper.get_message(cur_message)
  File "/Users/johank/yahoo-groups-backup/yahoo_groups_backup/scraper.py", line 181, in get_message
    return self._massage_message(data)
  File "/Users/johank/yahoo-groups-backup/yahoo_groups_backup/scraper.py", line 137, in _massage_message
    assert not leftover.strip()
AssertionError
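The `headers` values in the dump appear to be base64-encoded copies of the raw email headers. As a quick sketch using only Python's standard `base64` module, decoding them shows that they correspond to the `In-Reply-To`, `References` and `Message-ID` lines in `rawEmail`. The `decode_headers` helper is a hypothetical name for illustration and is not part of the scraper.

```python
import base64

# Header values copied verbatim from the 'headers' field of the failing message.
headers = {
    "inReplyToHeader": "PDQ5QzBCMTFELjcwODA5MDhAZ21haWwuY29tPg==",
    "messageIdInHeader": "PDAwZGUwMWM5YTdiOCRmODRiNWY0MCRlOGUyMWRjMCRAY29tPg==",
    "referencesHeader": "PDQ5QzBCMTFELjcwODA5MDhAZ21haWwuY29tPg==",
}


def decode_headers(encoded):
    """Return a new dict with each base64 header value decoded to ASCII text."""
    return {
        name: base64.b64decode(value).decode("ascii")
        for name, value in encoded.items()
    }


for name, value in decode_headers(headers).items():
    print(f"{name}: {value}")
# inReplyToHeader: <49C0B11D.7080908@gmail.com>
# messageIdInHeader: <00de01c9a7b8$f84b5f40$e8e21dc0$@com>
# referencesHeader: <49C0B11D.7080908@gmail.com>
```

The decoded `messageIdInHeader` is identical to the `Message-ID` line in `rawEmail`, so these fields look consistent with the rest of the message.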
jkotlinski commented 7 years ago

Is there any hope that this bug will get fixed?

csaftoiu commented 7 years ago

Not in the near future, apologies :/. Roughly speaking, the message is in a format the scraper doesn't expect; the fix would be to work out what is different about it and account for that.
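One way to "see what is different" is to compare the failing message against one that scraped cleanly. The sketch below is hypothetical: `suspicious_fields`, `differing_keys` and the `PLAIN_TEXT_FIELDS` list are illustrative names, not part of yahoo-groups-backup, and nothing in this thread confirms which field triggered the assertion. It flags short text fields that contain angle brackets (as `authorName` does in the dump above), plus keys that appear in only one of two message dicts.

```python
import re

# Hypothetical: fields that normally hold plain text in the message dicts
# shown in this issue. Adjust to whatever the scraper actually parses.
PLAIN_TEXT_FIELDS = ("authorName", "from", "profile", "subject")

# Characters that look like HTML/SGML tag delimiters.
MARKUP_CHARS = re.compile(r"[<>]")


def suspicious_fields(message):
    """Return the plain-text fields whose string values contain '<' or '>'."""
    return [
        name
        for name in PLAIN_TEXT_FIELDS
        if isinstance(message.get(name), str) and MARKUP_CHARS.search(message[name])
    ]


def differing_keys(failing, working):
    """Return, sorted, the keys present in only one of the two message dicts."""
    return sorted(set(failing) ^ set(working))


# Values copied from the failing message in this issue.
failing = {
    "authorName": "<David Dineen-Porter>David Dineen-Porter<",
    "from": "David Dineen-Porter",
    "profile": "slorrin",
    "subject": "RE: [LSDj!] midi start behaviour",
}

print(suspicious_fields(failing))  # ['authorName']
```

Only `authorName` is flagged for this message; whether that is actually what `_massage_message` trips over would still need to be checked against `scraper.py`.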

csaftoiu commented 7 years ago

Resolved by #39.