eksopl / fuuka

Fuuka Imageboard Archiver
http://code.google.com/p/fuuka/
Other

Some newhtml dumper error #46

Closed: anounyym1 closed this issue 12 years ago

anounyym1 commented 12 years ago

After the newhtml update I started getting this error from the dumper:

Error parsing post 24506058:
------
<div class="postContainer opContainer" id="pc24506058"><div id="p24506058" class="post op"><div class="postInfoM mobile" id="pim24506058"><span class="postNum nameBlock"><span class="subject">Attention 4chan extension/script/archive developers!</span> <span class="name"><span style="color:#F00000">moot</span></span> <span class="postertrip"><span style="color:#FF0000;font-weight:normal">!Ep8pui8Vw2</span></span> <span class="commentpostername"><span style="color:#F00000">## Admin</span></span><br /><em><a href="res/24506058#p24506058" title="Highlight this post">No.</a><a href="res/24506058#q24506058" title="Quote this post">24506058</a></em></span><span class="dateTime" data-utc="1335585262">04/27/12(Fri)23:54</span></div><div class="file" id="f24506058"><div class="fileInfo"><span class="fileText">File: <a href="//images.4chan.org/g/src/1335585262638.jpg" target="_blank">1335585262.jpg</a>-(18 KB, 476x356, <span title="bertstare.jpg">bertstare.jpg</span>)</span></div><a class="fileThumb" href="//images.4chan.org/g/src/1335585262638.jpg" target="_blank"><img src="//0.thumbs.4chan.org/g/thumb/1335585262638s.jpg" alt="18 KB" data-md5="9y3GCEbhpKTcHI8UQXXU+A==" style="height: 188px; width: 251px;" /></a></div><div class="postInfo" id="pi24506058"><input type="checkbox" name="24506058" value="delete" /> <span class="subject">Attention 4chan extension/script/archive developers!</span> <span class="nameBlock"><span class="name"><span style="color:#F00000">moot</span></span> <span class="postertrip"><span style="color:#FF0000;font-weight:normal">!Ep8pui8Vw2</span></span> <span class="commentpostername"><span style="color:#F00000">## Admin</span></span></span> <span class="dateTime" data-utc="1335585262">04/27/12(Fri)23:54</span> <span class="postNum"><a href="res/24506058#p24506058" title="Highlight this post">No.</a><a href="res/24506058#q24506058" title="Quote this post">24506058</a> <img src="//static.4chan.org/image/sticky.gif" alt="Sticky" title="Sticky" /> <img 
src="//static.4chan.org/image/closed.gif" alt="Closed" title="Closed" /> &nbsp; [<a href="res/24506058" class="replylink">Reply</a>]</span> </div><blockquote class="postMessage" id="m24506058"><div style="padding: 5px;margin-left: .5em;border-color: #faa;border: 2px dashed rgba(255,0,0,.1);border-radius: 2px">Soon we'll roll out an HTML rewrite across all of the imageboards. The design will remain the same&#44; but the underlying HTML/CSS has been rewritten from scratch. It is HTML5/CSS3&#44; and validates with the exception of a few CSS hacks for cross-browser compatibility.<br /><br />We've made these changes with you in mind. Our existing HTML is about /ten years old/&#44; and is a hodgepodge of tables and spans. The new HTML should be much easier to parse&#44; and when benchmarking the official 4chan Chrome extension&#44; we found that it parses approximately 600% faster.<br /><br />Please visit <a href="/htmlnew/" class="quotelink">&gt;&gt;&gt;/htmlnew/</a> to see the changes. We've tried to include every test case for things you'll see in production. Read through the posts to see some of the notes we've made pointing out specific changes.<br /><br />In addition&#44; CORS is now supported on www.4chan.org and sys.4chan.org&#44; with an origin of boards.4chan.org (HTTP/HTTPS supported). And the new code is a responsive design for mobile browsers.<br /><br />The new code will probably be rolled out some time this weekend. If you maintain an extension&#44; userscript&#44; or archiver&#44; please make your updates as soon as possible.<br /><br />Feel free to send feedback/questions to newhtml@4chan.org.</div></blockquote> </div><div class="postLink mobile"><span class="info"></span>
------
 at Board.pm line 247 thread 11
        Board::troubles('Board::Yotsuba=HASH(0x7f33a40295e8)', 'Error parsing post 24506058:
------
<div class="postContainer...') called at Board/Yotsuba.pm line 189 thread 11
        Board::Yotsuba::parse_post('Board::Yotsuba=HASH(0x7f33a40295e8)', '<div class="postContainer opContainer" id="pc24506058"><div i...', 0) called at Board/Yotsuba.pm line 125 thread 11
        Board::Yotsuba::parse_thread('Board::Yotsuba=HASH(0x7f33a40295e8)', '<div class="postContainer opContainer" id="pc24506058"><div i...') called at Board/Yotsuba.pm line 340 thread 11
        Board::Yotsuba::get_page('Board::Yotsuba=HASH(0x7f33a40295e8)', 0, 'Sun, 13 May 2012 20:42:45 GMT') called at Board.pm line 124 thread 11
        Board::__ANON__() called at Board.pm line 129 thread 11
        Board::content('Board::Yotsuba=HASH(0x7f33a40295e8)', 'Board::Request::PAGE=ARRAY(0x7f33a461e798)') called at ./board-dump.pl line 216 thread 11
        main::__ANON__() called at ./board-dump.pl line 302 thread 11
        eval {...} called at ./board-dump.pl line 302 thread 11
Error parsing thread (see failed post above)
------
 at Board.pm line 247 thread 11
        Board::troubles('Board::Yotsuba=HASH(0x7f33a40295e8)', 'Error parsing thread (see failed post above)\x{a}------\x{a}') called at Board/Yotsuba.pm line 127 thread 11
        Board::Yotsuba::parse_thread('Board::Yotsuba=HASH(0x7f33a40295e8)', '<div class="postContainer opContainer" id="pc24506058"><div i...') called at Board/Yotsuba.pm line 340 thread 11
        Board::Yotsuba::get_page('Board::Yotsuba=HASH(0x7f33a40295e8)', 0, 'Sun, 13 May 2012 20:42:45 GMT') called at Board.pm line 124 thread 11
        Board::__ANON__() called at Board.pm line 129 thread 11
        Board::content('Board::Yotsuba=HASH(0x7f33a40295e8)', 'Board::Request::PAGE=ARRAY(0x7f33a461e798)') called at ./board-dump.pl line 216 thread 11
        main::__ANON__() called at ./board-dump.pl line 302 thread 11
        eval {...} called at ./board-dump.pl line 302 thread 11

Otherwise it seems to be working.

eksopl commented 12 years ago

Thanks. My theory is that old mod/admin posts have extra HTML cruft stored in 4chan's database. That cruft ends up in the HTML generated for those posts, so it doesn't match the way new mod/admin posts are meant to be output.

Fixed. Please report it if you notice anything else not being archived correctly.
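To illustrate the kind of markup involved: in the post quoted in the error above, the admin's name, tripcode and capcode are each wrapped in an extra styled `<span style="...">` inside the `<span class="name">`, `<span class="postertrip">` and `<span class="commentpostername">` elements. A parser that expects the text to sit directly inside those spans would likely trip over that. The following is a minimal Python sketch, assuming only the class names visible in the dump; `PostInfoParser` and `parse_poster` are hypothetical names, and this is not Fuuka's actual parser (which is Perl, in Board/Yotsuba.pm). It credits text to the innermost enclosing field span, so nested wrappers don't hide it.

```python
from html.parser import HTMLParser


class PostInfoParser(HTMLParser):
    """Collect the text of the first name / tripcode / capcode <span>,
    including text wrapped in nested (e.g. styled) child spans."""

    # Class names as they appear in the new 4chan HTML quoted above.
    FIELDS = ("name", "postertrip", "commentpostername")

    def __init__(self):
        super().__init__()
        self.fields = {}    # field class -> accumulated text
        self._stack = []    # one entry per open <span>: field class or None
        self._done = set()  # fields whose first <span> has already closed

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        field = next((c for c in classes if c in self.FIELDS), None)
        self._stack.append(field)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            field = self._stack.pop()
            if field:
                self._done.add(field)

    def handle_data(self, data):
        # Credit text to the innermost enclosing field span, so a styled
        # wrapper like <span style="color:#F00000"> does not hide it.
        for field in reversed(self._stack):
            if field:
                if field not in self._done:
                    self.fields[field] = self.fields.get(field, "") + data
                break


def parse_poster(html):
    """Return {field class: stripped text} for the first occurrence of each field."""
    parser = PostInfoParser()
    parser.feed(html)
    parser.close()
    return {k: v.strip() for k, v in parser.fields.items()}


# Desktop postInfo markup for the admin post quoted in the error above.
SAMPLE = (
    '<span class="nameBlock"><span class="name"><span style="color:#F00000">moot</span></span> '
    '<span class="postertrip"><span style="color:#FF0000;font-weight:normal">!Ep8pui8Vw2</span></span> '
    '<span class="commentpostername"><span style="color:#F00000">## Admin</span></span></span>'
)

print(parse_poster(SAMPLE))
# {'name': 'moot', 'postertrip': '!Ep8pui8Vw2', 'commentpostername': '## Admin'}
```

Because only the first occurrence of each field is kept, the mobile (`postInfoM`) block that precedes the desktop block in the new HTML does not cause the name or capcode text to be doubled.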

anounyym1 commented 12 years ago

UPDATE: never mind, my mistake.

It works now, thanks!

eksopl commented 12 years ago

Looks like capcodes aren't working. Reopening.

eksopl commented 12 years ago

I believe this is fixed. Can you update to the latest master, delete post 24840801, and try again?

anounyym1 commented 12 years ago

Now it looks like it's working.

eksopl commented 12 years ago

Thank you!