Sorry for not responding! Hopefully you found that you needed the enwiki-20151201-pages-articles-xml.bz2 file instead.
I found that the problem lies in decompressing the bz2 file to XML (I didn't check the details). It is not a problem with the stream2es code: feeding it the uncompressed XML file instead of the bz2 gets the job done, so I closed the issue.
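The workaround above is to decompress the dump first and give stream2es the plain XML. On the command line `bzip2 -dk <file>` does this. Below is a minimal Python sketch of the same step. It assumes Python 3.3 or later, where `bz2.open` reads every concatenated stream in a file, so it also handles the "multistream" dumps. `decompress_bz2` is a hypothetical helper name, not part of stream2es.

```python
import bz2
import shutil


def decompress_bz2(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> None:
    """Stream-decompress a .bz2 file to an uncompressed file on disk.

    bz2.open (Python 3.3+) keeps reading across concatenated bz2 streams,
    so a multistream Wikipedia dump comes out as one continuous XML file.
    Data is copied in chunks, so the whole dump is never held in memory.
    """
    with bz2.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

For example, `decompress_bz2("enwiki-20151102-pages-articles-multistream.xml.bz2", "enwiki-20151102-pages-articles-multistream.xml")`, then point stream2es at the resulting `.xml` file instead of the `.bz2`.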
Anyway, thanks for the reply.
On Dec 16, 2015, at 4:57 PM, Drew Raines wrote:
Sorry for not responding! Hopefully you found that you needed the enwiki-20151201-pages-articles-xml.bz2 http://burnbit.com/torrent/427846/enwiki_20151201_pages_articles_xml_bz2 file instead.
I ran stream2es on the local Wikipedia dump enwiki-20151102-pages-articles-multistream.xml.bz2 but got the error message:
I've also tried other dumps, simplewiki-20151102-pages-articles-multistream.xml.bz2 and simplewiki-20150901-pages-articles-multistream.xml.bz2, and the same error occurs. I'm not sure how to fix it.
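All of the failing files are "multistream" dumps, which are many independent bz2 streams concatenated into one file. Whether that is what tripped up stream2es is an assumption the thread does not confirm. A decompressor that stops at the end of the first stream would see truncated XML, which would be one plausible cause. The following is a minimal sketch for checking how many streams a dump contains. It uses Python's `bz2.BZ2Decompressor` and its `eof`/`unused_data` attributes. `count_bz2_streams` is a hypothetical helper name. It decompresses and discards the whole file, so on a full enwiki dump it takes a while.

```python
import bz2


def count_bz2_streams(path: str, chunk_size: int = 1 << 20) -> int:
    """Count the concatenated bz2 streams in a file.

    Each stream gets a fresh BZ2Decompressor. When one reaches end-of-stream
    (decomp.eof), any bytes it did not consume (decomp.unused_data) belong
    to the next stream and are fed to a new decompressor. Decompressed
    output is discarded. A file truncated mid-stream still counts that
    partial stream.
    """
    streams = 0
    decomp = None
    pending = b""
    with open(path, "rb") as f:
        while True:
            if not pending:
                pending = f.read(chunk_size)
                if not pending:
                    break  # end of file
            if decomp is None:
                decomp = bz2.BZ2Decompressor()
                streams += 1
            decomp.decompress(pending)  # output is not needed, only stream boundaries
            if decomp.eof:
                # Leftover bytes are the start of the next stream (if any).
                pending = decomp.unused_data
                decomp = None
            else:
                pending = b""
    return streams
```

A result greater than 1 means the file is multistream. Decompressing it to plain XML first sidesteps the question of how any downstream reader handles the stream boundaries.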