Many tables (or parts of them) still appear in the extracted text output.
Steps to reproduce:
Download this dump: https://dumps.wikimedia.org/jawiki/20221020/jawiki-20221020-pages-articles1.xml-p1p114794.bz2
Invoke the following command to list lines that contain the string "colspan":
bzcat jawiki-20221020-pages-articles1.xml-p1p114794.bz2 | wikiextractor/WikiExtractor.py --no-templates -o - - | grep colspan
Output:
[shortened]
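To get a broader picture than `grep colspan` gives, the same check can be sketched in Python. This is only a rough illustration: the function name `find_table_residue` and the marker list (`colspan`, `rowspan`, and lines starting with `{|`, `|}` or `|-`) are my own assumptions about what leftover table markup looks like, not anything taken from WikiExtractor itself.

```python
import re

# Assumed, non-exhaustive markers of leftover wiki table markup:
# HTML-style cell attributes plus lines that open, close, or split a table.
TABLE_MARKERS = re.compile(r'colspan|rowspan|^\s*\{\||^\s*\|\}|^\s*\|-')


def find_table_residue(lines):
    """Return (line_number, text) pairs for lines that look like table markup."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if TABLE_MARKERS.search(line):
            # Strip only the trailing newline so the reported text stays intact.
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing `sys.stdin` to `find_table_residue` at the end of the pipeline above, in place of `grep colspan`, should list the affected lines together with their line numbers.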