Many elements/tags remain in wikiextractor's output, such as poem, q, ins, del, br, section, onlyinclude, includeonly, and math. Mathematical equations (containing commands such as \mathbf) also appear without being enclosed in any tags.
Download this dump: https://dumps.wikimedia.org/enwiki/20221020/enwiki-20221020-pages-articles1.xml-p1p41242.bz2
Invoke the following command to list lines that contain the opening tags of these elements:
wikiextractor --no-templates --html-safe '' -o - dumps.wikimedia.org/enwiki/20221020/enwiki-20221020-pages-articles1.xml-p1p41242.bz2 | grep '<\(poem\|q\|section\|ins\|del\|math\|onlyinclude\|br\|chem\)\b'
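To see which leftover tags occur most often, the matches can be tallied per tag instead of printing whole lines. This is a minimal sketch. It assumes GNU grep, which the command above also relies on for \b and \| in a basic regex. count_leftover_tags is a hypothetical helper name. includeonly is added to the pattern as an assumption, because it is listed above but missing from the original grep.

```shell
#!/bin/sh
# Hypothetical helper: reads wikiextractor output on stdin and prints how many
# times each leftover opening tag occurs, most frequent first.
# Assumes GNU grep (\b and \| inside a basic regular expression).
count_leftover_tags() {
  # -o prints each match on its own line, e.g. "<poem" or "<br"
  grep -o '<\(poem\|q\|section\|ins\|del\|math\|onlyinclude\|includeonly\|br\|chem\)\b' |
    sed 's/^<//' |   # strip the leading "<" so only the tag name remains
    sort |
    uniq -c |        # count identical tag names
    sort -rn         # highest counts first
}
```

Usage: pipe the same wikiextractor invocation into the helper, for example `wikiextractor --no-templates --html-safe '' -o - <dump>.bz2 | count_leftover_tags`. Closing tags such as </poem> are not counted, because the pattern requires the tag name to follow "<" directly.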
Examples from the output:
(Not all of the tags appear in this particular dump.)