attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0

--json flag is unrecognised #274

Open odebroqueville opened 2 years ago

odebroqueville commented 2 years ago

I'm using wikiextractor 3.0.4.

[Screenshot 2021-10-13 at 17.52.26]

Btw, is this project still in the works or has it been abandoned?

hyjin-asc commented 2 years ago

@odebroqueville You can set toJson = True in the Extractor() class in extract.py:

class Extractor():
    """
    An extraction task on a article.
    """
    ##
    # Whether to preserve links in output
    keepLinks = False

    ##
    # Whether to preserve section titles
    keepSections = True

    ##
    # Whether to output text with HTML formatting elements in <doc> files.
    HtmlFormatting = False

    ##
    # Whether to produce json instead of the default <doc> output format.
    toJson = True

    ##
    # Whether to expand templates
    expand_templates = True
...
hxy-62 commented 6 months ago

Setting toJson = True in the Extractor() class in extract.py did not seem to work for me.

hxy-62 commented 6 months ago

Rename the variable toJson to to_json (so the line reads to_json = True), then it works.
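
For anyone wondering why the name matters, here is a minimal sketch. It assumes (not confirmed in this thread) that the extraction code reads the attribute as `to_json` rather than `toJson`. `ToyExtractor` and `output_format` are hypothetical stand-ins for illustration, not the real wikiextractor code.

```python
class ToyExtractor:
    """Hypothetical stand-in for Extractor; not the real wikiextractor class."""

    # Attribute name used in the earlier suggestion.
    toJson = True

    def output_format(self):
        # Assumption: the real code checks `to_json`, not `toJson`,
        # so setting `toJson` alone is never seen by this check.
        return "json" if getattr(self, "to_json", False) else "doc"


# With only `toJson` defined, the check falls back to the default format.
before = ToyExtractor().output_format()

# Define the attribute under the name the method actually reads.
ToyExtractor.to_json = True
after = ToyExtractor().output_format()

print(before, after)  # prints "doc json"
```

If that assumption holds, it would explain why the first suggestion appeared to have no effect while the rename did.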