ChrisDelClea / turCy

Apache License 2.0

turCy

An Open Information Extraction System mainly designed for German.

Installation

pip install turcy

python -m spacy download de_core_news_lg-3.0.0 --direct

turCy can be applied to other languages as well; however, some extra work is necessary because no patterns for English are shipped. You would therefore have to build your own patterns first, for which a `pattern_builder` module is available.

How it works

img_3.png

1. Building a Pattern

img_2.png

img_1.png
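turCy's own pattern format is not shown in this README, so the following is only a toy sketch of the general idea behind pattern-based triple extraction over a dependency parse. The parse is written by hand (it is not parser output), the dependency labels are assumed to be TIGER-style, and `Token`, `PATTERN`, and `extract_triple` are hypothetical names that are not part of turCy.

```python
# Toy illustration only: hand-written parse and a hypothetical pattern format.
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    dep: str   # dependency label (TIGER-style labels assumed)
    head: int  # index of the head token; the root points to itself


# Hand-annotated parse of "Nürnberg ist eine Stadt." (an assumption)
SENTENCE = [
    Token("Nürnberg", "sb", 1),
    Token("ist", "ROOT", 1),
    Token("eine", "nk", 3),
    Token("Stadt", "pd", 1),
    Token(".", "punct", 1),
]

# Hypothetical pattern: which dependency label fills which slot of the triple.
PATTERN = {"subject": "sb", "predicate": "ROOT", "object": "pd"}


def extract_triple(tokens, pattern):
    """Return (subject, predicate, object) if the pattern matches, else None."""
    root = next(
        (i for i, t in enumerate(tokens) if t.dep == pattern["predicate"]), None
    )
    if root is None:
        return None

    def child_with(dep):
        # First direct child of the root that carries the requested label.
        return next(
            (
                t.text
                for i, t in enumerate(tokens)
                if i != root and t.head == root and t.dep == dep
            ),
            None,
        )

    subj = child_with(pattern["subject"])
    obj = child_with(pattern["object"])
    if subj is None or obj is None:
        return None
    return (subj, tokens[root].text, obj)


print(extract_triple(SENTENCE, PATTERN))
```

A real pattern would typically also pull in modifiers of the matched tokens (for example the determiner of the object); this sketch matches only the head words to keep the idea visible.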

2. Extraction

  1. Load the German Language Model from spaCy.
  2. Add turCy to the nlp-Pipeline.
  3. Pass the document to the pipeline.
  4. Iterate over the sentences in the document and access the triples in each sentence.

import spacy
import turcy


def example():
    # Load the German model without the NER component.
    nlp = spacy.load("de_core_news_lg", exclude=["ner"])
    nlp.max_length = 2096700  # raise spaCy's character limit for long documents
    turcy.add_to_pipe(nlp)  # apply/use current patterns in list
    pipeline_params = {"attach_triple2sentence": {"pattern_list": "small"}}
    doc = nlp("Nürnberg ist eine Stadt in Deutschland.", component_cfg=pipeline_params)
    for sent in doc.sents:
        print(sent)
        # Each triple is a dict whose "triple" entry holds (subject, predicate, object).
        for triple in sent._.triples:
            (subj, pred, obj) = triple["triple"]
            print(f"subject:'{subj}', predicate:'{pred}' and object: '{obj}'")
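If the triples are needed for further processing rather than printing, they can be gathered into plain records. The sketch below assumes only what the example above shows, namely that each triple is a dict whose `"triple"` entry unpacks into subject, predicate, and object. `collect_triples` is a hypothetical helper, not part of turCy, and the data at the bottom is a hand-written stand-in, not actual turCy output.

```python
def collect_triples(sentences):
    """Flatten (sentence_text, triples) pairs into a list of plain dicts.

    `sentences` is any iterable of (sentence_text, triples) pairs, where each
    triple is a dict with a "triple" entry of the form (subj, pred, obj).
    """
    rows = []
    for sentence_text, triples in sentences:
        for triple in triples:
            subj, pred, obj = triple["triple"]
            rows.append(
                {
                    "sentence": sentence_text,
                    # str() turns spaCy spans/tokens into plain text.
                    "subject": str(subj),
                    "predicate": str(pred),
                    "object": str(obj),
                }
            )
    return rows


# Hand-written stand-in data for demonstration (not actual turCy output).
demo = [
    ("Satz eins.", [{"triple": ("A", "ist", "B")}]),
    ("Satz zwei.", []),
]
rows = collect_triples(demo)
print(rows)
```

With a processed document, the same helper could be called as `collect_triples((sent.text, sent._.triples) for sent in doc.sents)`.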

3. Results

img_5.png

img_6.png

References