Graphlet-AI / graphlet

PyPi module for Graphlet AI Knowledge Graph Factory
https://graphlet.ai
Apache License 2.0

Create `graphlet.nlp.ie` module for information extraction as part of property graph construction #11

Open rjurney opened 2 years ago

rjurney commented 2 years ago

Use of graphlet.etl Schema Models

We can use graphlet.etl's Pandera schema models to define the entities and relations we are extracting.

About graphlet.etl

The graphlet.etl module helps construct enterprise knowledge graphs as property graphs via Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) pipelines, using Pandera Schema Models on top of PySpark and Dask. These models define the node and edge types of a heterogeneous information network (HIN), with semi-structured data stored as node and edge properties, in one central place that other features, such as entity resolution, can refer to.

The classes EntitySchema, NodeSchema and EdgeSchema can be subclassed to define the types of entities and relations to be extracted.
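
As a rough illustration of the subclassing pattern, the sketch below uses plain Python dataclasses as stand-ins for the schema classes. It does not import graphlet.etl: the base-class fields (`entity_id`, `src`, `dst`) and the `Person`, `Company` and `WorksAt` types are assumptions made for this example, and the real classes are Pandera schema models that validate DataFrames rather than single records.

```python
from dataclasses import dataclass, fields


# Stand-ins for graphlet.etl's base classes (assumed fields, not the real API).
@dataclass
class EntitySchema:
    entity_id: str


@dataclass
class NodeSchema(EntitySchema):
    pass


@dataclass
class EdgeSchema(EntitySchema):
    src: str  # entity_id of the source node
    dst: str  # entity_id of the destination node


# Hypothetical node and edge types that an extraction pipeline might emit.
@dataclass
class Person(NodeSchema):
    name: str


@dataclass
class Company(NodeSchema):
    name: str


@dataclass
class WorksAt(EdgeSchema):
    title: str


alice = Person(entity_id="p1", name="Alice")
acme = Company(entity_id="c1", name="Acme")
job = WorksAt(entity_id="e1", src=alice.entity_id, dst=acme.entity_id, title="Engineer")

# Field names each extracted WorksAt relation must carry.
edge_fields = [f.name for f in fields(WorksAt)]
```

Each extracted entity or relation then has one declared type, so downstream steps such as entity resolution can rely on the same set of fields.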

Use of FlairNLP

FlairNLP is a widely used project for named entity recognition (NER) and relation extraction. Flair makes it easy to stack embeddings of different types, for example character and word embeddings, within a single model.
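
Conceptually, stacking concatenates the vectors that several embedders produce for the same token. The toy sketch below shows that idea in plain Python. It does not use Flair, and the two embedders and the lookup table are invented for illustration.

```python
from typing import Callable, List

Embedder = Callable[[str], List[float]]


def char_embedding(token: str) -> List[float]:
    """Toy character-level features: length and fraction of uppercase letters."""
    upper = sum(ch.isupper() for ch in token)
    return [float(len(token)), upper / len(token) if token else 0.0]


# Toy word-level lookup table standing in for pretrained word vectors.
WORD_VECTORS = {"berlin": [0.9, 0.1, 0.0], "visited": [0.0, 0.2, 0.8]}


def word_embedding(token: str) -> List[float]:
    """Look up a lowercased token, falling back to zeros for unknown words."""
    return WORD_VECTORS.get(token.lower(), [0.0, 0.0, 0.0])


def stack(embedders: List[Embedder], token: str) -> List[float]:
    """Concatenate the vectors produced by each embedder for one token."""
    stacked: List[float] = []
    for embed in embedders:
        stacked.extend(embed(token))
    return stacked


# Two character features followed by three word features.
vector = stack([char_embedding, word_embedding], "Berlin")
```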

See the following tutorials:

Features

We need to define the minimum features required to support the integration of these two libraries. Using Flair and transfer learning for NER and relation extraction turns both tasks into primarily a labeling problem. Platforms like Snorkel and skweak are helpful for generating labels programmatically.
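
To make "generating labels programmatically" concrete, here is a minimal sketch of labeling functions combined by majority vote. It is written in plain Python and does not use the Snorkel or skweak APIs. The function names, label set and gazetteer are hypothetical, and those libraries typically aggregate votes with a learned model rather than a simple majority.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None  # a labeling function returns None when it has no opinion

LabelingFunction = Callable[[str], Optional[str]]


def lf_company_suffix(span: str) -> Optional[str]:
    """Label spans ending in a corporate suffix as ORG."""
    return "ORG" if span.endswith((" Inc.", " Ltd.", " GmbH")) else ABSTAIN


def lf_honorific(span: str) -> Optional[str]:
    """Label spans starting with an honorific as PER."""
    return "PER" if span.startswith(("Dr. ", "Mr. ", "Ms. ")) else ABSTAIN


def lf_known_orgs(span: str) -> Optional[str]:
    """Label spans found in a small hypothetical gazetteer as ORG."""
    return "ORG" if span in {"Acme Inc.", "Globex"} else ABSTAIN


def majority_label(span: str, lfs: List[LabelingFunction]) -> Optional[str]:
    """Combine labeling function votes by simple majority, ignoring abstains."""
    votes = [label for label in (lf(span) for lf in lfs) if label is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


LFS = [lf_company_suffix, lf_honorific, lf_known_orgs]
labels = {span: majority_label(span, LFS) for span in ["Acme Inc.", "Dr. Smith", "blue"]}
```

Spans on which every function abstains stay unlabeled and can be left out of the training set.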