GuillaumeDD / gowpy

A very simple library for exploiting graph-of-words in NLP
BSD 3-Clause "New" or "Revised" License

GRAPH-OF-WORDS file with edges and nodes labels #3

Open Matt-81 opened 1 year ago

Matt-81 commented 1 year ago

Dear @GuillaumeDD, thank you for this great work! I was trying gowpy.gow.miner and gowpy.gow.io to convert a corpus into a collection of graphs of words. I saw that in the exported file the graphs do not report the input text (e.g., a node like "foo" becomes "v 0 0").

I was wondering if it is possible to export it as "v 0 foo".

Thanks in advance for your help!

GuillaumeDD commented 1 year ago

Hi @Matt-81, thank you for your positive feedback on gowpy :pray:

Unfortunately, it is not possible to export a node as "v 0 foo". The reason is that frequent subgraph mining algorithms expect node and edge labels to be non-negative integers; see https://github.com/Jokeren/gBolt#input-specification for instance.

However, the GoWMiner class keeps the mapping between these integers and their corresponding labels. The easiest way to get back to the tokens is via a GoWVectorizer initialized from the GoWMiner. The notebook examples/classification-r8-frequent_subgraphs.ipynb shows how to get back the feature names, in the section "GoW Vectorizer Example".
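If a human-readable copy of the exported file is all that is needed, the integer labels can also be swapped back after export. The following is only a minimal sketch in plain Python, not part of the gowpy API: the `id_to_token` dictionary and the `relabel_vertices` helper are hypothetical names, and the sketch assumes you have already obtained the integer-to-token mapping from the miner and that vertex lines follow the "v <node_id> <label>" layout shown above.

```python
# Hypothetical mapping from integer node labels back to tokens.
# In gowpy this information is kept by GoWMiner; how to extract it
# is an assumption left outside this sketch.
id_to_token = {0: "foo", 1: "bar"}


def relabel_vertices(lines, id_to_token):
    """Rewrite 'v <node_id> <label>' lines so the label is the token.

    Graph headers ('t # N') and edge lines ('e ...') are passed through
    unchanged, as are vertex labels missing from the mapping.
    """
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[0] == "v" and parts[2].isdigit():
            node_id, label = parts[1], int(parts[2])
            # Unknown labels are kept as-is rather than guessed.
            token = id_to_token.get(label, parts[2])
            out.append(f"v {node_id} {token}")
        else:
            out.append(line)
    return out


exported = ["t # 0", "v 0 0", "v 1 1", "e 0 1 0"]
readable = relabel_vertices(exported, id_to_token)
print("\n".join(readable))
```

The relabeled output is meant for inspection only: a file with string labels would no longer satisfy the integer-label input format that frequent subgraph miners expect.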

Hope this helps!

Matt-81 commented 1 year ago

Hi @GuillaumeDD, great, thanks for the feedback! 🙏