clics / pyclics

python package implementing the CLICS processing workflow
Apache License 2.0

handling of subgraph attributes #12

Open LinguList opened 6 years ago

LinguList commented 6 years ago

To guarantee a balanced subgraph view, the data needs a bit of tinkering. The algorithm is small and straightforward: it aims for a balanced sample, with neither too many nor too few nodes.

The question is how to represent the subgraph attributes. I figured the easiest way is to store them as a list in the GML file.

This means that, once the graph is loaded (it will also contain the infomap clusters), the subgraph can be retrieved like this:

In [8]: list(G.node['1273']['subgraph'])
Out[8]: ['1273', '1931', '2131', '1273', '1273', '630', '221', '1931', '2131']

In [9]: subg = G.subgraph(G.node['1273']['subgraph'])

In [10]: len(subg)
Out[10]: 5

This makes it extremely convenient to access a given subgraph, both from within the API and when simply loading the GML file. I think I'll store the infomap attributes in the same list format, as this will probably be useful for handling the data in CLLD as well.
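For illustration, the round trip described above (list attribute on a node, written to GML, read back, used to build the subgraph) could be sketched as follows. This is a minimal sketch and not the pyclics code: the edges and the extra node '9999' are invented toy data, only the node IDs are borrowed from the session above. It uses `G.nodes[...]`, which is the spelling in current networkx releases (the `G.node[...]` form shown above comes from older versions), and `generate_gml` / `parse_gml` so that no file needs to be written.

```python
import networkx as nx

# Toy graph: node IDs mirror the session above, the edges are invented.
G = nx.Graph()
G.add_edges_from([
    ("1273", "1931"), ("1273", "2131"), ("1273", "630"),
    ("630", "221"), ("221", "9999"),
])

# Store the subgraph membership as a plain list on the node.
# Duplicates are kept on purpose, as in the list shown above.
G.nodes["1273"]["subgraph"] = [
    "1273", "1931", "2131", "1273", "1273", "630", "221", "1931", "2131",
]

# Round-trip through GML text: a list attribute is written as a repeated key
# and comes back as a list when parsed.
gml_text = "\n".join(nx.generate_gml(G))
H = nx.parse_gml(gml_text)

# Build the induced subgraph; duplicate IDs collapse to a single node each.
members = H.nodes["1273"]["subgraph"]
subg = H.subgraph(members)
print(len(subg), sorted(subg.nodes))
```

Because `subgraph()` treats its argument as a set of nodes, the duplicates in the stored list do no harm, although deduplicating before writing would keep the GML file smaller.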