GiulioRossetti / cdlib

Community Discovery Library
http://cdlib.readthedocs.io
BSD 2-Clause "Simplified" License
366 stars 71 forks

TypeError when Running Leiden Community Detection on Large Graph #213

Open mjaworski22 opened 2 years ago

mjaworski22 commented 2 years ago

Describe the bug: When I passed a graph with 400k nodes and 600k edges to cdlib.algorithms.leiden(), the algorithm ran correctly and identified communities in the graph. When I did the same with a graph of 1 million nodes and 1.6 million edges, I got a TypeError.

To Reproduce Steps to reproduce the behavior:

Step 1: Load the dataset from a CSV file into a NetworkX graph object using the following function:

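import networkx as nx
import pandas as pd

def load(csv_path):
    # Build an undirected graph from the edge list, keeping 'value' as an edge attribute.
    df = pd.read_csv(csv_path)
    G = nx.from_pandas_edgelist(df, 'from_address', 'to_address', edge_attr='value', create_using=nx.Graph())
    return G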

Step 2: Run cdlib.algorithms.leiden() on the NetworkX graph from Step 1 using the following function:

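from cdlib import algorithms

def find_coms_leiden(graph_nx):
    # cdlib converts the NetworkX graph to igraph internally before running Leiden.
    coms = algorithms.leiden(graph_nx)
    return coms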

Step 3: Write the communities object to a file using the following function:

from cdlib import readwrite

def write_coms(coms, out_file):
    readwrite.write_community_csv(coms, out_file, ",")

Step 4: Call the functions above from main():

def main():
    Graph = load('./data.csv')
    coms = find_coms_leiden(Graph)
    write_coms(coms, 'coms.csv')

Running main() on the larger graph (1 million nodes, 1.6 million edges) raises:

Traceback (most recent call last):
  File "...\main.py", line 90, in <module>
    main()
  File "...\main.py", line 78, in main
    coms = find_coms_leiden(Graph)
  File "...\main.py", line 33, in find_coms_leiden
    coms = algorithms.leiden(graph_nx)
  File "C:\Anaconda\lib\site-packages\cdlib\algorithms\crisp_partition.py", line 599, in leiden
    g = convert_graph_formats(g_original, ig.Graph)
  File "C:\Anaconda\lib\site-packages\cdlib\utils.py", line 187, in convert_graph_formats
    return __from_nx_to_igraph(graph, directed)
  File "C:\Anaconda\lib\site-packages\cdlib\utils.py", line 122, in __from_nx_to_igraph
    gi.add_edges([(u, v) for (u, v) in g.edges()])
  File "C:\Anaconda\lib\site-packages\igraph\__init__.py", line 376, in add_edges
    res = GraphBase.add_edges(self, es)
TypeError: only non-negative integers, strings or igraph.Vertex objects can be converted to vertex IDs
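
The error message says that at least one edge endpoint is neither a string nor a non-negative integer. One possible cause, not confirmed in this thread, is that the larger CSV contains empty address cells, which pandas reads as NaN (a float). The sketch below is a hypothetical diagnostic (the helper names are not part of cdlib or igraph) that scans an edge list for such labels. It mirrors only the wording of the error message, not igraph's actual conversion rules.

```python
from numbers import Integral


def is_valid_vertex_id(label):
    # Mirrors the wording of the TypeError: strings and non-negative integers pass.
    if isinstance(label, str):
        return True
    return isinstance(label, Integral) and label >= 0


def find_invalid_endpoints(edges, limit=10):
    """Collect up to `limit` (edge, bad_label) pairs from an iterable of (u, v) edges."""
    problems = []
    for u, v in edges:
        for label in (u, v):
            if not is_valid_vertex_id(label):
                problems.append(((u, v), label))
                if len(problems) >= limit:
                    return problems
    return problems


# Toy edge list standing in for graph_nx.edges(); the NaN mimics an empty CSV cell.
sample_edges = [("0xabc", "0xdef"), (3, 4), ("0xabc", float("nan")), (-1, 2)]
bad = find_invalid_endpoints(sample_edges)
for edge, label in bad:
    print(f"{edge!r}: {label!r} ({type(label).__name__})")
```

On the real data, the same check could be run as find_invalid_endpoints(graph_nx.edges()) before calling algorithms.leiden().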

Expected behavior: With the 400k-node, 600k-edge data, the program loads the data, calculates communities, and writes them to file correctly (see the Screenshots section).

Running with 1M nodes and 1.6M edges is expected to write its output to file the same way, with different data.

Screenshots: Example of the expected result written to file for input data with 400k nodes and 600k edges: [image]

Additional Context: I used nx.info(my_graph) to check how many nodes and edges are in the input graphs. It was run before cdlib.algorithms.leiden() and parsed the data successfully.

github-actions[bot] commented 2 years ago

Thanks for submitting your first issue!

GiulioRossetti commented 2 years ago

Thanks for raising the issue.

Have you tried loading the network with igraph instead of using networkx?

It seems that the error occurs during the graph conversion.
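
As a rough sketch of that suggestion, the edge list could be read without NetworkX and handed to igraph directly. The helper below is hypothetical (read_edges is not part of cdlib). It assumes the column names from the report, forces both endpoints to strings, skips rows with a missing endpoint, and keeps the value column as raw text.

```python
import csv


def read_edges(csv_path, source="from_address", target="to_address", weight="value"):
    """Read (source, target, value) tuples, skipping rows with a missing endpoint."""
    edges, skipped = [], 0
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # DictReader yields strings, so endpoints are always str here.
            u = (row.get(source) or "").strip()
            v = (row.get(target) or "").strip()
            if not u or not v:
                skipped += 1
                continue
            edges.append((u, v, row.get(weight)))
    return edges, skipped


# With python-igraph and cdlib installed, the result could then be used as:
#   import igraph as ig
#   from cdlib import algorithms
#   edges, skipped = read_edges('./data.csv')
#   g = ig.Graph.TupleList(edges, directed=False, edge_attrs=["value"])
#   coms = algorithms.leiden(g)
```

The skipped count indicates how many rows had an empty endpoint. A non-zero value on the larger dataset would be consistent with the conversion error, though this thread does not confirm that as the cause.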