eliorc / node2vec

Implementation of the node2vec algorithm.
MIT License

Dimensionality Issue #23

Closed shaksham95 closed 5 years ago

shaksham95 commented 5 years ago

Hey @eliorc ,

I am trying to generate walks with the Node2Vec module (the precomputed probabilities were calculated without any error), but I am running into a dimensionality issue. Can you please help?

Graph: image

Error: image image

eliorc commented 5 years ago

Can you supply code to reproduce this?

shaksham95 commented 5 years ago

Here is the code:

import networkx as nx
from node2vec import Node2Vec

raw_G = nx.Graph()  # undirected
n = 0  # number of edges added

# Connect every pair of distinct parsed tweets, weighted by their similarity.
for i in twitter_copy['parsed_doc']:
    for j in twitter_copy['parsed_doc']:
        if i != j:
            if not (raw_G.has_edge(j, i)):
                sim = i.similarity(j)
                raw_G.add_edge(i, j, weight = sim)
                n = n + 1

print(raw_G.number_of_nodes(), "nodes, and", raw_G.number_of_edges(), "edges created.")

# KG is a connected part of raw_G (its construction is not shown here).
node2vec = Node2Vec(KG, dimensions=20, walk_length=16, num_walks=100, workers=1)

The graph is built from parsed Twitter tweets using NetworkX. KG is a strongly connected part of that graph.

eliorc commented 5 years ago

Well, you are passing KG, which is the most important thing to debug here, but I can't see how it is made. What is it?

What is the result of print(type(KG))?

To explore the bug, I must first be able to reproduce it.

shaksham95 commented 5 years ago

The code I showed earlier is the starting point; it creates the initial graph. The initial graph has isolated nodes, which are then removed using:

nx.isolates()

After that, I took the highly connected part of the graph (shown in the picture above). That is what 'KG' represents.

print(type(KG)) gives networkx.classes.graphviews.SubGraph

This KG is what I pass to Node2Vec.
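
For readers following along, the steps described above can be sketched as follows. This is a minimal sketch under assumptions: the thread does not show how KG was actually selected, so taking the largest connected component is a guess, and the toy graph and node names are hypothetical stand-ins for the tweet graph.

```python
import networkx as nx

# Toy stand-in for the tweet-similarity graph; node names are hypothetical.
raw_G = nx.Graph()
raw_G.add_weighted_edges_from([("a", "b", 0.9), ("b", "c", 0.4), ("d", "e", 0.7)])
raw_G.add_node("lonely")  # an isolated node with no edges

# nx.isolates takes the graph as an argument and returns an iterator,
# so materialise it into a list before mutating the graph.
raw_G.remove_nodes_from(list(nx.isolates(raw_G)))

# Assumption: the "highly connected part" is the largest connected component.
largest = max(nx.connected_components(raw_G), key=len)
KG = raw_G.subgraph(largest)  # a read-only view onto raw_G, not a copy

print(type(KG), sorted(KG.nodes()))
```

Note that `subgraph` returns a view tied to the original graph; the exact class printed depends on the installed NetworkX version.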

eliorc commented 5 years ago

It's really hard to tell where the error comes from without being able to reproduce it on my side. Can you pickle this KG so I can load it and try to replicate the error?
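
A minimal sketch of such a pickle round trip, using Python's standard pickle module on a small stand-in graph (the file name and the graph contents are hypothetical). Pickle stores references to the classes of the objects it contains, so the loading side generally needs compatible versions of the same packages installed.

```python
import os
import pickle
import tempfile

import networkx as nx

# Small stand-in for KG; the real graph is not reproduced here.
KG = nx.Graph()
KG.add_weighted_edges_from([(0, 1, 0.9), (1, 2, 0.4)])

# Hypothetical file location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "KG.gpickle")

# Write the graph with the standard library pickle module.
with open(path, "wb") as fh:
    pickle.dump(KG, fh, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back and check that the edges and weights survived.
with open(path, "rb") as fh:
    loaded = pickle.load(fh)

print(sorted(loaded.edges(data="weight")))
```

The standard module is used here because `nx.write_gpickle` and `nx.read_gpickle` were removed in NetworkX 3.0.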

shaksham95 commented 5 years ago

Sure, here is the gpickle for KG: https://drive.google.com/file/d/1TwKQC4iuzAENY0aATYzrNrVdSh6z7Hhq/view?usp=sharing

The file size is around 770 MB (zipped), so I couldn't upload it here. Please let me know if you can download the pickle.

eliorc commented 5 years ago

Yes, I can. I'll probably try to debug this over the weekend.

eliorc commented 5 years ago

Hey, I've just had some time to debug this, but I can't read the file you sent. Using nx.read_gpickle, I get the following error:

UnpicklingError: NEWOBJ class argument isn't a type object

shaksham95 commented 5 years ago

Hey, I am not sure why you are not able to read it. I tried reading the same pickle using nx.read_gpickle and I was able to read it successfully.

eliorc commented 5 years ago

Can you try setting up a separate virtualenv with Python 3.6, installing node2vec in it, and loading the file there? That's what I did, and I can't load it.

shaksham95 commented 5 years ago

Hey,

So I have been playing around with graphs. One thing I noticed is that if the graph is closed, node2vec runs on it without any failures. Does the package require the graph to be interconnected, with no edge being a dead end (like the one I shared)?

eliorc commented 5 years ago

Absolutely not. I have tried this on multiple occasions on directed graphs where some nodes have no neighbors. For instance, you can try this yourself:


import networkx as nx
from node2vec import Node2Vec

graph = nx.DiGraph()
graph.add_edge(1, 2)
graph.add_edge(1, 3)

# 2 and 3 are nodes without neighbors, dead ends

n2v = Node2Vec(graph)

n2v.fit()

This code works with no errors

shaksham95 commented 5 years ago

Okay, do you think the type of the instance could be the reason? In the example, 'graph' is an instance of networkx.classes.digraph.DiGraph, but in my case I had one initial graph and then took a subpart of it, which is an instance of networkx.classes.graphviews.SubGraph.

eliorc commented 5 years ago

That's a good question. Can you try to cast it? I believe any SubGraph can be represented as a Graph / DiGraph.
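
One way such a cast might look, as a sketch on a toy graph (the node labels and weights are made up): passing the view to the `nx.Graph` constructor builds an independent graph that no longer depends on the parent.

```python
import networkx as nx

# Toy parent graph; labels and weights are illustrative only.
G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 0.5), (2, 3, 0.8), (3, 4, 0.1)])

view = G.subgraph([1, 2, 3])  # a view that stays tied to G
KG = nx.Graph(view)           # an independent graph of plain Graph type

# The copy can be modified freely without affecting the parent graph.
KG.add_edge(3, 99)

print(type(KG), sorted(KG.nodes()), 99 in G)
```

Calling `view.copy()` is an alternative that also produces an independent graph.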

shaksham95 commented 5 years ago

Nope, that didn't work. I tried converting my subgraph to a graph using nx.to_networkx_graph(). The type of the graph did change from networkx.classes.graphviews.SubGraph to networkx.classes.graph.Graph, but node2vec still failed with the same error.

eliorc commented 5 years ago

I really don't know how to debug this. The error points to walk_options, which must be a list of the elements returned by the graph.neighbors(current_node) call. So it must be a list of nodes, but the error states that it is not... I can't figure out why that would happen.
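
One possible explanation, offered as a guess rather than a confirmed diagnosis: if the walk step hands the neighbor list to NumPy (an assumption about the library internals) and the nodes are themselves sequence-like objects, such as parsed documents, NumPy may interpret the list as a 2-D array instead of a 1-D list of nodes. The sketch below uses tuples as hypothetical stand-ins for such nodes, checks the inferred dimensionality, and shows that relabeling nodes to integers avoids the problem. The helper name `neighbor_array_ndim` is made up for illustration.

```python
import networkx as nx
import numpy as np

# Stand-in graph whose nodes are sequences (tuples); the real nodes are unknown.
G = nx.Graph()
G.add_edge(("good", "day"), ("nice", "day"), weight=0.9)
G.add_edge(("good", "day"), ("bad", "day"), weight=0.2)


def neighbor_array_ndim(graph, node):
    """Return the number of dimensions NumPy infers for a node's neighbor list."""
    return np.asarray(list(graph.neighbors(node))).ndim


# With tuple nodes, the neighbor list becomes a 2-D array of strings.
before = neighbor_array_ndim(G, ("good", "day"))

# Relabel nodes to integers, keeping each original object in a node attribute.
H = nx.convert_node_labels_to_integers(G, label_attribute="original")
after = neighbor_array_ndim(H, 0)

print(before, after)
```

If the relabeled graph runs without the error, the embeddings can be mapped back to the original objects through the `original` node attribute.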

Can you try another way to save the graph? Or maybe give me code that creates a graph with a similar problem?
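
As one sketch of another way to save the graph, the structure and weights could be exported as a plain-text weighted edge list with integer node labels, which does not depend on the classes of the original node objects. The file name and the toy graph are hypothetical; the original node objects are not preserved in this format.

```python
import os
import tempfile

import networkx as nx

# Toy stand-in for KG; node names are illustrative only.
KG = nx.Graph()
KG.add_weighted_edges_from([("tweet_a", "tweet_b", 0.9), ("tweet_b", "tweet_c", 0.4)])

# Replace arbitrary node objects with integers so the file is plain text.
portable = nx.convert_node_labels_to_integers(KG)

# Hypothetical file location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "KG.edgelist")
nx.write_weighted_edgelist(portable, path)

# Anyone with NetworkX can read the file back, whatever the node classes were.
restored = nx.read_weighted_edgelist(path, nodetype=int)

print(sorted(restored.edges(data="weight")))
```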