Open · HeikoPaulheim opened this issue 2 years ago
Hi Heiko 👋
Are you on the most recent version of pyRDF2Vec? I can't seem to reproduce the error. However, I see that you only extract 500 (randomly sampled) walks, so the bug may only occur sporadically. Could you add an `np.random.seed()` call at the top of your script to make it fully reproducible?
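Seeding every random number generator the walk extraction might use, before the walks are extracted, is what makes a sporadic failure replayable. A minimal sketch follows. `seed_everything` and `sample_walks` are hypothetical helpers, and whether pyRDF2Vec 0.2.3 draws from Python's `random` module or from `np.random` is an assumption, so the sketch seeds both (NumPy only if it is installed).

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the stdlib RNG and, if available, NumPy's global RNG."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # global NumPy RNG, as suggested above
    except ImportError:
        pass  # NumPy not installed: only the stdlib RNG is seeded


def sample_walks(walks, n):
    """Hypothetical stand-in for random walk sampling."""
    return random.sample(walks, n)


all_walks = [("a", "b", "c"), ("a", "c"), ("b", "c", "d"), ("d",), ("c", "a")]

seed_everything(42)
first = sample_walks(all_walks, 3)
seed_everything(42)
second = sample_walks(all_walks, 3)
print(first == second)  # True: same seed, same sample
```

With the seed fixed, a run that crashes once will crash the same way every time, which is what makes it debuggable.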
It seems that at some point a list (instead of a `Vertex` object) is appended to a walk. The walk then probably looks something like `([obj], v1, v2, v3, v3, ...)`, which makes it crash later on (the double-edged sword of Python...). I can't immediately find where this happens, however.
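Until the root cause is found, one quick way to confirm this hypothesis is to scan the extracted tuple walks for hops that are not vertices. The sketch below uses a hypothetical stand-in `Vertex` dataclass so it runs without pyRDF2Vec; with the real library you would check against its own `Vertex` class instead. `find_bad_hops` is likewise a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    """Hypothetical stand-in for pyRDF2Vec's Vertex class."""
    name: str


def find_bad_hops(walk):
    """Return (position, type name) for every hop that is not a Vertex."""
    return [
        (i, type(hop).__name__)
        for i, hop in enumerate(walk)
        if not isinstance(hop, Vertex)
    ]


v1, v2, v3 = Vertex("v1"), Vertex("v2"), Vertex("v3")
good = (v1, v2, v3)
bad = ([v1], v2, v3, v3)  # a list slipped in as the first hop

print(find_bad_hops(good))  # []
print(find_bad_hops(bad))   # [(0, 'list')]
```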
A fix for this, which would also avoid these kinds of problems in the future, would be to create a dedicated `Walk` class instead of working with tuples that we keep extending. This `Walk` class could then have an `add_hop` method (or something similar) that does input checking:
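```python
from pyrdf2vec.graphs import Vertex  # assumed import path for pyRDF2Vec's Vertex

class Walk:
    def __init__(self):
        self.hops = []  # vertices visited so far, in order

    def add_hop(self, vertex):
        # Reject anything that is not a Vertex, e.g. a stray list
        if not isinstance(vertex, Vertex):
            raise TypeError(f"Expected a Vertex, got {type(vertex).__name__}")
        self.hops.append(vertex)
```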
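To try the guard without pyRDF2Vec installed, here is a self-contained variant using a hypothetical stand-in `Vertex` dataclass. It also exposes a tuple view (`as_tuple`, a hypothetical name) so code that still expects tuple-based walks keeps working during a migration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    """Hypothetical stand-in for pyRDF2Vec's Vertex class."""
    name: str


class Walk:
    """Sketch of a walk object that validates every hop as it is added."""

    def __init__(self, root: Vertex):
        self._hops = []
        self.add_hop(root)

    def add_hop(self, vertex) -> "Walk":
        if not isinstance(vertex, Vertex):
            raise TypeError(
                f"walk hop must be a Vertex, got {type(vertex).__name__}"
            )
        self._hops.append(vertex)
        return self  # allow chaining: walk.add_hop(a).add_hop(b)

    def as_tuple(self) -> tuple:
        """Tuple view for code that still expects tuple-based walks."""
        return tuple(self._hops)


walk = Walk(Vertex("root")).add_hop(Vertex("p")).add_hop(Vertex("o"))
print(walk.as_tuple())

try:
    walk.add_hop([Vertex("x")])  # the kind of value suspected above
except TypeError as err:
    print(err)  # walk hop must be a Vertex, got list
```

Raising at insertion time puts the traceback at the line that produced the bad hop, rather than at whatever later code first trips over it.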
Btw, for smaller datasets, setting the number of walks to `None` exhaustively extracts all walks, which tends to give better results (e.g. `CommunityWalker(4, None)`).
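The difference between `None` and an integer number of walks can be illustrated on a toy graph. This is only an illustration under stated assumptions: the graph, `all_walks` and `extract_walks` are hypothetical, and pyRDF2Vec's walkers sample during traversal rather than enumerating everything first and then subsampling as done here.

```python
import random

# Toy directed graph as an adjacency dict (hypothetical data).
GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": [],
}


def all_walks(graph, root, depth):
    """Depth-first enumeration of walks that stop at `depth` hops or a dead end."""
    walks = []
    stack = [(root,)]
    while stack:
        walk = stack.pop()
        successors = graph[walk[-1]]
        if len(walk) - 1 == depth or not successors:
            walks.append(walk)  # walk is complete
        else:
            for nxt in successors:
                stack.append(walk + (nxt,))
    return walks


def extract_walks(graph, root, depth, max_walks=None, seed=None):
    """max_walks=None returns every walk; an int returns a random subset."""
    walks = all_walks(graph, root, depth)
    if max_walks is None or max_walks >= len(walks):
        return walks
    return random.Random(seed).sample(walks, max_walks)


full = extract_walks(GRAPH, "a", depth=4)
sampled = extract_walks(GRAPH, "a", depth=4, max_walks=2, seed=0)
print(len(full), len(sampled))  # 3 2
```

On a small graph a random subset can leave some walks out entirely, while `None` covers every walk up to the given depth.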
Hi Gilles,
I'm using the latest version available through `pip install` (0.2.3), but I haven't built a more recent version from source myself.
Actually, it seems to work with the `None` option (and yes, it's a fairly small graph), but it's still a bit of a strange behavior.
Thanks for your help! 😃
It's definitely strange behaviour and it is a bug, so I'm leaving this issue open until it's fixed. Unfortunately, it seems to happen only very sporadically and only in the DFS (not the BFS), so it will be a very fun bug to solve ;)
Thanks for reporting this btw!
🐛 Bug
CommunityWalker fails with an exception
Current Behavior
This is the message I get:
Steps to Reproduce
Minimal code snippet:
This example works with all other walk types (HALK, NGram, etc.), but not community walks.
Environment