Lab41 / Circulo

Community Detection Research Effort
http://lab41.github.io/Circulo/

Error when testing algorithms on my data #72

Closed lukdo closed 8 years ago

lukdo commented 8 years ago

Hello, I am trying to test the algorithms on a .graphml file I created. I am doing my master's thesis on community finding using graph algorithms. In the end I could add my dataset to circulo. But I get this error:

  File "/usr/lib/python3.4/multiprocessing/pool.py", line 599, in get
    raise self._value
SystemError: error return without exception set

It works for all the other datasets.

What I did: I put my file in the raw folder of "flights", naming it "flights.graphml". My run.py file contains:


import os
import shutil
from circulo.data.databot import *

FILE = "flights.graphml"

class FollowersData(CirculoData):

    def __prepare__(self):
        # Copy the raw graphml file to the path the databot loads from
        shutil.copyfile(os.path.join(self.raw_data_path, FILE), self.graph_path)

def main():
    FollowersData("flights").get_graph()

if __name__ == "__main__":
    main()


I took this code from the "southernwoman" data, which is also already in graphml format.

When I execute "python3 run_algos.py flights ALL --output algorithm_results" I get this output in the terminal:


dalys@dalys ~/Documents/circulo/circulo/setup $ python3 run_algos.py followers ALL --output algorithm_results
[Graph Generation ETL for followers ]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "run_algos.py", line 168, in data_fetcher
    databot.get_graph()
  File "/home/dalys/Documents/circulo/circulo/data/databot.py", line 91, in get_graph
    return igraph.load(self.graph_path)
  File "/usr/local/lib/python3.4/dist-packages/igraph/__init__.py", line 4063, in read
    return Graph.Read(filename, *args, **kwds)
  File "/usr/local/lib/python3.4/dist-packages/igraph/__init__.py", line 2223, in Read
    return reader(f, *args, **kwds)
SystemError: error return without exception set
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_algos.py", line 304, in <module>
    main()
  File "run_algos.py", line 299, in main
    run(algos, datasets, args.output[0], args.samples, args.workers, args.timeout)
  File "run_algos.py", line 210, in run
    r.get()
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 599, in get
    raise self._value
SystemError: error return without exception set
dalys@dalys ~/Documents/circulo/circulo/setup $


Thank you for reading. I hope you can help.

lukdo commented 8 years ago

My flights.graphml file looks like this, but longer, and some nodes are duplicated:

<?xml version="1.0" ?>
<graphml>
    <key attr.name="label" attr.type="string" id="label"/>
    <graph edgedefault="directed" id="">
        <node id="AiAiyamato1"/>
        <node id="AbiYusuf37"/>
        <node id="Chaima_vmn"/>
        <node id="abu_camil"/>
        <node id="ollspam"/>
        <node id="RusCountering"/>
        <node id="isayful082"/>
        <edge directed="false" source="AbiYusuf37" target="AiAiyamato1"/>
        <edge directed="false" source="Chaima_vmn" target="AbiYusuf37"/>
        <edge directed="false" source="nfb9a7s8771" target="AbiYusuf37"/>
        <edge directed="false" source="isayful082" target="NovostiDamask"/>
    </graph>
</graphml>

ymt123 commented 8 years ago

Have you tried just loading the data into igraph on its own?

import igraph
G = igraph.load('<yourgraphml>')

It's probably not the source of your problem but it is probably better to create a new dataset (rather than reusing flights). You should be able to follow the instructions here: https://github.com/Lab41/Circulo/tree/master/circulo/data

Looking through those instructions, I found a step missing for a new dataset (I've now added it to the readme). In setup/run_algos.py there is a list called "data_choices", to which you have to add your dataset.
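
A complementary check that does not depend on igraph at all is to summarize the file's structure with the standard library, so that a file that loads can be compared side by side with one that does not. This is only a sketch: `summarize_graphml`, `local_name` and `namespace_of` are illustrative names, not part of Circulo or igraph, and the sample is a shortened stand-in for the kind of file discussed in this issue.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix: '{http://...}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def namespace_of(tag):
    """Return the namespace URI of a tag, or None if the tag has none."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


def summarize_graphml(source):
    """Collect basic structural facts about a GraphML file.

    `source` may be a file path or an open file-like object.
    """
    root = ET.parse(source).getroot()
    summary = {
        "root": local_name(root.tag),
        "namespace": namespace_of(root.tag),
        "keys": [],   # attributes of every <key> declaration
        "nodes": 0,
        "edges": 0,
    }
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "key":
            summary["keys"].append(dict(elem.attrib))
        elif name == "node":
            summary["nodes"] += 1
        elif name == "edge":
            summary["edges"] += 1
    return summary


# Shortened, illustrative sample (assumption: not the real dataset).
SAMPLE = """<?xml version="1.0" ?>
<graphml>
    <key attr.name="label" attr.type="string" id="label"/>
    <graph edgedefault="directed" id="">
        <node id="n1"/>
        <node id="n2"/>
        <edge directed="false" source="n1" target="n2"/>
    </graph>
</graphml>"""

summary = summarize_graphml(io.StringIO(SAMPLE))
print(summary)
```

Running the same function on a dataset that already works and on the failing file makes differences in the header, the `<key>` declarations, or the element counts easy to spot. It does not say which of those differences igraph actually objects to.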

lukdo commented 8 years ago

Good news: I changed the import method to the one you proposed, using the same as "malaria", created my dataset, and it worked. The import works now for other .graphml files. I still have the same problem for my own dataset though.

I still have to work on my graphml file because it is not recognized. The 3rd line is what keeps it from being recognized; when I delete this line, it works.

Do you know why that line is wrong? It is created by default by the pygraphml library.

If you know ways of deleting nodes whose IDs already exist, and of deleting unconnected nodes, I would be very interested.

Thank you very much for your help

ymt123 commented 8 years ago

I'm not sure what exactly the key to that line is. It's not the best solution, but you could use pygraphml to read the graphml file and construct an igraph graph from that other graph representation.

As far as pruning goes, the closest example looks like the one in as_data. The relevant code lines are:

# Take largest connected component
components = g.components(mode=igraph.WEAK)
if len(components) > 1:
    g = g.subgraph(max(components, key=len))
g.write_graphml(self.graph_path)
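
For the other half of the question (repeated node IDs and nodes with no edges), one option is to clean the GraphML file itself before igraph reads it. The following is a minimal sketch using only the standard library; `clean_graphml` and `local_name` are hypothetical helper names, the sample graph is made up, and edges that point at undeclared nodes are deliberately left untouched.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix: '{http://...}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def clean_graphml(graph):
    """Remove repeated node IDs and edgeless nodes from a <graph> element.

    Works in place on direct children of `graph` and returns a tuple
    (duplicates_removed, isolated_removed).
    """
    nodes = [e for e in graph if local_name(e.tag) == "node"]
    edges = [e for e in graph if local_name(e.tag) == "edge"]
    # Every ID that appears as an endpoint of at least one edge.
    connected = {e.get("source") for e in edges} | {e.get("target") for e in edges}

    seen = set()
    duplicates = isolated = 0
    for node in nodes:
        node_id = node.get("id")
        if node_id in seen:
            # A later copy of an ID that was already encountered.
            graph.remove(node)
            duplicates += 1
            continue
        seen.add(node_id)
        if node_id not in connected:
            # First occurrence, but no edge references it.
            graph.remove(node)
            isolated += 1
    return duplicates, isolated


# Made-up sample: "a" is declared twice, "lonely" has no edges.
SAMPLE = """<graphml>
    <graph edgedefault="undirected" id="">
        <node id="a"/>
        <node id="b"/>
        <node id="a"/>
        <node id="lonely"/>
        <edge source="a" target="b"/>
    </graph>
</graphml>"""

root = ET.fromstring(SAMPLE)
graph = next(e for e in root if local_name(e.tag) == "graph")
removed = clean_graphml(graph)
remaining = [n.get("id") for n in graph if local_name(n.tag) == "node"]
print(removed, remaining)
```

The cleaned tree can then be saved with `ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)`. Unlike the largest-connected-component approach above, this keeps every component that has at least one edge.
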

lukdo commented 8 years ago

In the end I switched from "pygraphml" to "NetworkX" to write my graphml file, and the circulo library now reads it without issues.
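
For anyone landing here with the same problem, writing the file with NetworkX can look roughly like the sketch below. The edge list and the file name are placeholders rather than the real data, and the round-trip check only confirms that NetworkX can read its own output; it says nothing about other readers.

```python
import os
import tempfile

import networkx as nx

# Placeholder edge list standing in for the real data (assumption).
edges = [
    ("a", "b"),
    ("c", "a"),
    ("a", "b"),  # repeated on purpose
]

G = nx.Graph()           # undirected; re-adding a node or edge is a no-op
G.add_edges_from(edges)
G.add_node("lonely")     # a node that no edge touches

# Drop nodes without any edges before exporting.
G.remove_nodes_from(list(nx.isolates(G)))

# Placeholder output location (assumption): a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "followers.graphml")
nx.write_graphml(G, path)

# Round-trip check: read the file back with NetworkX itself.
H = nx.read_graphml(path)
print(sorted(H.nodes()), H.number_of_edges())
```

Because `nx.Graph` stores each node ID once, duplicates disappear as a side effect of building the graph this way.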

Thank you for your help, ymt123.

Problem solved :)

ymt123 commented 8 years ago

Great!