benedekrozemberczki / karateclub

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
https://karateclub.readthedocs.io
GNU General Public License v3.0

How to build my own dataset? #101

Closed: smith-co closed this issue 2 years ago

smith-co commented 2 years ago

I need to build graphs and then generate graph embeddings for them.

I checked the documentation at https://karateclub.readthedocs.io/.

But I didn't understand how to build my own graphs.

  1. Can you please point me to sample code that creates a dataset from scratch?
  2. I have already checked the code here, but the examples all load predefined datasets.
  3. Can you show a code snippet that creates a graph, i.e. creates nodes and adds edges?
  4. How do I set attributes (features) for the nodes and edges?

Thanks in advance for your help.

I am following the installation notes at https://karateclub.readthedocs.io/en/latest/notes/installation.html.

smith-co commented 2 years ago

I checked https://github.com/benedekrozemberczki/karateclub/tree/master/dataset/node_level/facebook.

I don't see any attributes (features) for the nodes and edges.

What I see is just edge connectivity information.

So can't we set node and edge features?

smith-co commented 2 years ago

@benedekrozemberczki can you please help me with my query? 🙏

arademaker commented 2 years ago

See the link to the issue in the graph2vec repo.

arademaker commented 2 years ago

Hi @benedekrozemberczki, sorry, but you closed this issue without answering the question. I really didn't understand the answer you gave here. In the datasets at https://github.com/benedekrozemberczki/karateclub/tree/master/dataset/node_level, every features.csv contains only numeric values. Finally, the paper https://arxiv.org/abs/1707.05005 that you suggested we read does not contain a single reference to node attributes.

nashid commented 2 years ago

@benedekrozemberczki I am also waiting for the answer. All the feature files contain only numeric values. Can you please help with this query? 🙏

benedekrozemberczki commented 2 years ago

You are referring to node-level datasets, but the algorithm in question is graph-level. Please go over the code.
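
For a graph-level model, the input is a list of graphs, each carrying its own node attributes, rather than a single graph with a features.csv table. Below is a minimal sketch of building such graphs from scratch. It assumes NetworkX graphs with consecutive integer node ids, and node labels stored under a `feature` attribute, which is what `Graph2Vec(attributed=True)` reads. The helper names (`relabel_consecutively`, `to_networkx`, `embed`, `demo`) and the toy data are hypothetical, and this is not the maintainer's code:

```python
def relabel_consecutively(edges, labels):
    """Map arbitrary node names to integers 0..n-1.

    edges  : iterable of (source_name, target_name) pairs
    labels : dict mapping node name -> label string (the node "feature");
             every node that appears in an edge must have an entry here
    """
    index = {name: i for i, name in enumerate(sorted(labels))}
    int_edges = [(index[u], index[v]) for u, v in edges]
    int_labels = {index[name]: label for name, label in labels.items()}
    return int_edges, int_labels


def to_networkx(edges, labels):
    """Build an undirected NetworkX graph with a "feature" attribute per node."""
    import networkx as nx  # third-party: pip install networkx

    int_edges, int_labels = relabel_consecutively(edges, labels)
    graph = nx.Graph()
    for node, label in int_labels.items():
        graph.add_node(node, feature=label)
    graph.add_edges_from(int_edges)
    return graph


def embed(graphs, dimensions=16):
    """Fit Graph2Vec on a list of graphs and return one vector per graph."""
    from karateclub import Graph2Vec  # third-party: pip install karateclub

    # attributed=True makes Graph2Vec start from the "feature" attribute
    # instead of node degrees. min_count=1 is an assumption made for tiny
    # toy corpora, so that rare features are not dropped.
    model = Graph2Vec(attributed=True, dimensions=dimensions, min_count=1)
    model.fit(graphs)
    return model.get_embedding()


def demo():
    # Hypothetical toy graphs; node names and labels are illustrative only.
    g1 = to_networkx([("a", "b"), ("b", "c")], {"a": "x", "b": "y", "c": "z"})
    g2 = to_networkx([("a", "b"), ("a", "c")], {"a": "x", "b": "y", "c": "y"})
    print(embed([g1, g2]).shape)
```

Calling `demo()` requires `networkx` and `karateclub` to be installed. Edge attributes are not covered by this sketch.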

arademaker commented 2 years ago

Hi @benedekrozemberczki, I am sorry for insisting. I hope you take these questions as a sign that your work is recognized as relevant and valuable. I confess that I may need to read the papers more carefully, but to avoid wasting time, maybe you can help me with some preliminary intuition so I can confirm that your tool applies to my case.

As I said before, I am trying to use your tools to encode graphs that are semantic representations of sentences. I assume you are familiar with AMR (https://amr.isi.edu/language.html). So, does it make sense to use your library to encode AMR graphs? I would expect similar AMR graphs to have similar embeddings in the embedding space. Is that a reasonable expectation?

If I got it right, I want a graph-level embedding, with the whole graph represented as a single vector. The questions above are about how to make node information available to the graph-level embedding. Suppose I have one AMR graph for each sentence below:

  1. I love dogs.
  2. She loves dogs.

I would expect similar embeddings, but note that the AMR graphs would differ. The graph for (1) would have a node labeled I (first-person pronoun), and the graph for (2) a node labeled she (third-person singular). The same goes for the verb love: in the first graph, love is associated with morphological features such as (present, first person singular), and in the second with (present, third person singular). So the graph-level embedding would need to take the morphological information attached to the nodes into account, right? That would make the embeddings similar but not identical.
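
To make that intuition concrete, here is a small pure-Python sketch of Weisfeiler-Lehman style relabeling, the kind of label propagation that graph2vec builds on. It is a simplified illustration, not the library's implementation, and the two toy graphs are hypothetical stand-ins for the sentences above rather than real AMR graphs. Each node's label is combined with its neighbours' labels, and the resulting feature sets of the two graphs are compared:

```python
def wl_features(adjacency, labels, iterations=1):
    """Collect Weisfeiler-Lehman style features for one labelled graph.

    adjacency : dict mapping node -> list of neighbour nodes (undirected)
    labels    : dict mapping node -> initial label string
    """
    current = dict(labels)
    features = set(current.values())
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # New label = own label plus the sorted labels of the neighbours.
            context = ",".join(sorted(current[n] for n in neighbours))
            updated[node] = f"{current[node]}|{context}"
        current = updated
        features.update(current.values())
    return features


def jaccard(a, b):
    """Share of features that two graphs have in common."""
    return len(a & b) / len(a | b)


# Both toy graphs have the same shape: the verb node 0 is linked to nodes 1 and 2.
adjacency = {0: [1, 2], 1: [0], 2: [0]}
g1 = wl_features(adjacency, {0: "love", 1: "i", 2: "dog"})
g2 = wl_features(adjacency, {0: "love", 1: "she", 2: "dog"})

print(sorted(g1 & g2))  # features shared by both graphs
print(jaccard(g1, g2))  # overlap between 0 (disjoint) and 1 (identical)
```

The two graphs share the features that do not involve the pronoun and differ on those that do, so a model trained on such features has grounds to place them close together but not on the same point. How close depends on the model and the training corpus.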

nashid commented 2 years ago

@arademaker were you able to figure this out?

arademaker commented 2 years ago

No