THUDM / CogDL

CogDL: A Comprehensive Library for Graph Deep Learning (WWW 2023)
https://cogdl.ai
MIT License

Learn and save embedding files for customized dataset WITHOUT running evaluation tasks. #241

Closed dingqi closed 3 years ago

dingqi commented 3 years ago

Thanks for open-sourcing this wonderful repo. Is it possible to directly learn and save embedding files for a customized dataset WITHOUT running evaluation tasks? Can I do this via the command line? Thanks

cenyk1230 commented 3 years ago

Hi @dingqi,

Thanks for your attention to CogDL. We are preparing an easy-to-use API to support this feature.

dingqi commented 3 years ago

Thx for your swift feedback and look forward to it.

cenyk1230 commented 3 years ago

Hi @dingqi,

We now provide an API for this requirement. A very basic usage is:

import numpy as np
from cogdl import pipeline

# build a pipeline for generating embeddings
# pass model name with its hyper-parameters to this API
generator = pipeline("generate-emb", model="prone")

# generate embeddings for an unweighted graph
edge_index = np.array([[0, 1], [0, 2], [0, 3], [1, 2], [2, 3]])
outputs = generator(edge_index)
print(outputs)

You can find the full usage in this link.
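
Since the original question was about saving embedding files for a customized dataset, here is a minimal sketch of the surrounding I/O. It assumes the graph is stored as a whitespace-separated edge list and that the pipeline output is array-like, as the printed result in the example suggests. The helper names `load_edge_list` and `save_embeddings`, the file names, and the embedding width of 8 are hypothetical. A random matrix stands in for the pipeline output so the sketch runs without CogDL installed.

```python
import os
import tempfile

import numpy as np


def load_edge_list(path):
    """Read a whitespace-separated edge list ("src dst" per line) into an (E, 2) int array."""
    return np.loadtxt(path, dtype=np.int64, ndmin=2)


def save_embeddings(embeddings, path):
    """Write a (num_nodes, dim) embedding matrix to a .npy file."""
    np.save(path, np.asarray(embeddings))


workdir = tempfile.mkdtemp()

# Hypothetical input file holding the same toy graph as the example above.
edge_path = os.path.join(workdir, "edges.txt")
with open(edge_path, "w") as f:
    f.write("0 1\n0 2\n0 3\n1 2\n2 3\n")

edge_index = load_edge_list(edge_path)

# With CogDL installed, the embeddings would come from the pipeline shown in the thread:
#     from cogdl import pipeline
#     generator = pipeline("generate-emb", model="prone")
#     outputs = generator(edge_index)
# A random matrix stands in here (assumption) so the sketch runs without CogDL.
num_nodes = int(edge_index.max()) + 1
outputs = np.random.randn(num_nodes, 8)

emb_path = os.path.join(workdir, "embeddings.npy")
save_embeddings(outputs, emb_path)

# Reload to confirm the file round-trips.
reloaded = np.load(emb_path)
print(reloaded.shape)
```

`np.save` writes a binary `.npy` file; `np.savetxt` would be the plain-text alternative if the embeddings need to be read by other tools.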

dingqi commented 3 years ago

Thx @cenyk1230 for your help. I have the following issue when running this example. I was able to run several examples before updating to the latest version.

~/cogdl-master/examples# python3 generate_emb.py
Failed to load C version of sampling, use python version instead.
/usr/local/lib/python3.6/dist-packages/numba/core/errors.py:154: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
  warnings.warn(msg)
Traceback (most recent call last):
  File "generate_emb.py", line 2, in
    from cogdl import pipeline
  File "/root/cogdl-master/cogdl/init.py", line 3, in
    from .experiments import experiment
  File "/root/cogdl-master/cogdl/experiments.py", line 11, in
    from cogdl.options import get_default_args
  File "/root/cogdl-master/cogdl/options.py", line 6, in
    from cogdl.tasks import TASK_REGISTRY
  File "/root/cogdl-master/cogdl/tasks/init.py", line 39, in
    module = importlib.import_module("cogdl.tasks." + task_name)
  File "/usr/lib/python3.6/importlib/init.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'cogdl.tasks.'

cenyk1230 commented 3 years ago

Hi @dingqi,

Would you please report the output of pip show cogdl? Also, have you run the installation command pip install -e .?
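
A complementary check to pip show is to ask Python which file it would actually import for a package name, which helps when several copies are installed. The sketch below uses the standard-library `importlib.util.find_spec`; the helper name `locate_package` is hypothetical, and `json` is included only as a package that is always present.

```python
import importlib.util


def locate_package(name):
    """Return the file Python would import for `name`, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "cogdl" resolves to None when it is not installed in this environment.
for package in ("cogdl", "json"):
    print(package, "->", locate_package(package))
```

For a top-level name, `find_spec` only locates the package and does not execute it, so this check works even when importing the package itself raises an error.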

dingqi commented 3 years ago

@cenyk1230 Here is the output. I tried installing both via git and locally.

~/cogdl-master# pip3 show cogdl
Name: cogdl
Version: 0.4.0
Summary: An Extensive Research Toolkit for Deep Learning on Graphs
Home-page: https://github.com/THUDM/cogdl
Author: None
Author-email: None
License: MIT
Location: /root/cogdl-master
Requires: torch, networkx, matplotlib, tqdm, numpy, scipy, gensim, grave, scikit-learn, tabulate, optuna, texttable, ogb, emoji, pre-commit, flake8, numba, transformers, sentencepiece
Required-by:

cenyk1230 commented 3 years ago

Hi @dingqi,

I see. Could you please list the results of ls -al cogdl/tasks/*.py?

dingqi commented 3 years ago

Files seem to be there... weird

~/cogdl-master# ls -al cogdl/tasks/*.py
-rw-rw-r-- 1 root root  1483 Jun 16 06:34 cogdl/tasks/init.py
-rw-rw-r-- 1 root root  5520 Jun 16 06:34 cogdl/tasks/attributed_graph_clustering.py
-rw-rw-r-- 1 root root  2501 Jun 16 06:34 cogdl/tasks/base_task.py
-rw-rw-r-- 1 root root  8030 Jun 16 06:34 cogdl/tasks/graph_classification.py
-rw-rw-r-- 1 root root  3944 Jun 16 06:34 cogdl/tasks/heterogeneous_node_classification.py
-rw-rw-r-- 1 root root 27255 Jun 16 06:34 cogdl/tasks/link_prediction.py
-rw-rw-r-- 1 root root  3589 Jun 16 06:34 cogdl/tasks/multiplex_link_prediction.py
-rw-rw-r-- 1 root root  2320 Jun 16 06:34 cogdl/tasks/multiplex_node_classification.py
-rw-rw-r-- 1 root root  5882 Jun 16 06:34 cogdl/tasks/node_classification.py
-rw-rw-r-- 1 root root 10450 Jun 16 06:34 cogdl/tasks/oag_supervised_classification.py
-rw-rw-r-- 1 root root 10647 Jun 16 06:34 cogdl/tasks/oag_zero_shot_infer.py
-rw-rw-r-- 1 root root   706 Jun 16 06:34 cogdl/tasks/pretrain.py
-rw-rw-r-- 1 root root  2678 Jun 16 06:34 cogdl/tasks/similarity_search.py
-rw-rw-r-- 1 root root  5570 Jun 16 06:34 cogdl/tasks/unsupervised_graph_classification.py
-rw-rw-r-- 1 root root  7588 Jun 16 06:34 cogdl/tasks/unsupervised_node_classification.py

cenyk1230 commented 3 years ago

Hi @dingqi,

It's very strange that your init file is named cogdl/tasks/init.py; it should be cogdl/tasks/__init__.py. Please try renaming the file and re-running the script.

dingqi commented 3 years ago

It is actually __init__.py; it seems the comment editor automatically converted it to bold font.

cenyk1230 commented 3 years ago

Hi @dingqi,

Could you please pull from cenyk1230:fix-tasks and check whether this pull request solves your problem?
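
For context, the traceback earlier in this thread fails on importlib.import_module("cogdl.tasks." + task_name) with the module name 'cogdl.tasks.', which implies that task_name was an empty string. The thread does not say what produced the empty name or what the fix-tasks branch changed. The sketch below only illustrates one hypothetical cause (a stray file named `.py` in the package directory, sliced with `str.find`) and a defensive discovery loop that skips such names. The function `discover_module_names` and the sample file names are assumptions, not CogDL code.

```python
import os
import tempfile


def discover_module_names(package_dir):
    """Return importable module names for the .py files in package_dir.

    Private files (leading underscore) and files whose stem is not a valid
    Python identifier (including an empty stem) are skipped.
    """
    names = []
    for file_name in sorted(os.listdir(package_dir)):
        if not file_name.endswith(".py") or file_name.startswith("_"):
            continue
        stem = file_name[: -len(".py")]
        if not stem.isidentifier():
            # e.g. a stray ".py" file; importing "package." + "" would fail
            continue
        names.append(stem)
    return names


# Build a throwaway directory that mimics a package with one stray file.
package_dir = tempfile.mkdtemp()
for file_name in ["__init__.py", "node_classification.py", ".py", "notes.txt"]:
    open(os.path.join(package_dir, file_name), "w").close()

# Naive slicing turns the stray ".py" file into an empty module name.
naive_name = ".py"[: ".py".find(".py")]

module_names = discover_module_names(package_dir)
print(repr(naive_name), module_names)
```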

dingqi commented 3 years ago

Hi @cenyk1230, it works now! Thank you very much for your help!

dingqi commented 3 years ago

@cenyk1230 thx again for your help. Another issue: when I change the model to 'gcn', I get the following problem.

python3 ../cogdl-fix-tasks/examples/generate_emb_baseline.py
Failed to load C version of sampling, use python version instead.
/usr/local/lib/python3.6/dist-packages/numba/core/errors.py:154: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
Traceback (most recent call last):
  File "../cogdl-fix-tasks/examples/generate_emb_baseline.py", line 19, in
    generator = pipeline("generate-emb", model="gcn")
  File "/root/cogdl-fix-tasks/cogdl/pipelines.py", line 185, in pipeline
    return task_class(app=app, **default_args)
  File "/root/cogdl-fix-tasks/cogdl/pipelines.py", line 140, in init
    self.model = build_model(args)
  File "/root/cogdl-fix-tasks/cogdl/models/init.py", line 52, in build_model
    return MODEL_REGISTRY[args.model].build_model_from_args(args)
  File "/root/cogdl-fix-tasks/cogdl/models/nn/gcn.py", line 111, in build_model_from_args
    args.norm,
  File "/root/cogdl-fix-tasks/cogdl/models/nn/gcn.py", line 129, in init
    for i in range(num_layers)
  File "/root/cogdl-fix-tasks/cogdl/models/nn/gcn.py", line 129, in
    for i in range(num_layers)
  File "/root/cogdl-fix-tasks/cogdl/models/nn/gcn.py", line 21, in init
    self.weight = Parameter(torch.FloatTensor(in_features, out_features))
TypeError: new() received an invalid combination of arguments - got (NoneType, int), but expected one of:

cenyk1230 commented 3 years ago

Sorry @dingqi, this API only supports embedding-based methods, such as deepwalk, netmf, and prone. Most GNN models focus on semi-supervised node classification and cannot be directly used for generating embeddings. We are planning to provide a similar API for those GNN models that support unsupervised representation learning.

dingqi commented 3 years ago

Got it, Thx anyway. Look forward to the new feature.

cenyk1230 commented 3 years ago

Hi @dingqi,

We now support self-supervised GNNs for generating embeddings, including four models (dgi/unsup_graphsage/mvgrl/grace). Please have a try!

import numpy as np
from cogdl import pipeline

# generate embeddings for an unweighted graph
edge_index = np.array([[0, 1], [0, 2], [0, 3], [1, 2], [2, 3]])

# build a pipeline for generating embeddings using unsupervised GNNs
# pass model name and num_features with its hyper-parameters to this API
generator = pipeline("generate-emb", model="dgi", num_features=8, hidden_size=4)
outputs = generator(edge_index, x=np.random.randn(4, 8))
print(outputs)
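
The GNN variant takes a feature matrix x alongside the edges, and the earlier 'gcn' traceback shows how an unset feature dimension (NoneType) surfaces only deep inside model construction. A small pre-check can catch shape mismatches earlier. This is a sketch under stated assumptions: the helper `check_gnn_inputs` is hypothetical, and it assumes node ids are 0-based so that the largest id plus one is a lower bound on the number of feature rows.

```python
import numpy as np


def check_gnn_inputs(edge_index, x, num_features):
    """Validate shapes before building a pipeline; return the minimum node count.

    Assumes edge_index is an (E, 2) array of 0-based node ids and x is a
    (num_nodes, num_features) feature matrix.
    """
    edge_index = np.asarray(edge_index)
    x = np.asarray(x)
    if edge_index.ndim != 2 or edge_index.shape[1] != 2:
        raise ValueError("edge_index must have shape (E, 2)")
    if num_features is None:
        raise ValueError("num_features must be set for GNN models")
    num_nodes = int(edge_index.max()) + 1
    if x.ndim != 2 or x.shape[0] < num_nodes:
        raise ValueError("x needs one feature row per node")
    if x.shape[1] != num_features:
        raise ValueError("x.shape[1] must equal num_features")
    return num_nodes


# Same toy inputs as in the example above.
edge_index = np.array([[0, 1], [0, 2], [0, 3], [1, 2], [2, 3]])
x = np.random.randn(4, 8)
num_nodes = check_gnn_inputs(edge_index, x, num_features=8)
print(num_nodes)
```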

dingqi commented 3 years ago

@cenyk1230 Thank you very much. The new features work well.