Shen-Lab / GraphCL

[NeurIPS 2020] "Graph Contrastive Learning with Augmentations" by Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, Yang Shen
MIT License

Can't load dataset file for semisupervised TU #26

Open Ripper346 opened 3 years ago

Ripper346 commented 3 years ago

Hi, I am hitting the problem described in https://github.com/Shen-Lab/GraphCL/issues/4#issuecomment-742254458 and #1 when trying to launch the semisupervised TU pre-training. I run python main.py --dataset MUTAG --aug1 random2 --aug2 random2 --lr 0.001 --suffix 0 --exp test and I get this error:

[INFO] running single test..
-----
Total 1 experiments in this run:
1/1 - MUTAG - deg+odeg100+ak3+reall - ResGFN
Here we go..
-----
1/1 - MUTAG - deg+odeg100+ak3+reall - ResGFN
None None
Traceback (most recent call last):
  File "main.py", line 338, in <module>     
    run_exp_single_test()
  File "main.py", line 316, in run_exp_single_test
    run_exp_lib([('MUTAG', 'deg+odeg100+ak3+reall', 'ResGFN')])
  File "main.py", line 165, in run_exp_lib
    dataset = get_dataset(
  File "C:\Users\alessandro\Developments\GraphCL\semisupervised_TU\pre-training\datasets.py", line 57, in get_dataset
    dataset = TUDatasetExt(
  File "C:\Users\alessandro\Developments\GraphCL\semisupervised_TU\pre-training\tu_dataset.py", line 49, in __init__
    super(TUDatasetExt, self).__init__(root, name, transform, pre_transform,
  File "C:\_______\envs\torch\lib\site-packages\torch_geometric\datasets\tu_dataset.py", line 66, in __init__
    self.data, self.slices = torch.load(self.processed_paths[0])
  File "C:\_______\envs\torch\lib\site-packages\torch\serialization.py", line 579, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "C:\_______\envs\torch\lib\site-packages\torch\serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "C:\_______\envs\torch\lib\site-packages\torch\serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'data\\MUTAG\\MUTAG\\processed\\data_deg+odeg100+ak3+reall.pt'

I have installed everything and read the other two issues, but I can't understand what I have to do to make it work (if there is anything I can do). Everything is installed in a Python env.

yyou1996 commented 3 years ago

Hi @Ripper346,

Thanks for your interest, and a big apology for the frustration. Here are the solutions I can come up with:

  1. Would you mind sharing your env information so that I can double-check? This experiment is built upon an old repo https://github.com/chentingpc/gfn#requirements with slightly outdated packages, so I understand you may have installed the required ones, but I am asking in case there is an oversight.

  2. I notice in the error information that FileNotFoundError: [Errno 2] No such file or directory: 'data\\MUTAG\\MUTAG\\processed\\data_deg+odeg100+ak3+reall.pt'. It looks weird to me that the program concatenates the path as 'data\\MUTAG\\MUTAG\\processed\\data_deg+odeg100+ak3+reall.pt' rather than 'data\MUTAG\MUTAG\processed\data_deg+odeg100+ak3+reall.pt'. Is there any way for you to debug this?

Ripper346 commented 3 years ago

  1. I have Python 3.8.8 and it works fine with other projects that use torch and pytorch-geometric, but here are the requirements of my env (a bit long):
    alembic==1.5.8
    ase==3.21.1
    astroid==2.5.1
    async-generator==1.10
    attrs==20.3.0
    autopep8==1.5.5
    backcall==0.2.0
    bleach==3.3.0
    certifi==2020.12.5
    chardet==3.0.4
    cliff==3.7.0
    cmaes==0.8.2
    cmd2==1.5.0
    colorama==0.4.4
    colorlog==4.8.0
    control==0.8.4
    cvxopt==1.2.6
    cycler==0.10.0
    Cython==0.29.22
    decorator==4.4.2
    defusedxml==0.7.0
    dgl-cu110==0.6.0
    entrypoints==0.3
    future==0.18.2
    googledrivedownloader==0.4
    grakel==0.1.8
    graphkit-learn==0.2.0.post1
    greenlet==1.0.0
    h5py==3.2.0
    idna==2.10
    ipdb==0.13.5
    ipykernel==5.5.0
    ipython==7.21.0
    ipython-genutils==0.2.0
    isodate==0.6.0
    isort==5.7.0
    jedi==0.18.0
    Jinja2==2.11.3
    joblib==1.0.1
    jsonschema==3.2.0
    jupyter-client==6.1.11
    jupyter-core==4.7.1
    jupyterlab-pygments==0.1.2
    kiwisolver==1.3.1
    lazy-object-proxy==1.5.2
    llvmlite==0.35.0
    Mako==1.1.4
    mariadb==1.0.6
    MarkupSafe==1.1.1
    matplotlib==3.3.4
    mccabe==0.6.1
    mistune==0.8.4
    Mosek==9.2.38
    mysql-connector-python==8.0.23
    nbclient==0.5.3
    nbconvert==6.0.7
    nbformat==5.1.2
    nest-asyncio==1.5.1
    networkx==2.5
    nose==1.3.7
    numba==0.52.0
    numpy==1.20.1
    optuna==2.7.0
    packaging==20.9
    pandas==1.2.3
    pandocfilters==1.4.3
    parso==0.8.1
    pbr==5.5.1
    pickleshare==0.7.5
    Pillow==8.1.1
    prettytable==2.1.0
    prompt-toolkit==3.0.16
    protobuf==3.15.4
    pycodestyle==2.6.0
    Pygments==2.8.0
    pylint==2.7.2
    pyparsing==2.4.7
    pyperclip==1.8.2
    pyreadline3==3.3
    pyrsistent==0.17.3
    python-dateutil==2.8.1
    python-editor==1.0.4
    python-louvain==0.15
    pytz==2021.1
    pywin32==300
    PyYAML==5.4.1
    pyzmq==22.0.3
    rdflib==5.0.0
    requests==2.25.1
    rope==0.18.0
    scikit-learn==0.24.1
    scipy==1.6.1
    seaborn==0.11.1
    six==1.15.0
    SQLAlchemy==1.4.7
    stevedore==3.3.0
    tabulate==0.8.9
    testpath==0.4.4
    threadpoolctl==2.1.0
    toml==0.10.2
    torch==1.8.0+cu111
    torch-cluster==1.5.9
    torch-geometric==1.6.3
    torch-scatter==2.0.6
    torch-sparse==0.6.9
    torch-spline-conv==1.2.1
    torchaudio==0.8.0
    torchvision==0.9.0+cu111
    tornado==6.1
    tqdm==4.58.0
    traitlets==5.0.5
    typing-extensions==3.7.4.3
    urllib3==1.26.3
    wcwidth==0.2.5
    webencodings==0.5.1
    wrapt==1.12.1
  2. I am on Windows; it is normal that it shows two \\, since the backslash is escaped in the error message.
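
To illustrate the point about escaping, here is a minimal sketch using only the standard library. The path components are copied from the error message above; the variable names are made up for illustration. It shows that the doubled backslashes appear only in the repr of the string, which is the form Python uses for file names inside exception messages, while the actual path contains single backslashes:

```python
import ntpath  # Windows path rules, importable on any OS

# Path components copied from the FileNotFoundError in this thread.
path = ntpath.join("data", "MUTAG", "MUTAG", "processed",
                   "data_deg+odeg100+ak3+reall.pt")

shown_in_traceback = repr(path)  # escaped form: every backslash is written twice
actual_text = str(path)          # real string: single backslashes

print(shown_in_traceback)
print(actual_text)
```

So the doubled backslashes by themselves do not indicate a wrongly built path.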

yyou1996 commented 3 years ago

Thank you. I see your env, and it is too new (torch_geometric>=1.6.0 rather than the required 1.4.0) for the semi_TU repo (please refer to https://github.com/Shen-Lab/GraphCL/tree/master/semisupervised_TU#option-1 for the correct environment).

Another option is to try replacing the __init__ function in tu_dataset with:

    url = ('https://ls11-www.cs.tu-dortmund.de/people/morris/'
           'graphkerneldatasets')

    def __init__(self,
                 root,
                 name,
                 transform=None,
                 pre_transform=None,
                 pre_filter=None,
                 use_node_attr=False,
                 processed_filename='data.pt', aug_ratio=None):
        self.name = name
        self.processed_filename = processed_filename

        self.aug = "none"
        self.aug_ratio = None

        super(TUDatasetExt, self).__init__(root, transform, pre_transform,
                                        pre_filter)
        self.data, self.slices = torch.load(self.processed_paths[0])
        if self.data.x is not None and not use_node_attr:
            self.data.x = self.data.x[:, self.num_node_attributes:]

    @property
    def num_node_labels(self):
        if self.data.x is None:
            return 0
        for i in range(self.data.x.size(1)):
            if self.data.x[:, i:].sum().item() == self.data.x.size(0):
                return self.data.x.size(1) - i
        return 0

    @property
    def num_node_attributes(self):
        if self.data.x is None:
            return 0
        return self.data.x.size(1) - self.num_node_labels

    @property
    def raw_file_names(self):
        names = ['A', 'graph_indicator']
        return ['{}_{}.txt'.format(self.name, name) for name in names]

    @property
    def processed_file_names(self):
        return self.processed_filename

    @property
    def num_node_features(self):
        r"""Returns the number of features per node in the dataset."""
        return self[0][0].num_node_features

which might solve the download issue. A new version of this experiment, adapted to torch_geometric>=1.6.0, will also be released in the coming weeks.

Ripper346 commented 3 years ago

Ok, thanks. I will try on Monday and keep you informed in this issue. I find that behavior strange; maybe I could also look at the differences between torch geometric 1.4 and 1.6.

Ripper346 commented 3 years ago

Hi again. Your code didn't solve the issue. I worked around something else, which resulted in the following class:

class TUDatasetExt(TUDataset):
    def __init__(self,
                 root,
                 name,
                 transform=None,
                 pre_transform=None,
                 pre_filter=None,
                 use_node_attr=False,
                 processed_filename='data.pt',
                 aug="none", aug_ratio=None):
        self.name = name
        self.processed_filename = processed_filename

        self.aug = aug
        self.aug_ratio = None

        super(TUDatasetExt, self).__init__(root, self.name, transform, pre_transform,
                                           pre_filter, use_node_attr)
        self.data, self.slices = torch.load(self.processed_paths[0])
        if self.data.x is not None and not use_node_attr:
            self.data.x = self.data.x[:, self.num_node_attributes:]

    @property
    def processed_file_names(self):
        return self.processed_filename

    @property
    def num_node_features(self):
        r"""Returns the number of features per node in the dataset."""
        return self[0][0].num_node_features

    def download(self):
        super().download()

    def get(self, idx):
        data = self.data.__class__()

        if hasattr(self.data, '__num_nodes__'):
            data.num_nodes = self.data.__num_nodes__[idx]

        for key in self.data.keys:
            item, slices = self.data[key], self.slices[key]
            if torch.is_tensor(item):
                s = list(repeat(slice(None), item.dim()))
                s[self.data.__cat_dim__(key,
                                        item)] = slice(slices[idx],
                                                       slices[idx + 1])
            else:
                s = slice(slices[idx], slices[idx + 1])
            data[key] = item[s]

        if self.aug == 'dropN':
            data = drop_nodes(data, self.aug_ratio)
        elif self.aug == 'wdropN':
            data = weighted_drop_nodes(data, self.aug_ratio, self.npower)
        elif self.aug == 'permE':
            data = permute_edges(data, self.aug_ratio)
        elif self.aug == 'subgraph':
            data = subgraph(data, self.aug_ratio)
        elif self.aug == 'maskN':
            data = mask_nodes(data, self.aug_ratio)
        elif self.aug == 'none':
            data = data
        elif self.aug == 'random4':
            ri = np.random.randint(4)
            if ri == 0:
                data = drop_nodes(data, self.aug_ratio)
            elif ri == 1:
                data = subgraph(data, self.aug_ratio)
            elif ri == 2:
                data = permute_edges(data, self.aug_ratio)
            elif ri == 3:
                data = mask_nodes(data, self.aug_ratio)
            else:
                print('sample augmentation error')
                assert False

        elif self.aug == 'random3':
            ri = np.random.randint(3)
            if ri == 0:
                data = drop_nodes(data, self.aug_ratio)
            elif ri == 1:
                data = subgraph(data, self.aug_ratio)
            elif ri == 2:
                data = permute_edges(data, self.aug_ratio)
            else:
                print('sample augmentation error')
                assert False

        elif self.aug == 'random2':
            ri = np.random.randint(2)
            if ri == 0:
                data = drop_nodes(data, self.aug_ratio)
            elif ri == 1:
                data = subgraph(data, self.aug_ratio)
            else:
                print('sample augmentation error')
                assert False

        else:
            print('augmentation error')
            assert False

        return data

It can now download the dataset, but it raises the error from this issue again.

Then I first tried to install the conda environment of semisupervised TU, but it can't resolve some dependencies:

ResolvePackageNotFound:
  - ld_impl_linux-64=2.33.1
  - libffi=3.3
  - readline=8.0
  - libgcc-ng=9.1.0
  - libstdcxx-ng=9.1.0
  - ncurses=6.2
  - libedit=3.1.20191231

I then tried with a Docker devcontainer, Python 3.7 on Debian buster, with the requirements:

decorator==4.4.2
future==0.18.2
isodate==0.6.0
joblib==0.16.0
networkx==2.4
numpy==1.19.0
pandas==1.0.5
pillow==7.2.0
plyfile==0.7.2
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
rdflib==5.0.0
scikit-learn==0.23.1
scipy==1.5.0
six==1.15.0
threadpoolctl==2.1.0

and then installed manually

pip3 install torch==1.4.0 torchvision==0.5.0 -f https://download.pytorch.org/whl/torch_stable.html
pip3 install torch-scatter==1.1.0 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-sparse==0.4.4 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-cluster==1.4.5 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-spline-conv==1.1.0 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-geometric==1.1.0

The installation and the run of the original code went fine. I only had to make one adjustment in train_eval.py: from r146 I had to add two checks that create the logs and models folders if they don't exist.
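
A minimal sketch of that kind of check, assuming the folders are simply named logs and models and live under a common base directory. The helper name ensure_output_dirs and its parameters are hypothetical and are not taken from train_eval.py:

```python
import os


def ensure_output_dirs(base_dir=".", names=("logs", "models")):
    """Create each output folder under base_dir if it does not exist yet."""
    created = []
    for name in names:
        path = os.path.join(base_dir, name)
        # exist_ok=True makes the call a no-op when the folder is already there.
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

Calling ensure_output_dirs() once before training starts would be enough, since repeated calls do nothing when the folders already exist.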

I noticed that the issue starts to appear with torch-geometric 1.4.2; earlier versions don't have that problem.
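
Based on that observation, a small guard could warn early when the installed version is likely to be affected. This is only a sketch: the 1.4.2 threshold comes from my tests above and not from any changelog, the function names are hypothetical, and the parser only handles plain numeric versions such as 1.6.3 (optionally with a +cu111 style suffix):

```python
def parse_version(text):
    """Turn '1.6.3' into (1, 6, 3), dropping any local suffix such as '+cu111'."""
    core = text.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])


# Assumption from this thread: the problem was observed from 1.4.2 onward.
FIRST_AFFECTED = (1, 4, 2)


def may_hit_issue(torch_geometric_version):
    """Return True if the given version is at or above the first affected one."""
    return parse_version(torch_geometric_version) >= FIRST_AFFECTED
```

With torch_geometric installed, it could be called as may_hit_issue(torch_geometric.__version__).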