Pseudomanifold commented 2 years ago

Hi Bastian! I tried installing without poetry and running your code. The installation worked, but I cannot figure out how to set DATA_DIR, as the code is looking for the data in the wrong directory. Here is the output that I get:

(togl) mohit@user-Default-string:~/TOGL$ python topognn/train_model.py --model TopoGNN --dataset DD --batch_size 20 --lr 0.0007
Using backend: pytorch
/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:68: UserWarning: No correct seed found, seed set to 3526443079
  warnings.warn(*args, **kwargs)
Global seed set to 3526443079
Traceback (most recent call last):
  File "topognn/train_model.py", line 150, in <module>
    main(model_cls, dataset_cls, args)
  File "topognn/train_model.py", line 59, in main
    dataset.prepare_data()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 92, in wrapped_fn
    return fn(*args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 48, in wrapped_fn
    return fn(*args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/topognn/data_utils.py", line 549, in prepare_data
    with open(os.path.join(DATA_DIR, 'Benchmark_idx', self.name+"_"+section+'.index'), 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/topognn/../data/Benchmark_idx/DD_train.index'

Originally posted by @mohit-kumar-27 in https://github.com/BorgwardtLab/TOGL/issues/6#issuecomment-1157227815

Pseudomanifold commented 2 years ago

Simplest fix I'd recommend is setting DATA_DIR yourself in TOGL/topognn/__init__.py. You can point that to a directory that you want to use.

As a fix from our side, we could use an env variable or refer to another path. What do you think @edebrouwer, @ExpectationMax, @mi92?
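The env-variable idea could look roughly like the sketch below. It is only an illustration, not the project's actual code: the variable name `TOGL_DATA_DIR` and the helper `resolve_data_dir` are hypothetical. The fallback mirrors the existing default of a `data` directory next to the package.

```python
import os
from pathlib import Path


def resolve_data_dir(package_file, env_var="TOGL_DATA_DIR"):
    """Return the data directory to use.

    `env_var` is a hypothetical variable name chosen for this sketch.
    If it is set and non-empty, it overrides the default location.
    """
    override = os.environ.get(env_var)
    if override:
        return Path(override).expanduser().resolve()
    # Default: '<directory containing the package>/data', i.e. the same
    # place as os.path.join(os.path.dirname(__file__), '..', 'data').
    return Path(package_file).resolve().parent.parent / "data"


# Intended use inside topognn/__init__.py (assumption):
# DATA_DIR = resolve_data_dir(__file__)
```

With something like this in place, a user could point the code at their own data with `TOGL_DATA_DIR=/path/to/data python topognn/train_model.py ...` without editing the package.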

Pseudomanifold commented 2 years ago

@mohit-kumar-27 any updates on this? Does the proposed workaround solve your problem?

mohit-kumar-27 commented 2 years ago

Hello Bastian, I have not checked it yet; I am tied up with some urgent work. I will try running it again this weekend and update you, possibly on Sunday or Monday.

mohit-kumar-27 commented 2 years ago

This is how I modified TOGL/topognn/__init__.py:

import os.path
from enum import Enum, auto

DATA_DIR = '/home/mohit/TOGL/data/'

DATA_DIR = os.path.join(os.path.dirname(__file__), '..', 'data')

class Tasks(Enum):
    """Valid tasks."""

    GRAPH_CLASSIFICATION = auto()
    NODE_CLASSIFICATION = auto()
    NODE_CLASSIFICATION_WEIGHTED = auto()

The code still searches in the wrong directory and gives the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/topognn/../data/Benchmark_idx/DD_train.index'

Pseudomanifold commented 2 years ago

This is the right way; I think you need to install TOGL again afterwards to refresh the file in your virtual environment.
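The path in the error message points into site-packages, which suggests that Python imports an installed copy of the package rather than the edited checkout. A small sketch for checking which file a module is loaded from is shown below; `installed_location` is a hypothetical helper name, and only the standard library is used.

```python
import importlib.util


def installed_location(module_name):
    """Return the file a top-level module would be imported from.

    Returns None if the module cannot be found at all.
    """
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# Example (assumes the package is installed under the name 'topognn'):
# print(installed_location("topognn"))
```

If the printed path lies under site-packages, edits to the checkout only take effect after reinstalling. An editable install (`pip install -e .`) may avoid the repeated reinstall, assuming the project's packaging supports it.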


mohit-kumar-27 commented 2 years ago

Hi Bastian,

I reinstalled the project and ran the code again. The DATA_DIR error is resolved, but now I get the following error:

raise CommError("Permission denied, ask the project owner to grant you access")
wandb.errors.CommError: Permission denied, ask the project owner to grant you access
wandb: ERROR Internal wandb error: file data was not synced

I created a new wandb account and entered the API key when the program asked for it; then I got this error.

This is the full output:

wandb: Currently logged in as: mohitk2 (use wandb login --relogin to force relogin)
wandb: wandb version 0.12.19 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: ERROR Error while calling W&B API: project not found (<Response [404]>)
Thread SenderThread:
Traceback (most recent call last):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 102, in call
    result = self._call_fn(*args, kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 133, in execute
    six.reraise(sys.exc_info())
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 127, in execute
    return self.client.execute(args, kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/client.py", line 52, in execute
    result = self._get_result(document, *args, *kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/client.py", line 60, in _get_result
    return self.transport.execute(document, args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/transport/requests.py", line 39, in execute
    request.raise_for_status()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.wandb.ai/graphql

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/apis/normalize.py", line 24, in wrapper
    return func(*args, kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 922, in upsert_run
    response = self.gql(mutation, variable_values=variable_values, kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 118, in call
    if not check_retry_fn(e):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/util.py", line 727, in no_retry_auth
    raise CommError("Permission denied, ask the project owner to grant you access")
wandb.errors.CommError: Permission denied, ask the project owner to grant you access

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 55, in run
    self._run()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 105, in _run
    self._process(record)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal.py", line 292, in _process
    self._sm.send(record)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 181, in send
    send_handler(record)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 604, in send_run
    self._init_run(run, config_value_dict)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 626, in _init_run
    server_run, inserted = self._api.upsert_run(
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/apis/normalize.py", line 62, in wrapper
    six.reraise(CommError, CommError(message, err), sys.exc_info()[2])
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/six.py", line 718, in reraise
    raise value.with_traceback(tb)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/apis/normalize.py", line 24, in wrapper
    return func(*args, kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 922, in upsert_run
    response = self.gql(mutation, variable_values=variable_values, kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 118, in call
    if not check_retry_fn(e):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/util.py", line 727, in no_retry_auth
    raise CommError("Permission denied, ask the project owner to grant you access")
wandb.errors.CommError: Permission denied, ask the project owner to grant you access
wandb: ERROR Internal wandb error: file data was not synced
Problem at: /scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py 155 experiment
Traceback (most recent call last):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 761, in init
    run = wi.init()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandbinit.py", line 520, in init
    backend.cleanup()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 167, in cleanup
    self.interface.join()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 836, in join
    = self._communicate(record)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 545, in _communicate
    return self._communicate_async(rec, local=local).get(timeout=timeout)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 550, in _communicate_async
    raise Exception("The wandb backend process has shutdown")
Exception: The wandb backend process has shutdown
wandb: ERROR Abnormal program exit
Traceback (most recent call last):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 761, in init
    run = wi.init()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandbinit.py", line 520, in init
    backend.cleanup()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 167, in cleanup
    self.interface.join()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 836, in join
    = self._communicate(record)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 545, in _communicate
    return self._communicate_async(rec, local=local).get(timeout=timeout)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 550, in _communicate_async
    raise Exception("The wandb backend process has shutdown")
Exception: The wandb backend process has shutdown

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "topognn/train_model.py", line 150, in <module>
    main(model_cls, dataset_cls, args)
  File "topognn/train_model.py", line 82, in main
    dirpath=wandb_logger.experiment.dir,
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 41, in experiment
    return get_experiment() or DummyExperiment()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 48, in wrapped_fn
    return fn(*args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 39, in get_experiment
    return fn(self)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py", line 155, in experiment
    self._experiment = wandb.init(
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 798, in init
    six.raise_from(Exception("problem"), error_seen)
  File "", line 3, in raise_from
Exception: problem

Pseudomanifold commented 2 years ago

You can start train_model.py with WANDB_MODE=disabled or WANDB_MODE=offline, i.e.:

$ WANDB_MODE=offline poetry run python train_model.py

@ExpectationMax @edebrouwer: should we solve this more generically and remove the team name from the WandB logger? Or potentially default to a tensorboard logger?
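The `WANDB_MODE` workaround can also be applied from inside Python, as long as it happens before the W&B run is initialised. The sketch below is an illustration only; `default_wandb_mode` is a hypothetical helper, not part of TOGL or wandb. It respects a mode the user has already set in the environment.

```python
import os


def default_wandb_mode(mode="offline"):
    """Set WANDB_MODE unless the user has already chosen a value.

    Returns the mode that is in effect afterwards. This needs to run
    before the W&B logger is created in order to have any effect.
    """
    return os.environ.setdefault("WANDB_MODE", mode)


# Example: call once near the top of the training script (assumption):
# default_wandb_mode("offline")
```

Using `setdefault` keeps an explicit `WANDB_MODE=disabled` on the command line in control, so the script-level default never overrides the user's choice.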

mohit-kumar-27 commented 2 years ago

I ran the following from my terminal:

(togl) mohit@user-Default-string:~/TOGL$ wandb offline

(togl) mohit@user-Default-string:~/TOGL$ python topognn/train_model.py --model TopoGNN --dataset DD --batch_size 20 --lr 0.0007

I get the following error:

Traceback (most recent call last):
  File "topognn/train_model.py", line 152, in <module>
    main(model_cls, dataset_cls, args)
  File "topognn/train_model.py", line 98, in main
    trainer.fit(model, datamodule=dataset)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
    self.dispatch()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
    self.accelerator.start_training(self)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 107, in start_training
    mp.spawn(self.new_process, self.mp_spawn_kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 189, in start_processes
    process.start()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
    super().init(process_obj)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
    self._launch(process_obj)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 58, in _launch
    self.pid = util.spawnv_passfds(spawn.get_executable(),
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/util.py", line 452, in spawnv_passfds
    return _posixsubprocess.fork_exec(
ValueError: bad value(s) in fds_to_keep

wandb: Waiting for W&B process to finish, PID 11662
wandb: Program failed with code 1.

Could you suggest what needs to be done here?

Pseudomanifold commented 2 years ago

Seems to be a problem with wandb; please try WANDB_MODE=disabled.

PS: Please read and follow these instructions for formatting your messages.

mohit-kumar-27 commented 2 years ago

I tried:

(mohit_f) mohit@user-Default-string:~/TOGL$ WANDB_MODE=disabled python topognn/train_model.py --model TopoGNN --dataset DD --batch_size 20 --lr 0.0007

I am still getting the same error.

The issue seems to be with pytorch_lightning and multiprocessing.

Pseudomanifold commented 2 years ago

Hmm, it might be better to open a separate issue with pytorch-lightning. You could also check whether you can change the Trainer class (i.e. use a different strategy for training, as described in the documentation). See also PyTorch issue 538.

Closing this issue for now since the original problem has been resolved. Please feel free to open another issue for anything else related to TOGL.

BorgwardtLab / TOGL

DATA_DIR not respected #7
