Closed Pseudomanifold closed 2 years ago
Simplest fix I'd recommend is setting DATA_DIR yourself in TOGL/topognn/__init__.py. You can point that to a directory that you want to use.
As a fix from our side, we could use an env variable or refer to another path. What do you think @edebrouwer, @ExpectationMax, @mi92?
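A minimal sketch of the environment-variable idea, not the project's actual implementation. The variable name TOGL_DATA_DIR and the helper resolve_data_dir are hypothetical; the fallback mirrors the "<package>/../data" layout visible in the error message later in this thread.

```python
import os
from pathlib import Path

# Hypothetical variable name; nothing in this thread says TOGL defines it.
ENV_VAR = "TOGL_DATA_DIR"


def resolve_data_dir(package_file, env=None):
    """Return the data directory, preferring an environment override."""
    env = os.environ if env is None else env
    override = env.get(ENV_VAR)
    if override:
        # An explicit setting wins over the location of the installed package.
        return Path(override).expanduser()
    # Fallback: the 'data' directory next to the package directory.
    return Path(package_file).resolve().parent.parent / "data"
```

Inside topognn/__init__.py this could be used as DATA_DIR = resolve_data_dir(__file__), so that an installed copy in site-packages can still be pointed at a data directory elsewhere without editing the source.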
@mohit-kumar-27 any updates on this? Does the proposed workaround solve your problem?
Hello Bastian, I have not checked yet; I am stuck with some urgent work. I will try running it again this weekend and update you, possibly on Sunday/Monday.
This is how I modified TOGL/topognn/__init__.py:
import os.path
from enum import Enum, auto
DATA_DIR = '/home/mohit/TOGL/data/'
class Tasks(Enum):
    """Valid tasks."""
    GRAPH_CLASSIFICATION = auto()
    NODE_CLASSIFICATION = auto()
    NODE_CLASSIFICATION_WEIGHTED = auto()
Still, the code searches in the wrong directory and gives the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/topognn/../data/Benchmark_idx/DD_train.index'
This is the right way; I think you need to install TOGL again afterwards to refresh the file in your virtual environment.
On 22 June 2022 20:10:53 Mohit Kumar @.***> wrote:
This is how I modified TOGL/topognn/__init__.py:
import os.path
from enum import Enum, auto
DATA_DIR = '/home/mohit/TOGL/data/'
DATA_DIR = os.path.join(os.path.dirname(__file__), '..', 'data')
class Tasks(Enum):
    """Valid tasks."""
    GRAPH_CLASSIFICATION = auto()
    NODE_CLASSIFICATION = auto()
    NODE_CLASSIFICATION_WEIGHTED = auto()
Still, the code searches in the wrong directory and gives the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/topognn/../data/Benchmark_idx/DD_train.index'
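The path in the error message points into site-packages, which suggests Python is loading an installed copy of topognn rather than the edited checkout. A small sketch for checking which file would actually be imported; the helper name installed_location is made up for illustration.

```python
import importlib.util


def installed_location(module_name):
    """Return the file Python would load for `module_name`, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin
```

Calling installed_location("topognn") in the togl environment should show whether the edited __init__.py is the one in use. If the result lies under site-packages, edits in ~/TOGL take effect only after reinstalling; an editable install (pip install -e .) is one way to avoid repeating that step, assuming the project's packaging supports it.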
Hi Bastian,
I tried running the code after reinstalling the project, and the DATA_DIR error was resolved, but now I get the following error:
raise CommError("Permission denied, ask the project owner to grant you access")
wandb.errors.CommError: Permission denied, ask the project owner to grant you access
wandb: ERROR Internal wandb error: file data was not synced
I created a new wandb account and entered the API key when the program asked me to; then I got this error.
This is the full output:
wandb: Currently logged in as: mohitk2 (use wandb login --relogin to force relogin)
wandb: wandb version 0.12.19 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: ERROR Error while calling W&B API: project not found (<Response [404]>)
Thread SenderThread:
Traceback (most recent call last):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 102, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 133, in execute
    six.reraise(*sys.exc_info())
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 127, in execute
    return self.client.execute(*args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/transport/requests.py", line 39, in execute
    request.raise_for_status()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.wandb.ai/graphql
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/apis/normalize.py", line 24, in wrapper
    return func(*args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 922, in upsert_run
    response = self.gql(mutation, variable_values=variable_values, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 118, in __call__
    if not check_retry_fn(e):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/util.py", line 727, in no_retry_auth
    raise CommError("Permission denied, ask the project owner to grant you access")
wandb.errors.CommError: Permission denied, ask the project owner to grant you access
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 55, in run
    self._run()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 105, in _run
    self._process(record)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal.py", line 292, in _process
    self._sm.send(record)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 181, in send
    send_handler(record)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 604, in send_run
    self._init_run(run, config_value_dict)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 626, in _init_run
    server_run, inserted = self._api.upsert_run(
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/apis/normalize.py", line 62, in wrapper
    six.reraise(CommError, CommError(message, err), sys.exc_info()[2])
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/six.py", line 718, in reraise
    raise value.with_traceback(tb)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/apis/normalize.py", line 24, in wrapper
    return func(*args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 922, in upsert_run
    response = self.gql(mutation, variable_values=variable_values, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 118, in __call__
    if not check_retry_fn(e):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/util.py", line 727, in no_retry_auth
    raise CommError("Permission denied, ask the project owner to grant you access")
wandb.errors.CommError: Permission denied, ask the project owner to grant you access
wandb: ERROR Internal wandb error: file data was not synced
Problem at: /scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py 155 experiment
Traceback (most recent call last):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 761, in init
    run = wi.init()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 520, in init
    backend.cleanup()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 167, in cleanup
    self.interface.join()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 836, in join
    _ = self._communicate(record)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 545, in _communicate
    return self._communicate_async(rec, local=local).get(timeout=timeout)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 550, in _communicate_async
    raise Exception("The wandb backend process has shutdown")
Exception: The wandb backend process has shutdown
wandb: ERROR Abnormal program exit
Traceback (most recent call last):
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 761, in init
    run = wi.init()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 520, in init
    backend.cleanup()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 167, in cleanup
    self.interface.join()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 836, in join
    _ = self._communicate(record)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 545, in _communicate
    return self._communicate_async(rec, local=local).get(timeout=timeout)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 550, in _communicate_async
    raise Exception("The wandb backend process has shutdown")
Exception: The wandb backend process has shutdown
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "topognn/train_model.py", line 150, in
You can start train_model.py with WANDB_MODE=disabled or WANDB_MODE=offline, i.e.:
$ WANDB_MODE=offline poetry run python train_model.py
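The same idea can be sketched from Python, for cases where the training script is launched by another script rather than from a shell. The helper run_with_wandb_mode is hypothetical; WANDB_MODE and its values online, offline, and disabled are the only W&B-specific parts.

```python
import os
import subprocess
import sys

# Interpreter of the currently active environment (e.g. the togl conda env).
PYTHON = sys.executable


def run_with_wandb_mode(command, mode="offline", **run_kwargs):
    """Run `command` with WANDB_MODE set, leaving the parent environment untouched."""
    if mode not in {"online", "offline", "disabled"}:
        raise ValueError(f"unsupported WANDB_MODE: {mode!r}")
    # Copy the current environment and override only WANDB_MODE.
    env = dict(os.environ, WANDB_MODE=mode)
    return subprocess.run(command, env=env, **run_kwargs)
```

For example, run_with_wandb_mode([PYTHON, "topognn/train_model.py", "--model", "TopoGNN", "--dataset", "DD"], mode="disabled") would start training with logging to W&B switched off, without changing the environment of the calling process.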
@ExpectationMax @edebrouwer: should we solve this more generically and remove the team name from the WandB logger? Or potentially default to a tensorboard logger?
I ran the following from my terminal:
(togl) mohit@user-Default-string:~/TOGL$ wandb offline
(togl) mohit@user-Default-string:~/TOGL$ python topognn/train_model.py --model TopoGNN --dataset DD --batch_size 20 --lr 0.0007
I get the following error:
Traceback (most recent call last):
  File "topognn/train_model.py", line 152, in
    self.accelerator.start_training(self)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 107, in start_training
    mp.spawn(self.new_process, **self.mp_spawn_kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 189, in start_processes
    process.start()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 58, in _launch
    self.pid = util.spawnv_passfds(spawn.get_executable(),
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/util.py", line 452, in spawnv_passfds
    return _posixsubprocess.fork_exec(
ValueError: bad value(s) in fds_to_keep
wandb: Waiting for W&B process to finish, PID 11662
wandb: Program failed with code 1.
Could you suggest what needs to be done here?
Seems to be a problem with wandb; please try WANDB_MODE=disabled.
PS: Please read and follow these instructions for formatting your messages.
I tried:
(mohit_f) mohit@user-Default-string:~/TOGL$ WANDB_MODE=disabled python topognn/train_model.py --model TopoGNN --dataset DD --batch_size 20 --lr 0.0007
I am still getting the same error.
The issue seems to be with pytorch_lightning and multiprocessing
Hmm, might be better to open a separate issue with pytorch-lightning. You could also check whether you can change the Trainer class (use a different strategy for training, as described in the documentation). See also PyTorch issue 538.
Closing this issue for now since the original problem has been resolved. Please feel free to open another issue for anything else related to TOGL.
Hi Bastian! I tried installing without poetry and running your code. Everything worked... I am not able to figure out how to set DATA_DIR, as the code is looking for the data in the wrong directory. Here is the output that I get
Originally posted by @mohit-kumar-27 in https://github.com/BorgwardtLab/TOGL/issues/6#issuecomment-1157227815