ogulcankertmen opened this issue 4 months ago. Status: Open
Hi @ogulcankertmen, I've tried the exact same command on the main branch on my machine and training runs fine. Can you please share more info about the error, ideally the entire stack trace?
C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\sheeprl\utils\logger.py:22: UserWarning: The specified root directory for the TensorBoardLogger is different from the experiment one, so the logger one will be ignored and replaced with the experiment root directory
warnings.warn(
2024-07-11 11:20:41.583052: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-11 11:20:43.817109: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
Error executing job with overrides: ['exp=dreamer_v3', 'env=gym', 'env.id=CartPole-v1']
Traceback (most recent call last):
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\lightning\fabric\loggers\tensorboard.py", line 215, in log_metrics
self.experiment.add_scalar(k, v, step)
^^^^^^^^^^^^^^^
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\lightning\fabric\loggers\logger.py", line 118, in experiment
return fn(self)
^^^^^^^^
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\lightning\fabric\loggers\tensorboard.py", line 197, in experiment
self._experiment = SummaryWriter(log_dir=self.log_dir, **self._kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\tensorboard\writer.py", line 249, in __init__
self._get_file_writer()
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\tensorboard\writer.py", line 281, in _get_file_writer
self.file_writer = FileWriter(
^^^^^^^^^^^
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\tensorboard\writer.py", line 75, in __init__
self.event_writer = EventFileWriter(
^^^^^^^^^^^^^^^^
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\tensorboard\summary\writer\event_file_writer.py", line 72, in __init__
tf.io.gfile.makedirs(logdir)
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\tensorflow\python\lib\io\file_io.py", line 513, in recursive_create_dir_v2
_pywrap_file_io.RecursivelyCreateDir(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.FailedPreconditionError: logs\runs\dreamer_v3/CartPole-v1\2024-07-11_11-20-40_dreamer_v3_CartPole-v1_42 is not a directory
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\sheeprl\cli.py", line 366, in run
run_algorithm(cfg)
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\sheeprl\cli.py", line 199, in run_algorithm
fabric.launch(reproducible(command), cfg, **kwargs)
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\lightning\fabric\fabric.py", line 845, in launch
return self._wrap_and_launch(function, self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\lightning\fabric\fabric.py", line 931, in _wrap_and_launch
return to_run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\lightning\fabric\fabric.py", line 936, in _wrap_with_setup
return to_run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\sheeprl\cli.py", line 195, in wrapper
return func(fabric, cfg, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\sheeprl\algos\dreamer_v3\dreamer_v3.py", line 379, in main
fabric.logger.log_hyperparams(cfg)
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\lightning\fabric\utilities\rank_zero.py", line 70, in wrapped_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\lightning\fabric\loggers\tensorboard.py", line 249, in log_hyperparams
self.log_metrics(metrics, 0)
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\lightning\fabric\utilities\rank_zero.py", line 70, in wrapped_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\lightning\fabric\loggers\tensorboard.py", line 218, in log_metrics
raise ValueError(
ValueError:
you tried to log -1 which is currently not supported. Try a dict or a scalar/tensor.
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
There is config information on them.
Hi @ogulcankertmen, I've tried on my Windows machine and the error does not occur: I'm not able to reproduce it.
Could you please also share your environment?
I've seen from your error that the log_dir path has mixed separators: I've created a branch where we normalize the separators on Windows. Can you try it?
Also, why is torch's TensorBoard writer calling TensorFlow to create the log directories?
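To illustrate what normalizing the separators means here, the following is a minimal sketch and not the code from the branch. The helper name `normalize_windows_path` is hypothetical; it relies on the standard-library `ntpath` module, which applies Windows path rules on any operating system.

```python
import ntpath


def normalize_windows_path(path: str) -> str:
    """Return `path` using backslashes only (hypothetical helper).

    ntpath.normpath converts forward slashes to backslashes and also
    collapses redundant separators and "." / ".." components.
    """
    return ntpath.normpath(path)


# The log directory reported in the FailedPreconditionError above.
mixed = r"logs\runs\dreamer_v3/CartPole-v1\2024-07-11_11-20-40_dreamer_v3_CartPole-v1_42"
print(normalize_windows_path(mixed))
```

On Windows itself `os.path.normpath` behaves the same way, since `os.path` is `ntpath` there.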
I'm referring to this line:
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\tensorboard\writer.py", line 75, in __init__
self.event_writer = EventFileWriter(
^^^^^^^^^^^^^^^^
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\tensorboard\summary\writer\event_file_writer.py", line 72, in __init__
tf.io.gfile.makedirs(logdir)
File "C:\Users\Oğulcan\AppData\Local\Programs\Python\Python311\Lib\site-packages\tensorflow\python\lib\io\file_io.py", line 513, in recursive_create_dir_v2
_pywrap_file_io.RecursivelyCreateDir(compat.path_to_bytes(path))
What happens if you remove tensorflow?
A similar issue: https://github.com/tensorflow/tensorflow/issues/60682#issuecomment-1561899350
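The traceback shows `tensorboard` reaching `tf.io.gfile.makedirs`, which suggests it uses TensorFlow's file I/O when TensorFlow is present in the same environment. A quick way to check which of these packages are importable is sketched below; the helper name `is_installed` is hypothetical, and the check only reports presence, not versions.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found (hypothetical helper).

    find_spec locates the module without importing it, so this does not
    trigger TensorFlow's start-up logging.
    """
    return importlib.util.find_spec(module_name) is not None


for name in ("tensorflow", "tensorboard", "torch"):
    print(f"{name}: {'installed' if is_installed(name) else 'not installed'}")
```

If `tensorflow` is reported as installed, uninstalling it (or using a fresh environment without it) is the experiment suggested above.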
I ran "sheeprl exp=dreamer_v3 env=gym env.id=CartPole-v1" and got this error: "ValueError: you tried to log -1 which is currently not supported. Try a dict or a scalar/tensor.
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace." I couldn't solve the problem. Do you have any suggestions?
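As the error message suggests, rerunning with `HYDRA_FULL_ERROR=1` gives the complete stack trace, which is the most useful thing to share. A minimal sketch of preparing such a run from Python follows; the helper name `build_debug_run` is hypothetical, and the command is the one quoted in this thread.

```python
import os


def build_debug_run(overrides):
    """Return (command, environment) for a run with full Hydra errors.

    Hypothetical helper: it copies the current environment and adds
    HYDRA_FULL_ERROR=1, without launching anything itself.
    """
    env = dict(os.environ)
    env["HYDRA_FULL_ERROR"] = "1"
    cmd = ["sheeprl", *overrides]
    return cmd, env


cmd, env = build_debug_run(["exp=dreamer_v3", "env=gym", "env.id=CartPole-v1"])
print(" ".join(cmd))
# To launch it: import subprocess; subprocess.run(cmd, env=env)
```

Setting the variable directly in the shell before calling `sheeprl` has the same effect.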