Closed ThisIsForReview closed 9 months ago
Hi, thanks for the issue.
Unfortunately, your error is not reproducible for me with a minimal example in a fresh environment. Can you post a minimal example that reproduces the error? Your environment setup steps would also help (or at least your package versions for python, torch, and lightning).
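If it helps, the versions can be collected with a short script like the one below. This is only a sketch: it assumes the packages are installed under the distribution names torch, lightning, lightning-utilities, and lightning-trainable, and the helper name installed_versions is made up for illustration.

```python
import platform
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names are assumptions; adjust them to match your environment.
packages = ["torch", "lightning", "lightning-utilities", "lightning-trainable"]

print("python", platform.python_version())
for name, version in installed_versions(packages).items():
    print(name, version if version is not None else "not installed")
```

Pasting the printed lines into the issue makes it easy to compare two environments side by side.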
Thanks Lars. Here is the code:
import torch.nn.functional as F
from lightning_trainable.trainable import Trainable
from lightning_trainable.modules import FullyConnectedNetwork
from lightning_trainable.metrics import accuracy
from lightning_trainable.trainable import TrainableHParams
from lightning_trainable.modules import FullyConnectedNetworkHParams
import torch
from torch.utils.data import TensorDataset


class MyClassifierHParams(TrainableHParams):
    network_hparams: FullyConnectedNetworkHParams


class MyClassifier(Trainable):
    # specify your hparams class
    hparams: MyClassifierHParams

    def __init__(self, hparams, **datasets):
        super().__init__(hparams, **datasets)
        self.network = FullyConnectedNetwork(self.hparams.network_hparams)

    def compute_metrics(self, batch, batch_idx):
        # Compute loss and analysis metrics on a batch
        x, y = batch
        yhat = self.network(x)
        cross_entropy = F.cross_entropy(yhat, y)
        top1_accuracy = accuracy(yhat, y, k=1)
        metrics = {
            "loss": cross_entropy,
            "cross_entropy": cross_entropy,
            "top1_accuracy": top1_accuracy,
        }
        if self.hparams.network_hparams.output_dims > 5:
            # only log top-5 accuracy if it can be computed
            metrics["top5_accuracy"] = accuracy(yhat, y, k=5)
        return metrics


x = torch.randn(128, 28 * 28)
y = torch.randn(128, 10)
dataset = TensorDataset(x, y)

hparams = MyClassifierHParams(
    network_hparams=dict(
        input_dims=28 * 28,
        output_dims=10,
        layer_widths=[1024, 512, 256, 128],
        activation="relu",
    ),
    max_epochs=10,
    batch_size=8,
    accelerator="cpu",
)

model = MyClassifier(hparams, train_data=dataset, val_data=dataset, test_data=dataset)
model.fit()
Versions: Python 3.11, Torch 2.0.0+cpu, Lightning 2.1.3, Lightning Utilities 0.10.0
Error output:
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
The following callbacks returned in `LightningModule.configure_callbacks` will override existing callbacks passed to Trainer: ModelCheckpoint
  | Name    | Type                  | Params
--------------------------------------------------
0 | network | FullyConnectedNetwork | 1.5 M
--------------------------------------------------
1.5 M     Trainable params
0         Non-trainable params
1.5 M     Total params
5.977     Total estimated model params size (MB)
Traceback (most recent call last):
  File "C:\Users\jgao5111\Dropbox (Sydney Uni)\Gaofiles\PythonProjects\NormalizationFlow\FreeForm_Flow\Example2.py", line 54, in <module>
    model.fit()
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning_trainable\trainable\trainable.py", line 285, in fit
    trainer.fit(self, **fit_kwargs)
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\trainer\call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 972, in _run
    _log_hyperparams(self)
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\loggers\utilities.py", line 95, in _log_hyperparams
    logger.save()
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning_utilities\core\rank_zero.py", line 43, in wrapped_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\loggers\tensorboard.py", line 213, in save
    save_hparams_to_yaml(hparams_file, self.hparams)
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\core\saving.py", line 327, in save_hparams_to_yaml
    hparams = apply_to_collection(hparams, DictConfig, OmegaConf.to_container, resolve=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning_utilities\core\apply_func.py", line 72, in apply_to_collection
    return _apply_to_collection_slow(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning_utilities\core\apply_func.py", line 104, in _apply_to_collection_slow
    v = _apply_to_collection_slow(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning_utilities\core\apply_func.py", line 118, in _apply_to_collection_slow
    return elem_type(OrderedDict(out))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: AttributeDict.__init__() takes 1 positional argument but 2 were given
It seems that the trouble comes from AttributeDict's __init__() method. See the following example:
from lightning_trainable.hparams.attribute_dict import AttributeDict
from collections import OrderedDict, defaultdict
out = [('a','b'),('c',0.01)]
B = OrderedDict(out)
A = AttributeDict(B) # this way is used in lightning_utilities
JB
Thanks for the update. Unfortunately, your example does not produce the error for me (I slightly edited your reply into a version that runs as-is).
However, I think the issue is that the wrong AttributeDict is being used somewhere. Both lightning and lightning-trainable define an AttributeDict class, and the two serve slightly different purposes. Using lightning's AttributeDict in your last snippet works fine, whereas lightning-trainable's AttributeDict produces the error you mention (ours is not intended to be used in this way).
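To illustrate the difference without depending on either package, here is a minimal sketch. Both classes are hypothetical stand-ins written for this example, not the actual implementations from lightning or lightning-trainable. The only assumption, taken from the error message in the traceback, is that the failing class has an __init__ that accepts keyword arguments only. The rebuild helper mimics the last frame of the traceback, elem_type(OrderedDict(out)).

```python
from collections import OrderedDict


class PositionalFriendlyDict(dict):
    """Hypothetical stand-in: inherits dict's __init__, so a mapping can be passed positionally."""


class KwargsOnlyDict(dict):
    """Hypothetical stand-in: __init__ accepts keyword arguments only."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)


def rebuild(collection):
    """Rebuild a mapping as its own type from an OrderedDict, as in the traceback's last frame."""
    elem_type = type(collection)
    out = [(key, value) for key, value in collection.items()]
    return elem_type(OrderedDict(out))


# Works: the OrderedDict is accepted as a positional argument.
rebuilt = rebuild(PositionalFriendlyDict(a="b", c=0.01))
print(type(rebuilt).__name__, dict(rebuilt))

# Fails: the OrderedDict is a second positional argument after self.
error_message = None
try:
    rebuild(KwargsOnlyDict(a="b", c=0.01))
except TypeError as err:
    error_message = str(err)
    print(error_message)
```

If this is what happens in your environment, then the hparams object reaching the logger is built from the keyword-only class, which would explain why the failure only shows up once lightning tries to rebuild the collection.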
What steps did you take to install lightning-trainable? Installing from the instructions gives me different package versions:
lightning == 2.1.3
lightning-utilities == 0.10.0
torch == 2.1.2
If all else fails, you can also try to consult the guys over at https://github.com/vislearn/FFF, since I assume that is where you installed lightning-trainable from.
Thank you Lars. I created a new environment and installed requirements.txt from https://github.com/vislearn/FFF, and now it works well. It is surprising; I am not sure whether the installation order matters. Perhaps lightning-trainable needs to be installed before lightning and lightning-utilities? I still think this may be a bug that you could fix later on.
By the way, one suggestion for the demo example in the Usage section of https://github.com/LarsKue/lightning-trainable: make it work on both CPU and GPU, and set the necessary options max_epochs and batch_size. The current code still produces the following errors from lightning_utilities data.py:
Total length of list across ranks is zero. Please make sure this was your intention.
Total length of CombinedLoader across ranks is zero. Please make sure this was your intention.
I guess a new user does not know which parameter settings are necessary.
Thank you again
JB
Good to see you got it working.
I do agree that the README example is not ideal. In the future, we will probably move to automatic doctests, but I currently do not have the capacity to work on this. You are always welcome to make a pull request with such changes.
I got this error when running the example code on the page.
Is this an issue with lightning_utilities or lightning-trainable?