LarsKue / lightning-trainable

A default trainable module for pytorch lightning.
MIT License

Cannot run lightning-trainable example #24

Closed ThisIsForReview closed 9 months ago

ThisIsForReview commented 9 months ago

I got the following error when running the example code from the repository's Usage example:

    return elem_type(OrderedDict(out))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: AttributeDict.__init__() takes 1 positional argument but 2 were given

Process finished with exit code 1

Is this an issue with lightning_utilities or with lightning-trainable?

LarsKue commented 9 months ago

Hi, thanks for the issue.

Unfortunately, your error is not reproducible for me with a minimal example in a fresh environment. Can you post a minimal example that reproduces the error? Your environment setup steps would also help (or at least your package versions for python, torch, and lightning).
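
For reference, one quick way to collect those versions (a small snippet I would run, assuming all four packages are importable in your environment):

import sys
import torch
import lightning
import lightning_utilities

# print the interpreter and package versions to paste into the issue
print("python:", sys.version)
print("torch:", torch.__version__)
print("lightning:", lightning.__version__)
print("lightning-utilities:", lightning_utilities.__version__)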

ThisIsForReview commented 9 months ago

Thanks Lars. Here is the code:

import torch.nn.functional as F
from lightning_trainable.trainable import Trainable
from lightning_trainable.modules import FullyConnectedNetwork
from lightning_trainable.metrics import accuracy

from lightning_trainable.trainable import TrainableHParams
from lightning_trainable.modules import FullyConnectedNetworkHParams

import torch
from torch.utils.data import TensorDataset

class MyClassifierHParams(TrainableHParams):
    network_hparams: FullyConnectedNetworkHParams

class MyClassifier(Trainable):
    # specify your hparams class
    hparams: MyClassifierHParams

    def __init__(self, hparams, **datasets):
        super().__init__(hparams, **datasets)
        self.network = FullyConnectedNetwork(self.hparams.network_hparams)

    def compute_metrics(self, batch, batch_idx):
        # Compute loss and analysis metrics on a batch
        x, y = batch
        yhat = self.network(x)

        cross_entropy = F.cross_entropy(yhat, y)
        top1_accuracy = accuracy(yhat, y, k=1)

        metrics = {
            "loss": cross_entropy,
            "cross_entropy": cross_entropy,
            "top1_accuracy": top1_accuracy,
        }

        if self.hparams.network_hparams.output_dims > 5:
            # only log top-5 accuracy if it can be computed
            metrics["top5_accuracy"] = accuracy(yhat, y, k=5)

        return metrics

x = torch.randn(128, 28 * 28)
y = torch.randn(128, 10)

dataset = TensorDataset(x, y)

hparams = MyClassifierHParams(
    network_hparams=dict(
        input_dims=28 * 28,
        output_dims=10,
        layer_widths=[1024, 512, 256, 128],
        activation="relu",
    ),
    max_epochs=10,
    batch_size=8,
    accelerator="cpu",
)

model = MyClassifier(hparams, train_data=dataset, val_data=dataset, test_data=dataset)
model.fit()

Versions: Python 3.11, torch 2.0.0+cpu, lightning 2.1.3, lightning-utilities 0.10.0

Errors from the run:

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
The following callbacks returned in `LightningModule.configure_callbacks` will override existing callbacks passed to Trainer: ModelCheckpoint

  | Name    | Type                  | Params
--------------------------------------------------
0 | network | FullyConnectedNetwork | 1.5 M 
--------------------------------------------------
1.5 M     Trainable params
0         Non-trainable params
1.5 M     Total params
5.977     Total estimated model params size (MB)
Traceback (most recent call last):
  File "C:\Users\jgao5111\Dropbox (Sydney Uni)\Gaofiles\PythonProjects\NormalizationFlow\FreeForm_Flow\Example2.py", line 54, in <module>
    model.fit()
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning_trainable\trainable\trainable.py", line 285, in fit
    trainer.fit(self, **fit_kwargs)
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\trainer\call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\trainer\trainer.py", line 972, in _run
    _log_hyperparams(self)
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\loggers\utilities.py", line 95, in _log_hyperparams
    logger.save()
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning_utilities\core\rank_zero.py", line 43, in wrapped_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\loggers\tensorboard.py", line 213, in save
    save_hparams_to_yaml(hparams_file, self.hparams)
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning\pytorch\core\saving.py", line 327, in save_hparams_to_yaml
    hparams = apply_to_collection(hparams, DictConfig, OmegaConf.to_container, resolve=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning_utilities\core\apply_func.py", line 72, in apply_to_collection
    return _apply_to_collection_slow(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning_utilities\core\apply_func.py", line 104, in _apply_to_collection_slow
    v = _apply_to_collection_slow(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\my\MyPythonEnv\Torch20TF\Lib\site-packages\lightning_utilities\core\apply_func.py", line 118, in _apply_to_collection_slow
    return elem_type(OrderedDict(out))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: AttributeDict.__init__() takes 1 positional argument but 2 were given

ThisIsForReview commented 9 months ago

It seems that the trouble comes from AttributeDict's __init__() method. See the following example:

from lightning_trainable.hparams.attribute_dict import AttributeDict
from collections import OrderedDict

out = [('a', 'b'), ('c', 0.01)]
B = OrderedDict(out)
A = AttributeDict(B)  # this is how lightning_utilities constructs it; raises the TypeError above

JB

LarsKue commented 9 months ago

Thanks for the update. Unfortunately, your example does not produce the error for me (I slightly edited your reply into a version that runs as-is).

However, I think the issue is that the wrong AttributeDict is being used somewhere. Both lightning and lightning-trainable define an AttributeDict class, and the two serve slightly different purposes.

Using lightning's AttributeDict in your last snippet works fine, whereas lightning-trainable's AttributeDict produces the error you mention (ours is not intended to be used in this way).
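
To make the distinction concrete, here is a self-contained toy example; both classes below are hypothetical stand-ins I wrote for illustration, not the actual implementations:

from collections import OrderedDict

class KeywordOnlyAttributeDict(dict):
    # accepts keyword arguments only, like an hparams-style container
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

class PositionalAttributeDict(dict):
    # also forwards a positional mapping, like lightning's AttributeDict does
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

out = [("a", "b"), ("c", 0.01)]

print(PositionalAttributeDict(OrderedDict(out)))  # works: {'a': 'b', 'c': 0.01}

try:
    KeywordOnlyAttributeDict(OrderedDict(out))  # mirrors elem_type(OrderedDict(out)) from the traceback
except TypeError as e:
    print(e)  # __init__() takes 1 positional argument but 2 were given

In your traceback, lightning_utilities' apply_to_collection rebuilds mappings via elem_type(OrderedDict(out)), so any AttributeDict whose __init__ does not accept a positional mapping fails at exactly that call.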

What steps did you take to install lightning-trainable? Installing from the instructions gives me different package versions:

lightning == 2.1.3
lightning-utilities == 0.10.0
torch == 2.1.2

If all else fails, you can also try consulting the maintainers over at https://github.com/vislearn/FFF, since I assume that is where you installed lightning-trainable from.

ThisIsForReview commented 9 months ago

Thank you Lars. I created a new environment and installed requirements.txt from https://github.com/vislearn/FFF, and now it works well. It is surprising; I am not sure why the installation order matters. Perhaps lightning-trainable needs to be installed before lightning and lightning-utilities? Either way, I believe this is a bug that you may want to fix later on.

By the way, one suggestion for the demo example in the Usage section of https://github.com/LarsKue/lightning-trainable: make it work on both CPU and GPU, and set the necessary options such as max_epochs and batch_size. As it stands, the code still throws the following errors via lightning_utilities' data.py:

Total length of list across ranks is zero. Please make sure this was your intention.
Total length of CombinedLoader across ranks is zero. Please make sure this was your intention.

For a new user, I guess we don't know which parameter settings are necessary.
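
For concreteness, something like the following is what I mean, building on my snippet above; the values are only examples, and I am not sure which accelerator strings lightning-trainable actually accepts:

hparams = MyClassifierHParams(
    network_hparams=dict(
        input_dims=28 * 28,
        output_dims=10,
        layer_widths=[1024, 512, 256, 128],
        activation="relu",
    ),
    max_epochs=10,       # set explicitly so training terminates
    batch_size=8,        # set explicitly so the dataloaders are configured
    accelerator="auto",  # assumption: "auto" picks a GPU when available and falls back to CPU
)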

Thank you again

JB

LarsKue commented 9 months ago

Good to see you got it working.

I do agree that the README example is not ideal. In the future, we will probably move to automatic doctests, but I currently do not have the capacity to work on this. You are always welcome to open a pull request with such changes.
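
For instance, if the README snippet were rewritten as doctest-style >>> blocks, something along the lines of

pytest --doctest-glob="README.md"

could run it in CI. That is only a rough idea at this point, though.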