Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Support `str(datamodule)` #9947

Open carmocca opened 3 years ago

carmocca commented 3 years ago

🚀 Feature

Add support for:

print(str(MyDataModule()))

Motivation

It currently prints:

<__main__.MyDataModule object at 0x10284c970>

Pitch

It could print the DataLoader structure:

MyDataModule(
    train_dataloader: {"a": DataLoaderClass(batch_size=8, num_batches=16, num_workers=2), "b": DataLoaderClass(batch_size=2, num_batches=16, num_workers=2)}
    val_dataloader: [DataLoaderClass(batch_size=3, num_batches=14, num_workers=0), DataLoaderClass(batch_size=8, num_batches=4, num_workers=0)]
    test_dataloader: DataLoaderClass(batch_size=4, num_batches=7, num_workers=2)
)

Or the number of batches per dataloader, similar to what was done in https://github.com/PyTorchLightning/pytorch-lightning/issues/5965
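A minimal sketch of how such a `__str__` could work, using toy stand-ins (`FakeLoader` and the hook bodies here are illustrative placeholders, not Lightning's actual API):

```python
class FakeLoader:
    # Toy stand-in for a DataLoader, carrying only the attributes we summarize.
    def __init__(self, batch_size, num_workers):
        self.batch_size = batch_size
        self.num_workers = num_workers

    def __repr__(self):
        return f"FakeLoader(batch_size={self.batch_size}, num_workers={self.num_workers})"


class MyDataModule:
    def train_dataloader(self):
        return {"a": FakeLoader(8, 2), "b": FakeLoader(2, 2)}

    def val_dataloader(self):
        return [FakeLoader(3, 0), FakeLoader(8, 0)]

    def test_dataloader(self):
        return FakeLoader(4, 2)

    def __str__(self):
        # Call each *_dataloader hook that exists and rely on the loaders' reprs.
        lines = []
        for hook in ("train_dataloader", "val_dataloader", "test_dataloader"):
            fn = getattr(self, hook, None)
            if callable(fn):
                lines.append(f"    {hook}: {fn()!r}")
        return f"{type(self).__name__}(\n" + "\n".join(lines) + "\n)"


print(MyDataModule())
```

Note that relying on the built-in dict repr prints keys with single quotes (`{'a': ...}`), which is the formatting wrinkle raised in the comments.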

Alternatives

Open to other ideas



carmocca commented 3 years ago

cc @kingyiusuen

Abelarm commented 3 years ago

I could take care of this 👍

kingyiusuen commented 3 years ago

> cc @kingyiusuen

I am happy to let @Abelarm take it :)

Abelarm commented 3 years ago

Hi guys, I am currently stuck choosing between:

[Screenshot 2021-10-16 at 20 01 05]

and

[Screenshot 2021-10-16 at 20 20 09]*

*which is not consistent in its prints.

The problem is the str() of the dict :(

Do you have any idea? Or is one of the two solutions good enough?

Abelarm commented 3 years ago

If you really want the dict keys to have "" around them I can do it, but it won't be the nicest of solutions.
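The wrinkle is that Python's dict formatting calls repr() on keys, so str() of a dict always produces single-quoted keys; double quotes require joining the items manually. A minimal illustration:

```python
d = {"a": 1, "b": 2}

# The built-in formatting uses repr() on keys, so quotes come out single:
print(str(d))  # {'a': 1, 'b': 2}

# Double-quoted keys require a custom join over the items:
formatted = "{" + ", ".join(f'"{k}": {v}' for k, v in d.items()) + "}"
print(formatted)  # {"a": 1, "b": 2}
```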

carmocca commented 3 years ago

Hey @Abelarm! You can open a draft PR so we can check your current implementation and discuss it.

dmarx commented 3 years ago

In the spirit of https://docs.python.org/3.4/reference/datamodel.html#object.__repr__:

> If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment).

I recommend:

  1. keeping the quotes around dict keys but not dict values
  2. using an = after the name of initialization parameters instead of a :

Following these recommendations, @Abelarm's test expression would become:

MyDataModule(
    train_dataloader={"a": DataLoaderClass(batch_size=8, num_batches=16, num_workers=2), "b": DataLoaderClass(batch_size=2, num_batches=16, num_workers=2)},
    val_dataloader=[DataLoaderClass(batch_size=3, num_batches=14, num_workers=0), DataLoaderClass(batch_size=8, num_batches=4, num_workers=0)],
    test_dataloader=DataLoaderClass(batch_size=4, num_batches=7, num_workers=2)
)
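Both recommendations could be implemented with a small recursive formatter along these lines (a sketch, assuming each loader's own repr already uses `=`-style parameters; `format_value` is a hypothetical helper, not code from any actual PR):

```python
def format_value(value):
    # Recommendation 1: double-quote dict keys, leave the values unquoted.
    if isinstance(value, dict):
        return "{" + ", ".join(f'"{k}": {format_value(v)}' for k, v in value.items()) + "}"
    if isinstance(value, list):
        return "[" + ", ".join(format_value(v) for v in value) + "]"
    # Recommendation 2 is satisfied by each value's own `=`-style repr.
    return str(value)


# Strings stand in for loader objects whose repr is that string.
loaders = {"a": "DataLoaderClass(batch_size=8)", "b": "DataLoaderClass(batch_size=2)"}
print(f"train_dataloader={format_value(loaders)}")
# train_dataloader={"a": DataLoaderClass(batch_size=8), "b": DataLoaderClass(batch_size=2)}
```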

tchaton commented 3 years ago

Hey @carmocca,

I believe adding support for str() presents the same inconveniences as using len().

It might be worth considering a describe method on LightningDataModule instead.

Best, T.C

carmocca commented 3 years ago

The main reason for the reversion of len was its impact on existing truthiness checks. That should not be a problem for str.

@ananthsub do you think the rest of the points you raised in https://github.com/PyTorchLightning/pytorch-lightning/issues/5965#issuecomment-948862064 warrant dropping this feature? We would still have the problem of initialization.

Abelarm commented 2 years ago

> In the spirit of https://docs.python.org/3.4/reference/datamodel.html#object.__repr__ [...] I recommend: keeping the quotes around dict keys but not dict values, and using an = after the name of initialization parameters instead of a :

In my PR I already went with : instead of =, but I am struggling to add "" around the dict keys :(

MrWhatZitToYaa commented 1 month ago

It seems like this feature is still not implemented. Would it be possible to work on this issue?