applied-ai-lab / genesis

Official PyTorch implementation of GENESIS and GENESIS-V2
GNU General Public License v3.0

Visualise reconstruction error #9

Closed tsly123 closed 3 years ago

tsly123 commented 3 years ago

Hi,

Thank you for sharing your work. I downloaded the code and tried both training on Objects Room and using the pre-trained weights. In both cases the script fails with the same error, shown below:

data_config: datasets/multi_object_config.py
model_config: models/genesisv2_config.py
model_dir: checkpoints/objects_room/1_ori
model_file: model.ckpt-latest
num_images: 10

Restoring flags from checkpoints/objects_room/1_ori/flags.json
Loading 'multi_object_config' from datasets/multi_object_config.py
Using 4 data workers.
Dataset has 1000000 frames
Splitting into 980000/10000/10000 for tng/val/tst
Loading 'genesisv2_config' from models/genesisv2_config.py
Traceback (most recent call last):
  File "scripts/visualise_reconstruction.py", line 135, in <module>
    main()
  File "scripts/visualise_reconstruction.py", line 67, in main
    model = fet.load(config.model_config, pretrained_flags)
  File "/project/hnguyen/stly/braincell/code/genesis/forge/forge/experiment_tools.py", line 258, in load
    return load_func(*args, **kwargs)
  File "models/genesisv2_config.py", line 46, in load
    return GenesisV2(cfg)
  File "models/genesisv2_config.py", line 77, in __init__
    semiconv=cfg.semiconv)
  File "/project/hnguyen/stly/braincell/code/genesis/modules/attention.py", line 158, in __init__
    self.colour_head = B.SemiConv(feat_dim, self.colour_dim, img_size)
  File "/project/hnguyen/stly/braincell/code/genesis/modules/blocks.py", line 172, in __init__
    coords = pixel_coords(img_size)
  File "/project/hnguyen/stly/braincell/code/genesis/modules/blocks.py", line 43, in pixel_coords
    g_1, g_2 = torch.meshgrid(torch.linspace(-1, 1, img_size),
RuntimeError: Trying to create tensor with negative dimension -1: [-1]

I tried to debug it but ran into other errors. Could you take a look at it? Thanks

martinengelcke commented 3 years ago

Thanks for reporting this @tsly123 .

A quick fix should be to set pretrained_flags.K_steps = 7 before line 67. I will work on a more general fix.

The issue is caused by datasets/multi_object_config.py, which wraps around multiple datasets. The default K_steps is -1 and it is updated when one of these datasets is created, but this somewhat breaks the intended interface of the config flags, which causes the problem here. I expect the same problem to show up in scripts/visualise_generation.py as well.
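The quick fix above can be sketched as a small guard. This is only an illustration under stated assumptions: `ensure_k_steps` is a hypothetical helper that is not part of the repository, the `SimpleNamespace` stands in for the flags object restored from flags.json, and 7 is simply the value suggested above.

```python
from types import SimpleNamespace


def ensure_k_steps(flags, fallback=7):
    """Replace the -1 placeholder for K_steps with a concrete value.

    `fallback=7` is the value suggested in the quick fix; other
    datasets may need a different number.
    """
    if getattr(flags, "K_steps", -1) < 0:
        flags.K_steps = fallback
    return flags


# Stand-in for the restored flags (hypothetical object, not the real one).
pretrained_flags = SimpleNamespace(K_steps=-1)
pretrained_flags = ensure_k_steps(pretrained_flags)
print(pretrained_flags.K_steps)
```

A guard like this would have to run before the model is built, i.e. before the `fet.load(config.model_config, pretrained_flags)` call in the traceback. A value that is already non-negative is left untouched.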

martinengelcke commented 3 years ago

I pushed a fix in https://github.com/applied-ai-lab/genesis/commit/82eb91de18b4eb50b32a3cf99c09dee0803e327c.

It is not the most elegant solution though. It would be better to disentangle multi_object_config.py into individual data config files to avoid this special case. I have logged this as a desirable enhancement in https://github.com/applied-ai-lab/genesis/issues/10.

I am closing this issue, but feel free to re-open if you are still encountering issues @tsly123 .

tsly123 commented 3 years ago

Thank you for the fix.

However, I am getting another error after applying the fix. I even set up a new environment and downloaded the new code, but the errors keep showing. It looks like a TensorFlow version error.

#################################################
Restoring flags from checkpoints/objects_room/1_ori/flags.json
Traceback (most recent call last):
  File "scripts/visualise_reconstruction.py", line 138, in <module>
    main()
  File "scripts/visualise_reconstruction.py", line 64, in main
    _, _, test_loader = fet.load(config.data_config, config)
  File "/project/hnguyen/stly/braincell/code/genesis/forge/forge/experiment_tools.py", line 249, in load
    module, name = _import_module(conf_path)
  File "/project/hnguyen/stly/braincell/code/genesis/forge/forge/experiment_tools.py", line 275, in _import_module
    module = imp.load_source(module_path_or_name, file_name)
  File "/home/stly/anaconda3/envs/genesis/lib/python3.7/imp.py", line 171, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 696, in _load
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "datasets/multi_object_config.py", line 28, in <module>
    import third_party.multi_object_datasets.clevr_with_masks as clevr_with_masks
  File "/project/hnguyen/stly/braincell/code/genesis_new/third_party/multi_object_datasets/clevr_with_masks.py", line 31, in <module>
    'image': tf.FixedLenFeature(IMAGE_SIZE+[3], tf.string),
AttributeError: module 'tensorflow' has no attribute 'FixedLenFeature'
#################################################

I fixed the error by changing 'image': tf.FixedLenFeature(IMAGE_SIZE+[3], tf.string), to 'image': tf.io.FixedLenFeature(IMAGE_SIZE+[3], tf.string), following a Stack Overflow answer. That worked, but I then ran into another TensorFlow version error, e.g.

Traceback (most recent call last):
  File "scripts/visualise_reconstruction.py", line 136, in <module>
    main()
  File "scripts/visualise_reconstruction.py", line 63, in main
    _, _, test_loader = fet.load(config.data_config, config)
  File "/project/hnguyen/stly/braincell/code/genesis/forge/forge/experiment_tools.py", line 258, in load
    return load_func(*args, **kwargs)
  File "datasets/multi_object_config.py", line 64, in load
    sess = tf.InteractiveSession()
AttributeError: module 'tensorflow' has no attribute 'InteractiveSession'

tf.io.FixedLenFeature is the TF 2.x name, but the code targets TF 1.14.0. This second error can also be fixed by switching to a TF 2.x call.
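The lookup described above (try the old top-level name, then fall back to the newer location) can be sketched generically. This is an illustration only: `first_available` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins for the module layouts, not real TensorFlow. The repository itself targets TF 1.14.0, so this is not a supported port to TF 2.x.

```python
from types import SimpleNamespace


def first_available(root, *dotted_paths):
    """Return the first attribute reachable from `root` among dotted paths."""
    for path in dotted_paths:
        obj = root
        try:
            for part in path.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            continue  # this path does not exist on `root`; try the next one
        return obj
    raise AttributeError(f"none of {dotted_paths} found on {root!r}")


# Stand-ins for the two layouts (hypothetical, not real TensorFlow modules).
tf1_like = SimpleNamespace(
    FixedLenFeature="feature",
    io=SimpleNamespace(FixedLenFeature="feature"),
)
tf2_like = SimpleNamespace(io=SimpleNamespace(FixedLenFeature="feature"))

for layout in (tf1_like, tf2_like):
    print(first_available(layout, "FixedLenFeature", "io.FixedLenFeature"))
```

With a real TensorFlow module as `root`, the same call would pick `tf.FixedLenFeature` when it exists and `tf.io.FixedLenFeature` otherwise.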

Could you take a look at this? Thank you.

martinengelcke commented 3 years ago

It sounds like you are using TF2, which is not supported by this repository. I recommend you downgrade to TF 1.14.0 as specified in the environment.yml. Let me know if you are still having issues after downgrading.

tsly123 commented 3 years ago

I set up a new environment using conda env create -f environment.yml and checked it with conda list tensorflow, which shows TF 1.14.0. I then set up the environment again, but the error keeps showing.

martinengelcke commented 3 years ago

Hmm. Looking at the tf1.14 documentation, tf.FixedLenFeature should be a valid alias of tf.io.FixedLenFeature and tf.InteractiveSession should also exist.

As a sanity check, what is the output when running "import tensorflow as tf; print(tf.__version__)" in a python shell in your env?
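A slightly more detailed version of this sanity check is sketched below. It also reports which interpreter is running and where the module is imported from, which can help when the package list and the running interpreter disagree. `describe_module` is a hypothetical helper written for this illustration.

```python
import importlib
import sys


def describe_module(name):
    """Return where `name` is imported from and which version it reports."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return {"name": name, "found": False, "version": None, "path": None}
    return {
        "name": name,
        "found": True,
        "version": getattr(module, "__version__", None),
        "path": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    # Shows which Python is actually running and which TensorFlow it sees.
    print("interpreter:", sys.executable)
    print(describe_module("tensorflow"))
```

If the reported path lies outside the conda env, or the version is not 1.14.0, the script is picking up a different installation than the one listed by conda.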

martinengelcke commented 3 years ago

I have closed this issue due to inactivity; feel free to re-open @tsly123 .