Vision encoder output dimension does not match

yuaoze commented 3 weeks ago

Hi, thanks for your excellent work! I'm trying to run bash eval_calvin.sh. When execution reaches FeedbackPolicy/models/policy.py, the vision_x input to vision_encoder has spatial size 192 x 192, which does not match the model's input size of 224 x 224. So I interpolated vision_x to 224 x 224, but then the output of vision_encoder has shape (8, 768), which does not match the pattern of the rearrange operation:

vision_x = rearrange(vision_x, "(b T) d h w -> b T (h w) d", b=b, T=T)

retsuh-bqw commented 3 weeks ago

Thanks for your interest in our work! We modified the default input size of the VC1-Base model (from 224 to 192) in its corresponding config file. A small tweak to that config should let you use our evaluation scripts.

Let us know if this doesn't solve your issue. 😃

yuaoze commented 3 weeks ago

Hi, I followed your advice and modified the config file of the VC1-Base model, but the error still occurs. Here are the details.

yuaoze commented 3 weeks ago

I solved the input-size issue by specifying output_size: 192 under "transform" in the config file. But the output of vision_encoder has shape (8, 768), which does not match the pattern of the rearrange operation:

vision_x = rearrange(vision_x, "(b T) d h w -> b T (h w) d", b=b, T=T)

Can you give me some advice?

retsuh-bqw commented 3 weeks ago

My bad. You should also set use_cls to False in the config file. Then the encoder will return all feature tokens.
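
To illustrate why use_cls matters here, below is a minimal shape-arithmetic sketch. It is not the CLOVER or VC-1 code. It assumes a ViT-Base style encoder with patch size 16 and embedding width 768 (the 768 matches the shape reported above), and it assumes that with use_cls set to False the encoder returns a (d, h, w) feature map per image, as the rearrange pattern implies. The helper names and the split of 8 into b=2, T=4 are hypothetical.

```python
def encoder_output_shape(batch, img_size, use_cls, patch_size=16, embed_dim=768):
    """Shape returned by a ViT-style encoder (assumed ViT-Base/16 defaults)."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be a multiple of patch_size")
    grid = img_size // patch_size
    if use_cls:
        # One class-token vector per image: no spatial axes survive.
        return (batch, embed_dim)
    # All patch tokens, laid out as a (d, h, w) feature map per image.
    return (batch, embed_dim, grid, grid)


def rearranged_shape(shape, b, T):
    """Shape after rearrange(x, "(b T) d h w -> b T (h w) d", b=b, T=T)."""
    if len(shape) != 4:
        raise ValueError(f"pattern needs 4 axes, got {len(shape)}: {shape}")
    bt, d, h, w = shape
    if bt != b * T:
        raise ValueError(f"leading axis {bt} != b * T = {b * T}")
    return (b, T, h * w, d)


cls_shape = encoder_output_shape(batch=8, img_size=192, use_cls=True)
tok_shape = encoder_output_shape(batch=8, img_size=192, use_cls=False)
print(cls_shape)                               # (8, 768), the shape reported in this thread
print(tok_shape)                               # (8, 768, 12, 12)
print(rearranged_shape(tok_shape, b=2, T=4))   # (2, 4, 144, 768)
```

With use_cls=True the output has only two axes, so a four-axis pattern such as "(b T) d h w" cannot match it; calling rearranged_shape(cls_shape, b=2, T=4) raises ValueError for the same reason.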

hkz103 commented 2 weeks ago

Hello! I ran into the same problem. After setting img_size to 192 and use_cls to False, I still get an error: AssertionError("Input image height (224) doesn't match model (192)."). Can you give me more advice?

retsuh-bqw commented 2 weeks ago

Is it because of the sanity check in the load_model function (lines 26-29) of VC-1? You may change the function as follows:

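import logging

log = logging.getLogger(__name__)


def load_model(
    model,
    transform,
    metadata=None,
    checkpoint_dict=None,
):
    # transform and metadata are unused in this trimmed version.
    if checkpoint_dict is not None:
        # load_state_dict reports any missing or unexpected keys.
        msg = model.load_state_dict(checkpoint_dict)
        log.warning(msg)

    # The sanity check that used to run here has been removed.
    return model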
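
For context, here is a rough sketch of how such an assertion can arise. It assumes the sanity check passes an image produced by the transform through the model, and that the model rejects inputs whose size differs from its configured img_size. Both functions are hypothetical stand-ins, not the VC-1 or timm code; only the error message is copied from the report above.

```python
def check_input_size(image_hw, model_img_size):
    """Hypothetical stand-in for the size check in a ViT patch embedding."""
    height, width = image_hw
    if height != model_img_size:
        raise AssertionError(
            f"Input image height ({height}) doesn't match model ({model_img_size})."
        )
    if width != model_img_size:
        raise AssertionError(
            f"Input image width ({width}) doesn't match model ({model_img_size})."
        )


def sanity_check(transform_output_size, model_img_size):
    """Mimic feeding one transformed square image through the model."""
    image_hw = (transform_output_size, transform_output_size)
    check_input_size(image_hw, model_img_size)
    return True


print(sanity_check(192, 192))  # sizes agree, so the check passes

try:
    sanity_check(224, 192)     # transform still produces 224, model expects 192
except AssertionError as exc:
    print(exc)
```

Under these assumptions the check only fails when the transform and the model disagree on size, which is consistent with the earlier report that setting output_size: 192 under "transform" resolved the input-size mismatch.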

hkz103 commented 2 weeks ago

It works! But I ran into a new problem: [image: bug]

retsuh-bqw commented 2 weeks ago

It seems to be an issue within CALVIN. Is your CALVIN env properly installed?

gouyinghong commented 2 weeks ago

Hi, I ran bash eval_calvin.sh, but it failed with the error "failed to EGL with glad." Do you know how to solve this?

hkz103 commented 2 weeks ago

You are right. I hadn't installed CALVIN properly. However, the packages used by CALVIN and CLOVER seem to conflict. Can you provide a requirements.txt?

retsuh-bqw commented 2 weeks ago

There is a requirements.txt at visual_planner/requirements.txt. Which package conflicts are you getting exactly?

hkz103 commented 1 week ago

Now I get the "Cannot load URDF file" error again. The package conflicts are listed below. Can you give me more advice? Thanks for your help! [image: problem]

retsuh-bqw commented 1 week ago

You can try downgrading networkx to 2.2. I think the other packages are fine.

hkz103 commented 1 week ago

With networkx 2.2, I get AttributeError: "module 'numpy' has no attribute 'int'." This happens because recent versions of numpy no longer provide numpy.int, which networkx 2.2 appears to use.

retsuh-bqw commented 1 week ago

You may try downgrading numpy as well. I'll add the relevant information to a new Troubleshooting section.
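
For anyone hitting the same error, a small check like the one below may help when choosing a numpy version. The numpy.int alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, so a release older than 1.24 still exposes it. The helper names are hypothetical, and the thread does not confirm which numpy version the project expects, so treat any pin (for example numpy<1.24) as an assumption to verify against visual_planner/requirements.txt.

```python
import re
from importlib import metadata


def major_minor(version):
    """Return the leading (major, minor) integers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_numpy_int_alias(numpy_version):
    """True if this NumPy release still exposes the numpy.int alias."""
    return major_minor(numpy_version) < (1, 24)


def installed_version(package):
    """Installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("numpy")
    if version is None:
        print("numpy is not installed")
    elif has_numpy_int_alias(version):
        print(f"numpy {version} still provides numpy.int")
    else:
        print(f"numpy {version} no longer provides numpy.int")
```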

OpenDriveLab / CLOVER

Vision encoder output dimension does not match #1