allenai / allenact

An open source framework for research in Embodied-AI from AI2.
https://www.allenact.org

Adding CLIP encoders #328

Closed: apoorvkh closed this pull request 2 years ago

apoorvkh commented 2 years ago

This adds visual and text encoders from CLIP for use in RoboTHOR ObjectNav. It can be invoked in a "zeroshot" mode, where object categories are split into seen/unseen sets and their names are encoded with CLIP's text encoder, or the visual encoder can simply be replaced with CLIP's ResNet.
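A minimal sketch of what the zeroshot setup described above might look like. The object list matches RoboTHOR ObjectNav's target categories, but the split rule and the prompt template are illustrative assumptions, not the PR's actual configuration:

```python
# Hypothetical sketch of a "zeroshot" seen/unseen object split.
# The split heuristic and prompt template below are assumptions for
# illustration; the PR's real configuration may differ.

ROBOTHOR_OBJECTS = [
    "AlarmClock", "Apple", "BaseballBat", "BasketBall", "Bowl", "GarbageCan",
    "HousePlant", "Laptop", "Mug", "SprayBottle", "Television", "Vase",
]

def split_seen_unseen(objects, unseen_every=3):
    """Deterministically hold out every Nth category as 'unseen'."""
    seen, unseen = [], []
    for i, name in enumerate(sorted(objects)):
        (unseen if i % unseen_every == 0 else seen).append(name)
    return seen, unseen

def to_prompts(objects):
    """Build natural-language prompts to feed a CLIP-style text encoder."""
    return [f"a photo of a {name}" for name in objects]

seen, unseen = split_seen_unseen(ROBOTHOR_OBJECTS)
prompts = to_prompts(unseen)
```

In the actual pipeline, the prompts for the unseen categories would be tokenized and passed through CLIP's text encoder so the agent can ground goal names it never saw during training.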

Prerequisites

```shell
pip install numpy-quaternion
pip install numpy==1.21
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
```

Training example

```shell
PYTHONPATH=. allenact -b projects/objectnav_baselines/experiments/robothor objectnav_robothor_zeroshot_rgb_clipgru_ddppo
```
lgtm-com[bot] commented 2 years ago

This pull request introduces 4 alerts when merging 1ad4dc883c97683adc99f8ee30c3f781d47f69ea into 9da8674e7781370b4c257eab707a613e953c002f - view on LGTM.com

new alerts:

apoorvkh commented 2 years ago

Accidentally created this PR with the wrong branch. See #329 instead.