TRI-ML / vlm-evaluation

VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning

Abnormal metric values #10

Open tayton42 opened 7 months ago

tayton42 commented 7 months ago

Thank you for your work! However, I ran into some problems when computing metrics with this code: even with the pretrained checkpoint you provide, my results differ from those in the paper. I evaluated the checkpoint from https://huggingface.co/TRI-ML/prismatic-vlms/tree/main/prism-dinosiglip%2B7b and got the following results.

04/26 [10:42:19] INFO     | >> [*] Starting Official Scoring for Dataset `vqa-v2-full` => Model `prism-dinosiglip+7b`           score.py:61
                 INFO     | >> [*] Metrics JSON already exists at                                                               score.py:67
                          `/opt/cv/tianyutong/vlm-evaluation/results/vqa-v2/vqa-v2-full/prism-dinosiglip+7b/metrics.json` =>>              
                          Exiting!                                                                                                         
                 INFO     | >> [*] Results for Model `prism-dinosiglip+7b` on vqa-v2-full (Split = val)                         score.py:92
                                    => Accuracy (Official): 51.260                                                                         
Executing command for dataset.type=gqa-full
04/26 [10:42:23] INFO     | >> [*] Starting Official Scoring for Dataset `gqa-full` => Model `prism-dinosiglip+7b`              score.py:61
                 INFO     | >> [*] Metrics JSON already exists at                                                               score.py:67
                          `/opt/cv/tianyutong/vlm-evaluation/results/gqa/gqa-full/prism-dinosiglip+7b/metrics.json` =>>                    
                          Exiting!                                                                                                         
                 INFO     | >> [*] Results for Model `prism-dinosiglip+7b` on gqa-full (Split = testdev_balanced)               score.py:92
                                    => Accuracy (Official): 37.260                                                                         
Executing command for dataset.type=vizwiz-full
04/26 [10:42:26] INFO     | >> [*] Starting Official Scoring for Dataset `vizwiz-full` => Model `prism-dinosiglip+7b`           score.py:61
                 INFO     | >> [*] Metrics JSON already exists at                                                               score.py:67
                          `/opt/cv/tianyutong/vlm-evaluation/results/vizwiz/vizwiz-full/prism-dinosiglip+7b/metrics.json` =>>              
                          Exiting!                                                                                                         
                 INFO     | >> [*] Results for Model `prism-dinosiglip+7b` on vizwiz-full                                       score.py:92
                          (VizWiz-Overall/VizWiz-Answerable/VizWiz-Unanswerable/VizWiz-Unanswerable-AvgPR/VizWiz-Unanswerable-F            
                          1) (Split = val)                                                                                                 
                                    => VizWiz-Overall  Accuracy (Official): 53.940                                                         
                                    => VizWiz-Answerable  Accuracy (Official): 32.520                                                      
                                    => VizWiz-Unanswerable  Accuracy (Official): 99.320                                                    
                                    => VizWiz-Unanswerable-AvgPR  Accuracy (Official): 33.230                                              
                                    => VizWiz-Unanswerable-F1  Accuracy (Official): 49.790                                                 
Executing command for dataset.type=text-vqa-full
04/26 [10:42:30] INFO     | >> [*] Starting Official Scoring for Dataset `text-vqa-full` => Model `prism-dinosiglip+7b`         score.py:61
                 INFO     | >> [*] Metrics JSON already exists at                                                               score.py:67
                          `/opt/cv/tianyutong/vlm-evaluation/results/text-vqa/text-vqa-full/prism-dinosiglip+7b/metrics.json`              
                          =>> Exiting!                                                                                                     
                 INFO     | >> [*] Results for Model `prism-dinosiglip+7b` on text-vqa-full (TextVQA-OCR/TextVQA-Pure) (Split = score.py:92
                          val)                                                                                                             
                                    => TextVQA-OCR  Accuracy (Official): 0.508                                                             
                                    => TextVQA-Pure  Accuracy (Official): 0.459                                                            
Executing command for dataset.type=vsr-full
04/26 [10:42:34] INFO     | >> [*] Starting Official Scoring for Dataset `vsr-full` => Model `prism-dinosiglip+7b`              score.py:61
                 INFO     | >> [*] Metrics JSON already exists at                                                               score.py:67
                          `/opt/cv/tianyutong/vlm-evaluation/results/vsr/vsr-full/prism-dinosiglip+7b/metrics.json` =>>                    
                          Exiting!                                                                                                         
                 INFO     | >> [*] Results for Model `prism-dinosiglip+7b` on vsr-full (VSR-ExactMatch/VSR-AUCROC/VSR-AUCPR)    score.py:92
                          (Split = zeroshot-test)                                                                                          
                                    => VSR-ExactMatch  Accuracy (Official): 0.542                                                          
                                    => VSR-AUCROC  Accuracy (Official): 0.621                                                              
                                    => VSR-AUCPR  Accuracy (Official): 0.605                                                               
Executing command for dataset.type=refcoco-full
04/26 [10:42:37] INFO     | >> [*] Starting Official Scoring for Dataset `refcoco-full` => Model `prism-dinosiglip+7b`          score.py:61
                 INFO     | >> [*] Metrics JSON already exists at                                                               score.py:67
                          `/opt/cv/tianyutong/vlm-evaluation/results/refcoco/refcoco-full/prism-dinosiglip+7b/metrics.json` =>>            
                          Exiting!                                                                                                         
                 INFO     | >> [*] Results for Model `prism-dinosiglip+7b` on refcoco-full (RefCOCO/RefCOCO+/RefCOCOg) (Split = score.py:92
                          val)                                                                                                             
                                    => RefCOCO  Accuracy (Official): 0.318                                                                 
                                    => RefCOCO+  Accuracy (Official): 0.299                                                                
                                    => RefCOCOg  Accuracy (Official): 0.375                                                                
Executing command for dataset.type=tally-qa-full
04/26 [10:42:41] INFO     | >> [*] Starting Official Scoring for Dataset `tally-qa-full` => Model `prism-dinosiglip+7b`         score.py:61
                 INFO     | >> [*] Metrics JSON already exists at                                                               score.py:67
                          `/opt/cv/tianyutong/vlm-evaluation/results/tally-qa/tally-qa-full/prism-dinosiglip+7b/metrics.json`              
                          =>> Exiting!                                                                                                     
                 INFO     | >> [*] Results for Model `prism-dinosiglip+7b` on tally-qa-full                                     score.py:92
                          (TallyQA-simple-Accuracy/TallyQA-simple-AUCROC/TallyQA-simple-AUCPR/TallyQA-complex-Accuracy/TallyQA-            
                          complex-AUCROC/TallyQA-complex-AUCPR/TallyQA-final-Accuracy/TallyQA-final-AUCROC/TallyQA-final-AUCPR)            
                           (Split = test)                                                                                                  
                                    => TallyQA-simple-Accuracy  Accuracy (Official): 0.697                                                 
                                    => TallyQA-simple-AUCROC  Accuracy (Official): 0.717                                                   
                                    => TallyQA-simple-AUCPR  Accuracy (Official): 0.243                                                    
                                    => TallyQA-complex-Accuracy  Accuracy (Official): 0.425                                                
                                    => TallyQA-complex-AUCROC  Accuracy (Official): 0.631                                                  
                                    => TallyQA-complex-AUCPR  Accuracy (Official): 0.134                                                   
                                    => TallyQA-final-Accuracy  Accuracy (Official): 0.587                                                  
                                    => TallyQA-final-AUCROC  Accuracy (Official): 0.690                                                    
                                    => TallyQA-final-AUCPR  Accuracy (Official): 0.204                                                     
Executing command for dataset.type=pope-full
04/26 [10:42:45] INFO     | >> [*] Starting Official Scoring for Dataset `pope-full` => Model `prism-dinosiglip+7b`             score.py:61
                 INFO     | >> [*] Metrics JSON already exists at                                                               score.py:67
                          `/opt/cv/tianyutong/vlm-evaluation/results/pope/pope-full/prism-dinosiglip+7b/metrics.json` =>>                  
                          Exiting!                                                                                                         
                 INFO     | >> [*] Results for Model `prism-dinosiglip+7b` on pope-full                                         score.py:92
                          (POPE-adversarial-Accuracy/POPE-adversarial-AUCROC/POPE-adversarial-AUCPR/POPE-popular-Accuracy/POPE-            
                          popular-AUCROC/POPE-popular-AUCPR/POPE-random-Accuracy/POPE-random-AUCROC/POPE-random-AUCPR/POPE-fina            
                          l-Accuracy/POPE-final-AUCROC/POPE-final-AUCPR) (Split = eval)                                                    
                                    => POPE-adversarial-Accuracy  Accuracy (Official): 0.757                                               
                                    => POPE-adversarial-AUCROC  Accuracy (Official): 0.917                                                 
                                    => POPE-adversarial-AUCPR  Accuracy (Official): 0.927                                                  
                                    => POPE-popular-Accuracy  Accuracy (Official): 0.703                                                   
                                    => POPE-popular-AUCROC  Accuracy (Official): 0.908                                                     
                                    => POPE-popular-AUCPR  Accuracy (Official): 0.926                                                      
                                    => POPE-random-Accuracy  Accuracy (Official): 0.829                                                    
                                    => POPE-random-AUCROC  Accuracy (Official): 0.946                                                      
                                    => POPE-random-AUCPR  Accuracy (Official): 0.961                                                       
                                    => POPE-final-Accuracy  Accuracy (Official): 0.763                                                     
                                    => POPE-final-AUCROC  Accuracy (Official): 0.923                                                       
                                    => POPE-final-AUCPR  Accuracy (Official): 0.937   
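
For reference, every dataset above logs `Metrics JSON already exists ... =>> Exiting!`, i.e. score.py loaded metrics that an earlier run had already cached instead of recomputing them. After re-running evaluate.py, any stale files have to be cleared before re-scoring; a minimal sketch, with the glob following the results layout visible in the logs:

# Clear cached metrics so score.py recomputes from the latest predictions
rm /opt/cv/tianyutong/vlm-evaluation/results/*/*/prism-dinosiglip+7b/metrics.json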

I can't figure out where the problem is. Could you give me some suggestions? Thank you. Here is the script I used:

MODEL_ID="prism-dinosiglip+7b"
MODEL_DIR="/dir/prism-dinosiglip+7b"

DATASET_TYPES=("vqa-v2-full" "gqa-full" "vizwiz-full" "text-vqa-full" "vsr-full" "refcoco-full" "tally-qa-full" "pope-full") #"ocid-ref-full"

for DATASET_TYPE in "${DATASET_TYPES[@]}"
do
    echo "Executing command for dataset.type=${DATASET_TYPE}"
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch --main_process_port 29501 --num_processes=8 scripts/evaluate.py --model_id $MODEL_ID --model_dir $MODEL_DIR --dataset.type $DATASET_TYPE
done
MODEL_ID="prism-dinosiglip+7b"

DATASET_TYPES=("vqa-v2-full" "gqa-full" "vizwiz-full" "text-vqa-full" "vsr-full" "refcoco-full" "tally-qa-full" "pope-full") #"ocid-ref-full"

for DATASET_TYPE in "${DATASET_TYPES[@]}"
do
    echo "Executing command for dataset.type=${DATASET_TYPE}"
    python scripts/score.py --model_id $MODEL_ID --dataset.type $DATASET_TYPE --dataset.root_dir /dir/vlm-evaluation --results_dir /dir/vlm-evaluation/results
done
siddk commented 7 months ago

Hey @tayton42 - I just did a fresh install of the repository and ran the GQA-Full eval as a sanity check; this is what I'm getting (as expected):

(vlm-evaluation) ubuntu@ip-10-232-15-253:/mnt/fsx/skaramcheti/code/vlm-evaluation-dev$ accelerate launch --num_processes=8 scripts/evaluate.py --model_id="prism-dinosiglip+7b" --dataset.type="gqa-full"
...
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
04/30 [11:43:21] INFO     | >>     |=> Done Evaluating =>> Exiting!

(vlm-evaluation) ubuntu@ip-10-232-15-253:/mnt/fsx/skaramcheti/code/vlm-evaluation-dev$ python scripts/score.py
04/30 [11:44:44] INFO     | >> [*] Starting Official Scoring for Dataset `gqa-full` => Model `prism-dinosiglip+7b`                                                                                                                                              
                 INFO     | >> [*] Results for Model `prism-dinosiglip+7b` on gqa-full (Split = testdev_balanced) 
                                    => Accuracy (Official): 65.290

There might be a couple of gotchas here with dependency versions and accelerate configuration. Can you do me a favor and dump the output of `pip freeze` as well as your `accelerate config`?
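
If it helps, something like this should capture both; note that `accelerate env` (as opposed to the interactive `accelerate config` wizard) prints the active configuration non-interactively:

# Dump installed package versions and the active accelerate configuration
pip freeze > pip_freeze.txt
accelerate env > accelerate_env.txt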

tayton42 commented 6 months ago

> There might be a couple of gotchas here with dependency versions and accelerate configuration. Can you do me a favor and dump the output of `pip freeze` as well as your `accelerate config`?

Thank you for your reply. Here is the output of my `pip freeze`; I have not configured accelerate yet, so it should be using the defaults.

absl-py==2.1.0
accelerate==0.25.0
addict==2.4.0
aiofiles @ file:///xxxxx/vlm-evaluation/aiofiles-23.2.1-py3-none-any.whl#sha256=19297512c647d4b27a2cf7c34caa7e405c0d60b5560618a29a9fe027b18b0107
aiohttp==3.9.5
aiosignal==1.3.1
aliyun-python-sdk-core==2.15.1
aliyun-python-sdk-kms==2.16.2
altair==5.3.0
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
anyio==4.3.0
appdirs==1.4.4
archspec @ file:///croot/archspec_1697725767277/work
ascii-magic==2.3.0
asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
astunparse==1.6.3
async-timeout==4.0.3
attrs @ file:///croot/attrs_1695717823297/work
azure-core==1.30.1
azure-identity==1.16.0
azure-storage-blob==12.19.1
azure-storage-file-datalake==12.14.0
backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work
basicsr==1.4.2
bcrypt==4.1.2
beautifulsoup4 @ file:///croot/beautifulsoup4-split_1681493039619/work
bitsandbytes==0.43.1
bleach==6.1.0
blinker==1.7.0
blis==0.7.11
boltons @ file:///croot/boltons_1677628692245/work
boto3==1.34.86
botocore==1.34.86
braceexpand==0.1.7
Brotli @ file:///tmp/abs_ecyw11_7ze/croots/recipe/brotli-split_1659616059936/work
cachetools==5.3.3
catalogue==2.0.10
certifi @ file:///croot/certifi_1700501669400/work/certifi
cffi @ file:///croot/cffi_1700254295673/work
cfgv==3.4.0
chardet @ file:///home/builder/ci_310/chardet_1640804867535/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
circuitbreaker==1.4.0
click @ file:///croot/click_1698129812380/work
cloudpathlib==0.16.0
colorama==0.4.6
conda @ file:///croot/conda_1696257509808/work
conda-build @ file:///croot/conda-build_1701959075444/work
conda-content-trust @ file:///croot/conda-content-trust_1693490622020/work
conda-libmamba-solver @ file:///croot/conda-libmamba-solver_1691418897561/work/src
conda-package-handling @ file:///croot/conda-package-handling_1690999929514/work
conda_index @ file:///croot/conda-index_1695310357675/work
conda_package_streaming @ file:///croot/conda-package-streaming_1690987966409/work
confection==0.1.4
contexttimer==0.3.3
contourpy==1.2.1
cramjam==2.8.3
crcmod==1.7
cryptography @ file:///croot/cryptography_1702070282333/work
cycler==0.12.1
cymem==2.0.8
decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work
decord==0.6.0
diffusers==0.16.0
distlib==0.3.8
distro @ file:///croot/distro_1701455004953/work
dnspython==2.4.2
docker-pycreds==0.4.0
draccus==0.7.2
dropout-layer-norm @ file:///home/ubuntu/some_packages/flash-attention-2.1.1/csrc/layer_norm
einops==0.7.0
einops-exts==0.0.4
exceptiongroup @ file:///croot/exceptiongroup_1668714342571/work
executing @ file:///opt/conda/conda-bld/executing_1646925071911/work
expecttest==0.1.6
fairscale==0.4.4
fastapi==0.110.1
ffmpy==0.3.2
filelock @ file:///croot/filelock_1700591183607/work
flash-attn==2.3.3
fonttools==4.50.0
frozenlist==1.4.1
fsspec==2023.12.2
ftfy==6.2.0
future==1.0.0
gitdb==4.0.11
GitPython==3.1.40
gmpy2 @ file:///tmp/build/80754af9/gmpy2_1645455533097/work
google-api-core==2.18.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.10.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
gradio @ file:///xxxxxxx/vlm-evaluation/gradio-3.35.2-py3-none-any.whl#sha256=6dc42dedaab583198dbbecc605759cbb154f6066f4f0515b26cf976331786c9a
gradio_client @ file:///xxxxxxx/vlm-evaluation/gradio_client-0.2.9-py3-none-any.whl#sha256=9174476e8965b6f622a4426d631c1c29f2209329f110242278fcb6ad26f813d5
grpcio==1.62.2
h11==0.14.0
httpcore==0.17.3
httpx==0.24.0
huggingface-hub==0.17.3
hypothesis==6.92.0
identify==2.5.35
idna @ file:///croot/idna_1666125576474/work
imageio==2.33.1
importlib_metadata==7.1.0
iopath==0.1.10
ipython @ file:///croot/ipython_1694181358621/work
isodate==0.6.1
jedi @ file:///tmp/build/80754af9/jedi_1644315229345/work
Jinja2 @ file:///croot/jinja2_1666908132255/work
jmespath==0.10.0
joblib==1.4.0
jsonlines==4.0.0
jsonpatch @ file:///tmp/build/80754af9/jsonpatch_1615747632069/work
jsonpointer==2.1
jsonschema @ file:///croot/jsonschema_1699041609003/work
jsonschema-specifications @ file:///croot/jsonschema-specifications_1699032386549/work
kaggle==1.6.12
kiwisolver==1.4.5
langcodes==3.3.0
lazy_loader==0.4
libarchive-c @ file:///tmp/build/80754af9/python-libarchive-c_1617780486945/work
libmambapy @ file:///croot/mamba-split_1698782620632/work/libmambapy
linkify-it-py==2.0.3
llava @ git+https://github.com/suraj-nair-tri/LLaVA@6d739ad8793bba97b51051089a295c42b400ff8b
lmdb==1.4.1
lxml==5.2.1
Markdown==3.6
markdown-it-py==2.2.0
markdown2==2.4.13
MarkupSafe @ file:///opt/conda/conda-bld/markupsafe_1654597864307/work
matplotlib==3.8.3
matplotlib-inline @ file:///opt/conda/conda-bld/matplotlib-inline_1662014470464/work
mdit-py-plugins==0.3.3
mdurl==0.1.2
menuinst @ file:///croot/menuinst_1702390294373/work
mergedeep==1.3.4
mkl-fft @ file:///croot/mkl_fft_1695058164594/work
mkl-random @ file:///croot/mkl_random_1695059800811/work
mkl-service==2.4.0
mmcv==1.7.0
mmdet==2.25.2
model-index==0.1.11
more-itertools @ file:///croot/more-itertools_1700662129964/work
mosaicml-streaming==0.7.5
mpmath @ file:///croot/mpmath_1690848262763/work
msal==1.28.0
msal-extensions==1.1.0
multidict==6.0.5
murmurhash==1.0.10
mypy-extensions==1.0.0
networkx @ file:///croot/networkx_1690561992265/work
ninja==1.11.1.1
nodeenv==1.8.0
numpy @ file:///croot/numpy_and_numpy_base_1701295038894/work/dist/numpy-1.26.2-cp310-cp310-linux_x86_64.whl#sha256=2ab675fa590076aa37cc29d18231416c01ea433c0e93be0da3cfd734170cfc6f
oci==2.125.3
omegaconf==2.3.0
openai==1.21.2
opencv-python==4.7.0.72
opencv-python-headless==4.5.5.64
opendatalab==0.0.10
opendatasets==0.1.22
openmim==0.3.9
openxlab==0.0.38
ordered-set==4.1.0
orjson==3.10.1
oss2==2.17.0
packaging @ file:///croot/packaging_1693575174725/work
pandas==2.1.4
paramiko==3.4.0
parso @ file:///opt/conda/conda-bld/parso_1641458642106/work
peft==0.10.0
pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work
pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work
Pillow @ file:///croot/pillow_1696580024257/work
pkginfo @ file:///croot/pkginfo_1679431160147/work
platformdirs @ file:///croot/platformdirs_1692205439124/work
plotly==5.21.0
pluggy @ file:///tmp/build/80754af9/pluggy_1648024709248/work
portalocker==2.8.2
pre-commit==3.7.0
preshed==3.0.9
# Editable install with no version control (prismatic==0.0.1)
-e /opt/cv/tianyutong/prismatic-vlms
prompt-toolkit @ file:///croot/prompt-toolkit_1672387306916/work
proto-plus==1.23.0
protobuf==4.25.1
psutil @ file:///opt/conda/conda-bld/psutil_1656431268089/work
ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work
pyarrow==15.0.2
pyasn1==0.6.0
pyasn1_modules==0.4.0
pycocoevalcap==1.2
pycocotools==2.0.7
pycosat @ file:///croot/pycosat_1696536503704/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pycryptodome==3.20.0
pydantic==1.10.14
pydantic_core==2.18.1
pydeck==0.8.1b0
pydub==0.25.1
Pygments @ file:///croot/pygments_1684279966437/work
PyJWT==2.8.0
pymongo==4.6.3
PyNaCl==1.5.0
pyOpenSSL @ file:///croot/pyopenssl_1690223430423/work
pyparsing==3.1.2
PySocks @ file:///home/builder/ci_310/pysocks_1640793678128/work
python-dateutil==2.8.2
python-etcd==0.4.5
python-magic==0.4.27
python-multipart==0.0.9
python-slugify==8.0.4
python-snappy==0.7.1
pytz @ file:///croot/pytz_1695131579487/work
PyYAML @ file:///croot/pyyaml_1698096049011/work
pyyaml-include==1.4.1
referencing @ file:///croot/referencing_1699012038513/work
regex==2023.12.25
requests==2.28.2
rich==13.4.2
rpds-py @ file:///croot/rpds-py_1698945930462/work
rsa==4.9
ruamel.yaml @ file:///croot/ruamel.yaml_1666304550667/work
ruamel.yaml.clib @ file:///croot/ruamel.yaml.clib_1666302247304/work
s3transfer==0.10.1
safetensors==0.4.1
salesforce-lavis @ git+https://github.com/siddk/LAVIS@e0ad558d2e4238545df5f6ecc496761a98efb0d1
scikit-image==0.23.1
scikit-learn==1.4.2
scipy==1.13.0
semantic-version==2.10.0
sentencepiece==0.2.0
sentry-sdk==1.39.1
setproctitle==1.3.3
shortuuid==1.0.13
six @ file:///tmp/build/80754af9/six_1644875935023/work
smart-open==6.4.0
smmap==5.0.1
sniffio==1.3.1
sortedcontainers==2.4.0
soupsieve @ file:///croot/soupsieve_1696347547217/work
spacy==3.7.3
spacy-legacy==3.0.12
spacy-loggers==1.0.5
srsly==2.4.8
stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work
starlette==0.37.2
streamlit==1.33.0
svgwrite==1.4.3
sympy @ file:///croot/sympy_1701397643339/work
tabulate==0.9.0
tb-nightly==2.17.0a20240428
tenacity==8.2.3
tensorboard-data-server==0.7.2
terminaltables==3.1.10
text-unidecode==1.3
thinc==8.2.3
threadpoolctl==3.4.0
tifffile==2024.4.18
tiktoken==0.5.2
timm==0.9.16
tokenizers @ file:///xxxxx/prismatic-vlms/tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=60fec380778d75cbb492f14ca974f11f37b41d53c057b9c8ba213315b86e1f84
toml==0.10.2
tomli @ file:///opt/conda/conda-bld/tomli_1657175507142/work
toolz @ file:///croot/toolz_1667464077321/work
torch==2.1.2
torchaudio==2.1.2
torchelastic==0.2.2
torchvision==0.16.2
tornado==6.4
tqdm @ file:///croot/tqdm_1679561862951/work
traitlets @ file:///croot/traitlets_1671143879854/work
transformers @ file:///xxxxxx/prismatic-vlms/transformers-4.34.1-py3-none-any.whl#sha256=d06ac09151d7b845e4a4acd6b143a591d946031ee67b4cbb20693b241920ffc0
transformers-stream-generator==0.0.4
triton==2.1.0
truststore @ file:///croot/truststore_1695244293384/work
typer==0.9.4
types-dataclasses==0.6.6
typing-inspect==0.9.0
typing_extensions==4.11.0
tzdata==2023.4
uc-micro-py==1.0.3
urllib3 @ file:///croot/urllib3_1698257533958/work
uvicorn==0.29.0
virtualenv==20.25.3
# Editable install with no version control (vlm_eval==0.0.1)
-e /xxxxxxx/vlm-evaluation
wandb==0.16.6
wasabi==1.1.2
watchdog==4.0.0
wavedrom==2.0.3.post3
wcwidth==0.2.13
weasel==0.3.4
webdataset==0.2.86
webencodings==0.5.1
websockets==12.0
Werkzeug==3.0.2
xxhash==3.4.1
yapf==0.40.2
yarl==1.9.4
zipp==3.18.1
zstandard @ file:///croot/zstandard_1677013143055/work
zstd==1.5.5.1
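
The pins that look most relevant here are torch==2.1.2, transformers==4.34.1, tokenizers==0.14.1, accelerate==0.25.0, and flash-attn==2.3.3. A minimal sketch for pulling just those out of the environment with `pip show`:

# Print name/version for the packages most likely to affect generation behavior
for pkg in torch transformers tokenizers accelerate flash-attn; do
    pip show "$pkg" | grep -E '^(Name|Version):'
done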