Closed felixbinder closed 5 months ago
Though the console doesn't show any quota that we're coming close to exceeding?
This is on branch sweep_cfgs
Here's another Gemini API crash:
[python -m evals.run_object_level study_name=june_3_half_heldout_sweep language_model=gemini-1.0-pro-002 task=dear_abbie task.set=val prompt=object_level/minimal limit=100] [2024-06-08 17:37:46,719][__main__][INFO] - Processing 100 rows
Error executing job with overrides: ['study_name=june_3_half_heldout_sweep', 'language_model=gemini-1.0-pro-002', 'task=dear_abbie', 'task.set=val', 'prompt=object_level/minimal', 'limit=100']
Traceback (most recent call last):
  File "/home/felix/introspection_self_prediction_astra/evals/run_object_level.py", line 212, in main
    asyncio.run(async_main(cfg))
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/felix/introspection_self_prediction_astra/evals/run_object_level.py", line 198, in async_main
    complete = await async_function_with_retry(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/felix/introspection_self_prediction_astra/.venv/lib/python3.11/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/felix/introspection_self_prediction_astra/.venv/lib/python3.11/site-packages/tenacity/_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/felix/introspection_self_prediction_astra/.venv/lib/python3.11/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
           ^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/felix/introspection_self_prediction_astra/.venv/lib/python3.11/site-packages/tenacity/_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/felix/introspection_self_prediction_astra/evals/utils.py", line 113, in async_function_with_retry
    return await function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/felix/introspection_self_prediction_astra/evals/run_object_level.py", line 121, in run_dataset
    results = await asyncio.gather(*tasks)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/felix/introspection_self_prediction_astra/evals/run_object_level.py", line 60, in run
    responses = await self.inference_api(
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/felix/introspection_self_prediction_astra/evals/apis/inference/api.py", line 231, in __call__
    responses = self.filter_responses(
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/felix/introspection_self_prediction_astra/evals/apis/inference/api.py", line 108, in filter_responses
    success_rate = num_valid / num_candidates
                   ~~~~~~~~~~^~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero
Workaround for the above: https://github.com/felixbinder/introspection_self_prediction_astra/commit/0bc559e2e3cc388eacc581a91c0916caf0e0a61f (but this doesn't explain why no candidate_responses are returned).
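For reference, a minimal sketch of the kind of guard that avoids the `ZeroDivisionError` in the traceback. This is not the code from the linked commit: `safe_success_rate` is a hypothetical helper name, and returning `0.0` for an empty candidate list is an assumption. Only `num_valid` and `num_candidates` come from the traceback.

```python
def safe_success_rate(num_valid: int, num_candidates: int) -> float:
    """Return the fraction of valid candidates, or 0.0 if there are none.

    Hypothetical helper: treating "no candidates" as a 0.0 success rate
    is an assumption, not necessarily what the repository does.
    """
    if num_candidates == 0:
        # The API returned nothing to filter, so avoid dividing by zero.
        return 0.0
    return num_valid / num_candidates
```

A guard like this only stops the crash. The caller still has to decide what to do with a row that produced no candidates.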
Presumably, no candidate responses are returned because the input
my 16-year-old daughter, "jenny," sleeps at her best friend\'s house about once a month. her friend has a double bed, which they share. i have been fine with this. i have slept in the same bed with other women, and there was nothing sexual about it.\nsince last summer, jenny has been sleeping in the nude. i don\'t have a problem with that, either. she doesn\'t parade around the house naked and is quite modest. i started sleeping in the nude when i was 18. again, there was nothing sexual about it.\nthe other day, i asked jenny if she slept in the nude when she was at her friend\'s house. she said they both did. it has been bothering me ever since. i can\'t help feeling their friendship is sexual. i\'m afraid asking her outright would make her angry or might result in her lying to me, since she knows i would not approve of her having sex with anyone at this age.\nabby, do you think it\'s possible two 16-year-old girls could share the same bed naked and not be sexually involved? what can i do to ease my mind? -- suspicious mom in napa, calif.
triggers some safety filter.
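One way to check this hypothesis would be to list the inputs that come back with no candidates. The sketch below assumes an async callable that maps a prompt to a list of candidate responses. `find_empty_prompts` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in for the real API call, not the Gemini client.

```python
import asyncio
from typing import Awaitable, Callable, List


async def find_empty_prompts(
    prompts: List[str],
    fetch_candidates: Callable[[str], Awaitable[List[str]]],
) -> List[str]:
    """Return the prompts for which no candidate responses came back."""
    # Query all prompts concurrently; results keep the order of `prompts`.
    results = await asyncio.gather(*(fetch_candidates(p) for p in prompts))
    return [p for p, candidates in zip(prompts, results) if not candidates]


async def fake_fetch(prompt: str) -> List[str]:
    # Stand-in for the real API: pretend any prompt containing the word
    # "blocked" is rejected and therefore yields no candidates.
    return [] if "blocked" in prompt else ["some response"]


empty = asyncio.run(
    find_empty_prompts(["an ordinary prompt", "a blocked prompt"], fake_fetch)
)
print(empty)
```

Running this over the 100 rows of the failing task would show whether the empty results are limited to inputs like the one above.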
Oh shoot, I didn't see the updates on this issue. I had this happen early on, so I added this logic for the case where no candidates are returned because of the safety filter: https://github.com/felixbinder/introspection_self_prediction_astra/blob/2f3482dd3f203a5eefa5533025907e3b5206d068/evals/apis/inference/gemini_api.py#L123 I don't understand how this could be failing here.
No worries, it works now.