confident-ai / deepeval

The LLM Evaluation Framework
https://docs.confident-ai.com/
Apache License 2.0

Running `evaluate` within a Google Colab causes `RecursionError` #350

Closed manuelmorales closed 9 months ago

manuelmorales commented 9 months ago

When running `evaluate()` within Google Colab, I get a `RecursionError`. It happens with both the standalone imported `evaluate` and `dataset.evaluate()`. It doesn't happen when calling `insight_metric.measure()` directly. I'm using Python 3.10 and Deepeval 0.20.33.

See the example notebook Deepeval RecursionError.ipynb and this screenshot:

image

Snippet that reproduces the bug (same as in Colab)

from deepeval import evaluate
from deepeval.metrics import HallucinationMetric
from deepeval.test_case import LLMTestCase

context = ["A man with blond-hair, and a brown shirt drinking out of a public water fountain."]
actual_output = "A blond drinking water in public."

test_case = LLMTestCase(
    input="What was the blond doing?",
    actual_output=actual_output,
    context=context
)
metric = HallucinationMetric(minimum_score=0.5)

evaluate([test_case], [metric])
Full output
```
======================================================================
Metrics Summary
  - ✅ Hallucination (score: 0.9959943294525146, minimum_score: 0.5)

For test case:
  - input: What was the blond doing?
  - actual output: A blond drinking water in public.
  - expected output: None
  - context: ['A man with blond-hair, and a brown shirt drinking out of a public water fountain.']
  - retrieval context: None

----------------------------------------------------------------------
[... the separator line above repeated many more times ...]
Exception in thread Thread-15:
----------------------------------------------------------------------
[... the separator line above repeated many more times ...]
Exception in threading.excepthook:
---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/rich/console.py in print(self, sep, end, style, justify, overflow, no_wrap, emoji, markup, highlight, width, height, crop, soft_wrap, new_line_start, *objects)
   1673             with self:
-> 1674                 renderables = self._collect_renderables(
   1675                     objects,

18 frames

/usr/local/lib/python3.10/dist-packages/rich/console.py in _collect_renderables(self, objects, sep, end, justify, emoji, markup, highlight)
   1537                 append_text(
-> 1538                     self.render_str(
   1539                         renderable, emoji=emoji, markup=markup, highlighter=_highlighter

/usr/local/lib/python3.10/dist-packages/rich/console.py in render_str(self, text, style, justify, overflow, emoji, markup, highlight, highlighter)
   1449         if _highlighter is not None:
-> 1450             highlight_text = _highlighter(str(rich_text))
   1451             highlight_text.copy_styles(rich_text)

/usr/local/lib/python3.10/dist-packages/rich/highlighter.py in __call__(self, text)
     32         if isinstance(text, str):
---> 33             highlight_text = Text(text)
     34         elif isinstance(text, Text):

/usr/local/lib/python3.10/dist-packages/rich/text.py in __init__(self, text, style, justify, overflow, no_wrap, end, tab_size, spans)
    154     ) -> None:
--> 155         sanitized_text = strip_control_codes(text)
    156         self._text = [sanitized_text]

/usr/local/lib/python3.10/dist-packages/rich/control.py in strip_control_codes(text, _translate_table)
    197     """
--> 198     return text.translate(_translate_table)
    199

RecursionError: maximum recursion depth exceeded while calling a Python object

During handling of the above exception, another exception occurred:

RecursionError                            Traceback (most recent call last)
[](https://localhost:8080/#) in ()
     13 metric = HallucinationMetric(minimum_score=0.5)
     14
---> 15 evaluate([test_case], [metric])

/usr/local/lib/python3.10/dist-packages/deepeval/evaluate.py in evaluate(test_cases, metrics)
    115     for test_result in test_results:
    116         print_test_result(test_result)
--> 117         print("\n" + "-" * 70)
    118
    119     test_run_manager.wrap_up_test_run(display_table=False)

/usr/local/lib/python3.10/dist-packages/rich/file_proxy.py in write(self, text)
     41             if lines:
     42                 console = self.__console
---> 43                 with console:
     44                     output = Text("\n").join(
     45                         self.__ansi_decoder.decode_line(line) for line in lines

/usr/local/lib/python3.10/dist-packages/rich/console.py in __exit__(self, exc_type, exc_value, traceback)
    863     def __exit__(self, exc_type: Any, exc_value: Any, traceback: Any) -> None:
    864         """Exit buffer context."""
--> 865         self._exit_buffer()
    866
    867     def begin_capture(self) -> None:

/usr/local/lib/python3.10/dist-packages/rich/console.py in _exit_buffer(self)
    821         """Leave buffer context, and render content if required."""
    822         self._buffer_index -= 1
--> 823         self._check_buffer()
    824
    825     def set_live(self, live: "Live") -> None:

/usr/local/lib/python3.10/dist-packages/rich/console.py in _check_buffer(self)
   2005                 from .jupyter import display
   2006
-> 2007                 display(self._buffer, self._render_buffer(self._buffer[:]))
   2008                 del self._buffer[:]
   2009             else:

/usr/local/lib/python3.10/dist-packages/rich/jupyter.py in display(segments, text)
     89         from IPython.display import display as ipython_display
     90
---> 91         ipython_display(jupyter_renderable)
     92     except ModuleNotFoundError:
     93         # Handle the case where the Console has force_jupyter=True,

/usr/local/lib/python3.10/dist-packages/IPython/core/display.py in display(include, exclude, metadata, transient, display_id, *objs, **kwargs)
    325             # kwarg-specified metadata gets precedence
    326             _merge(md_dict, metadata)
--> 327         publish_display_data(data=format_dict, metadata=md_dict, **kwargs)
    328     if display_id:
    329         return DisplayHandle(display_id)

/usr/local/lib/python3.10/dist-packages/IPython/core/display.py in publish_display_data(data, metadata, source, transient, **kwargs)
    117         kwargs['transient'] = transient
    118
--> 119     display_pub.publish(
    120         data=data,
    121         metadata=metadata,

/usr/local/lib/python3.10/dist-packages/ipykernel/zmqshell.py in publish(self, data, metadata, source, transient, update)
    113             If True, send an update_display_data message instead of display_data.
    114         """
--> 115         self._flush_streams()
    116         if metadata is None:
    117             metadata = {}

/usr/local/lib/python3.10/dist-packages/ipykernel/zmqshell.py in _flush_streams(self)
     80     def _flush_streams(self):
     81         """flush IO Streams prior to display"""
---> 82         sys.stdout.flush()
     83         sys.stderr.flush()
     84

/usr/local/lib/python3.10/dist-packages/rich/file_proxy.py in flush(self)
     51         output = "".join(self.__buffer)
     52         if output:
---> 53             self.__console.print(output)
     54         del self.__buffer[:]
     55

/usr/local/lib/python3.10/dist-packages/rich/console.py in print(self, sep, end, style, justify, overflow, no_wrap, emoji, markup, highlight, width, height, crop, soft_wrap, new_line_start, *objects)
   1671             crop = False
   1672         render_hooks = self._render_hooks[:]
-> 1673         with self:
   1674             renderables = self._collect_renderables(
   1675                 objects,

... last 10 frames repeated, from the frame below ...

/usr/local/lib/python3.10/dist-packages/rich/console.py in __exit__(self, exc_type, exc_value, traceback)
    863     def __exit__(self, exc_type: Any, exc_value: Any, traceback: Any) -> None:
    864         """Exit buffer context."""
--> 865         self._exit_buffer()
    866
    867     def begin_capture(self) -> None:

RecursionError: maximum recursion depth exceeded
```

Edit: Hid the full output behind a collapsible area for visibility.
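
For readers following the traceback: the repeated frames form a cycle in which the stdout proxy's `flush()` prints its buffer, printing triggers a notebook display call, and the display call flushes `sys.stdout` again before the buffer has been cleared. The sketch below is a toy model of that cycle using only the standard library. `ToyProxy`, its `guarded` flag, and `run` are hypothetical names made up for illustration; this is not the actual code of rich, ipykernel, or deepeval, and the guard is only one possible way to break such a cycle, not necessarily how any of those projects handle it.

```python
import io


class ToyProxy:
    """Toy stand-in for a stdout proxy that buffers text and renders on flush."""

    def __init__(self, sink, guarded=False):
        self.sink = sink        # where rendered text finally goes
        self.buffer = []        # text written but not yet rendered
        self.guarded = guarded  # if True, ignore re-entrant flush calls
        self._flushing = False

    def write(self, text):
        self.buffer.append(text)
        self.flush()

    def flush(self):
        if self.guarded and self._flushing:
            return  # a flush is already in progress: break the cycle
        if not self.buffer:
            return
        self._flushing = True
        try:
            self.display("".join(self.buffer))
            # As in the traceback, the buffer is cleared only after display.
            del self.buffer[:]
        finally:
            self._flushing = False

    def display(self, text):
        # Toy version of "flush IO streams prior to display": the display
        # step flushes the same proxy while its buffer is still non-empty.
        self.flush()
        self.sink.write(text)


def run(guarded):
    """Write one separator line and report what happened."""
    sink = io.StringIO()
    proxy = ToyProxy(sink, guarded=guarded)
    try:
        proxy.write("-" * 70 + "\n")
    except RecursionError:
        return "RecursionError"
    return sink.getvalue()


if __name__ == "__main__":
    print(repr(run(guarded=False)))
    print(repr(run(guarded=True)))
```

Without the guard, `flush()` and `display()` call each other until Python's recursion limit is hit. With the guard, the nested flush returns immediately and the separator line is written once.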

penguine-ip commented 9 months ago

@manuelmorales can you show me your environment? Either pip freeze if you're on pip or poetry show if you're using poetry.
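
If a full `pip freeze` is too long to paste, a shorter report covering only the packages that appear in the traceback may be enough. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `package_versions` and the package list are assumptions chosen for this issue, not part of deepeval.

```python
from importlib import metadata


def package_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages taken from the file paths in the traceback above.
    wanted = ["deepeval", "rich", "ipykernel", "ipython"]
    for name, version in package_versions(wanted).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running this in a Colab cell prints one line per package, in the same `name==version` form that `pip freeze` uses.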

manuelmorales commented 9 months ago

Sure @penguine-ip, here it is:

pip freeze output.
absl-py==1.4.0
aiohttp==3.9.1
aiosignal==1.3.1
alabaster==0.7.13
albumentations==1.3.1
altair==4.2.2
anyio==3.7.1
appdirs==1.4.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
array-record==0.5.0
arviz==0.15.1
astropy==5.3.4
astunparse==1.6.3
async-timeout==4.0.3
atpublic==4.0
attrs==23.1.0
audioread==3.0.1
autograd==1.6.2
Babel==2.13.1
backcall==0.2.0
beautifulsoup4==4.11.2
bidict==0.22.1
bigframes==0.15.0
bleach==6.1.0
blinker==1.4
blis==0.7.11
blosc2==2.0.0
bokeh==3.3.2
bqplot==0.12.42
branca==0.7.0
build==1.0.3
CacheControl==0.13.1
cachetools==5.3.2
catalogue==2.0.10
certifi==2023.11.17
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
chex==0.1.7
click==8.1.7
click-plugins==1.1.1
cligj==0.7.2
cloudpickle==2.2.1
cmake==3.27.9
cmdstanpy==1.2.0
colorcet==3.0.1
colorlover==0.3.0
colour==0.1.5
community==1.0.0b1
confection==0.1.4
cons==0.4.6
contextlib2==21.6.0
contourpy==1.2.0
cryptography==41.0.7
cufflinks==0.17.3
cupy-cuda11x==11.0.0
cvxopt==1.3.2
cvxpy==1.3.2
cycler==0.12.1
cymem==2.0.8
Cython==3.0.6
dask==2023.8.1
dataclasses-json==0.6.3
datascience==0.17.6
datasets==2.15.0
db-dtypes==1.1.1
dbus-python==1.2.18
debugpy==1.6.6
decorator==4.4.2
deepeval==0.20.35
defusedxml==0.7.1
detoxify==0.5.1
dill==0.3.7
diskcache==5.6.3
distributed==2023.8.1
distro==1.7.0
dlib==19.24.2
dm-tree==0.1.8
docutils==0.18.1
dopamine-rl==4.0.6
duckdb==0.9.2
earthengine-api==0.1.381
easydict==1.11
ecos==2.0.12
editdistance==0.6.2
eerepr==0.0.4
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.6.0/en_core_web_sm-3.6.0-py3-none-any.whl#sha256=83276fc78a70045627144786b52e1f2728ad5e29e5e43916ec37ea9c26a11212
entrypoints==0.4
et-xmlfile==1.1.0
etils==1.5.2
etuples==0.3.9
exceptiongroup==1.2.0
execnet==2.0.2
fastai==2.7.13
fastcore==1.5.29
fastdownload==0.0.7
fastjsonschema==2.19.0
fastprogress==1.0.3
fastrlock==0.8.2
filelock==3.13.1
fiona==1.9.5
firebase-admin==5.3.0
Flask==2.2.5
flatbuffers==23.5.26
flax==0.7.5
folium==0.14.0
fonttools==4.46.0
frozendict==2.3.10
frozenlist==1.4.0
fsspec==2023.6.0
future==0.18.3
gast==0.5.4
gcsfs==2023.6.0
GDAL==3.4.3
gdown==4.6.6
geemap==0.29.6
gensim==4.3.2
geocoder==1.38.1
geographiclib==2.0
geopandas==0.13.2
geopy==2.3.0
gin-config==0.5.0
glob2==0.7
google==2.0.3
google-ai-generativelanguage==0.3.3
google-api-core==2.11.1
google-api-python-client==2.84.0
google-auth==2.17.3
google-auth-httplib2==0.1.1
google-auth-oauthlib==1.0.0
google-cloud-aiplatform==1.36.4
google-cloud-bigquery==3.12.0
google-cloud-bigquery-connection==1.12.1
google-cloud-bigquery-storage==2.23.0
google-cloud-core==2.3.3
google-cloud-datastore==2.15.2
google-cloud-firestore==2.11.1
google-cloud-functions==1.13.3
google-cloud-iam==2.12.2
google-cloud-language==2.9.1
google-cloud-resource-manager==1.10.4
google-cloud-storage==2.8.0
google-cloud-translate==3.11.3
google-colab @ file:///colabtools/dist/google-colab-1.0.0.tar.gz#sha256=d402ff04028f211431c19a5a99d65e12eecf2eb3911ce18e39c83b276e17afcc
google-crc32c==1.5.0
google-generativeai==0.2.2
google-pasta==0.2.0
google-resumable-media==2.6.0
googleapis-common-protos==1.61.0
googledrivedownloader==0.4
graphviz==0.20.1
greenlet==3.0.1
grpc-google-iam-v1==0.12.7
grpcio==1.59.3
grpcio-status==1.48.2
gspread==3.4.2
gspread-dataframe==3.3.1
gym==0.25.2
gym-notices==0.0.8
h11==0.14.0
h5netcdf==1.3.0
h5py==3.9.0
holidays==0.38
holoviews==1.17.1
html5lib==1.1
httpcore==1.0.2
httpimport==1.3.1
httplib2==0.22.0
httpx==0.25.2
huggingface-hub==0.19.4
humanize==4.7.0
hyperopt==0.2.7
ibis-framework==6.2.0
idna==3.6
imageio==2.31.6
imageio-ffmpeg==0.4.9
imagesize==1.4.1
imbalanced-learn==0.10.1
imgaug==0.4.0
importlib-metadata==7.0.0
importlib-resources==6.1.1
imutils==0.5.4
inflect==7.0.0
iniconfig==2.0.0
install==1.3.5
intel-openmp==2023.2.0
ipyevents==2.0.2
ipyfilechooser==0.6.0
ipykernel==5.5.6
ipyleaflet==0.18.0
ipython==7.34.0
ipython-genutils==0.2.0
ipython-sql==0.5.0
ipytree==0.2.2
ipywidgets==7.7.1
itsdangerous==2.1.2
jax==0.4.20
jaxlib @ https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.4.20+cuda11.cudnn86-cp310-cp310-manylinux2014_x86_64.whl#sha256=01be66238133f884bf5adf15cd7eaaf8445f9d4b056c5c64df28a997a6aff2fe
jeepney==0.7.1
jieba==0.42.1
Jinja2==3.1.2
joblib==1.3.2
jsonpatch==1.33
jsonpickle==3.0.2
jsonpointer==2.4
jsonschema==4.19.2
jsonschema-specifications==2023.11.2
jupyter-client==6.1.12
jupyter-console==6.1.0
jupyter-server==1.24.0
jupyter_core==5.5.0
jupyterlab-widgets==3.0.9
jupyterlab_pygments==0.3.0
kaggle==1.5.16
keras==2.14.0
keyring==23.5.0
kiwisolver==1.4.5
langchain==0.0.350
langchain-community==0.0.3
langchain-core==0.1.0
langcodes==3.3.0
langsmith==0.0.70
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
lazy_loader==0.3
libclang==16.0.6
librosa==0.10.1
lida==0.0.10
lightgbm==4.1.0
linkify-it-py==2.0.2
llmx==0.0.15a0
llvmlite==0.41.1
locket==1.0.0
logical-unification==0.4.6
lxml==4.9.3
malloy==2023.1067
Markdown==3.5.1
markdown-it-py==3.0.0
MarkupSafe==2.1.3
marshmallow==3.20.1
matplotlib==3.7.1
matplotlib-inline==0.1.6
matplotlib-venn==0.11.9
mdit-py-plugins==0.4.0
mdurl==0.1.2
miniKanren==1.0.3
missingno==0.5.2
mistune==0.8.4
mizani==0.9.3
mkl==2023.2.0
ml-dtypes==0.2.0
mlxtend==0.22.0
more-itertools==10.1.0
moviepy==1.0.3
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.4
multipledispatch==1.0.0
multiprocess==0.70.15
multitasking==0.0.11
murmurhash==1.0.10
music21==9.1.0
mypy-extensions==1.0.0
natsort==8.4.0
nbclassic==1.0.0
nbclient==0.9.0
nbconvert==6.5.4
nbformat==5.9.2
nest-asyncio==1.5.8
networkx==3.2.1
nibabel==4.0.2
nltk==3.8.1
notebook==6.5.5
notebook_shim==0.2.3
numba==0.58.1
numexpr==2.8.7
numpy==1.23.5
oauth2client==4.1.3
oauthlib==3.2.2
openai==1.3.9
opencv-contrib-python==4.8.0.76
opencv-python==4.8.0.76
opencv-python-headless==4.8.1.78
openpyxl==3.1.2
opt-einsum==3.3.0
optax==0.1.7
orbax-checkpoint==0.4.4
osqp==0.6.2.post8
packaging==23.2
pandas==1.5.3
pandas-datareader==0.10.0
pandas-gbq==0.19.2
pandas-stubs==1.5.3.230304
pandocfilters==1.5.0
panel==1.3.4
param==2.0.1
parso==0.8.3
parsy==2.1
partd==1.4.1
pathlib==1.0.1
pathy==0.10.3
patsy==0.5.4
peewee==3.17.0
pexpect==4.9.0
pickleshare==0.7.5
Pillow==9.4.0
pip-tools==6.13.0
platformdirs==4.1.0
plotly==5.15.0
plotnine==0.12.4
pluggy==1.3.0
polars==0.17.3
pooch==1.8.0
portalocker==2.8.2
portpicker==1.5.2
prefetch-generator==1.0.3
preshed==3.0.9
prettytable==3.9.0
proglog==0.1.10
progressbar2==4.2.0
prometheus-client==0.19.0
promise==2.3
prompt-toolkit==3.0.41
prophet==1.1.5
proto-plus==1.22.3
protobuf==4.21.6
psutil==5.9.5
psycopg2==2.9.9
ptyprocess==0.7.0
py-cpuinfo==9.0.0
py4j==0.10.9.7
pyarrow==10.0.1
pyarrow-hotfix==0.6
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycocotools==2.0.7
pycparser==2.21
pyct==0.5.0
pydantic==1.10.13
pydata-google-auth==1.8.2
pydot==1.4.2
pydot-ng==2.0.0
pydotplus==2.0.2
PyDrive==1.3.1
PyDrive2==1.6.3
pyerfa==2.0.1.1
pygame==2.5.2
Pygments==2.16.1
PyGObject==3.42.1
PyJWT==2.3.0
pymc==5.7.2
pymystem3==0.2.0
PyOpenGL==3.1.7
pyOpenSSL==23.3.0
pyparsing==3.1.1
pyperclip==1.8.2
pyproj==3.6.1
pyproject_hooks==1.0.0
pysbd==0.3.4
pyshp==2.3.1
PySocks==1.7.1
pytensor==2.14.2
pytest==7.4.3
pytest-xdist==3.5.0
python-apt==0.0.0
python-box==7.1.1
python-dateutil==2.8.2
python-louvain==0.16
python-slugify==8.0.1
python-utils==3.8.1
pytz==2023.3.post1
pyviz_comms==3.0.0
PyWavelets==1.5.0
PyYAML==6.0.1
pyzmq==23.2.1
qdldl==0.1.7.post0
qudida==0.0.4
ragas==0.0.22
ratelim==0.1.6
referencing==0.31.1
regex==2023.6.3
requests==2.31.0
requests-oauthlib==1.3.1
requirements-parser==0.5.0
rich==13.7.0
rouge-score==0.1.2
rpds-py==0.13.2
rpy2==3.4.2
rsa==4.9
safetensors==0.4.1
scikit-image==0.19.3
scikit-learn==1.2.2
scipy==1.11.4
scooby==0.9.2
scs==3.2.4.post1
seaborn==0.12.2
SecretStorage==3.3.1
Send2Trash==1.8.2
sentence-transformers==2.2.2
sentencepiece==0.1.99
sentry-sdk==1.39.1
shapely==2.0.2
six==1.16.0
sklearn-pandas==2.2.0
smart-open==6.4.0
sniffio==1.3.0
snowballstemmer==2.2.0
sortedcontainers==2.4.0
soundfile==0.12.1
soupsieve==2.5
soxr==0.3.7
spacy==3.6.1
spacy-legacy==3.0.12
spacy-loggers==1.0.5
Sphinx==5.0.2
sphinxcontrib-applehelp==1.0.7
sphinxcontrib-devhelp==1.0.5
sphinxcontrib-htmlhelp==2.0.4
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.6
sphinxcontrib-serializinghtml==1.1.9
SQLAlchemy==2.0.23
sqlglot==17.16.2
sqlparse==0.4.4
srsly==2.4.8
stanio==0.3.0
statsmodels==0.14.0
sympy==1.12
tables==3.8.0
tabulate==0.9.0
tbb==2021.11.0
tblib==3.0.0
tenacity==8.2.3
tensorboard==2.14.1
tensorboard-data-server==0.7.2
tensorflow==2.14.0
tensorflow-datasets==4.9.3
tensorflow-estimator==2.14.0
tensorflow-gcs-config==2.14.0
tensorflow-hub==0.15.0
tensorflow-io-gcs-filesystem==0.34.0
tensorflow-metadata==1.14.0
tensorflow-probability==0.22.0
tensorstore==0.1.45
termcolor==2.4.0
terminado==0.18.0
text-unidecode==1.3
textblob==0.17.1
tf-slim==1.1.0
thinc==8.1.12
threadpoolctl==3.2.0
tifffile==2023.9.26
tiktoken==0.5.2
tinycss2==1.2.1
tokenizers==0.12.1
toml==0.10.2
tomli==2.0.1
toolz==0.12.0
torch @ https://download.pytorch.org/whl/cu118/torch-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=a81b554184492005543ddc32e96469f9369d778dedd195d73bda9bed407d6589
torchaudio @ https://download.pytorch.org/whl/cu118/torchaudio-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=cdfd0a129406155eee595f408cafbb92589652da4090d1d2040f5453d4cae71f
torchdata==0.7.0
torchsummary==1.5.1
torchtext==0.16.0
torchvision @ https://download.pytorch.org/whl/cu118/torchvision-0.16.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=033712f65d45afe806676c4129dfe601ad1321d9e092df62b15847c02d4061dc
tornado==6.3.2
tqdm==4.66.1
traitlets==5.7.1
traittypes==0.2.1
transformers==4.22.1
triton==2.1.0
tweepy==4.14.0
typer==0.9.0
types-pytz==2023.3.1.1
types-setuptools==69.0.0.0
typing-inspect==0.9.0
typing_extensions==4.5.0
tzlocal==5.2
uc-micro-py==1.0.2
uritemplate==4.1.1
urllib3==2.0.7
vega-datasets==0.9.0
wadllib==1.3.6
wasabi==1.1.2
wcwidth==0.2.12
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
Werkzeug==3.0.1
widgetsnbextension==3.6.6
wordcloud==1.9.2
wrapt==1.14.1
xarray==2023.7.0
xarray-einstats==0.6.0
xgboost==2.0.2
xlrd==2.0.1
xxhash==3.4.1
xyzservices==2023.10.1
yarl==1.9.3
yellowbrick==1.5
yfinance==0.2.32
zict==3.0.0
zipp==3.17.0
penguine-ip commented 9 months ago

Thanks for providing this. Could you please join our Discord so we can dive deeper there: https://discord.com/invite/a3K9c8GRGt

For now I will try to reproduce the error in a colab.

manuelmorales commented 9 months ago

Sure. I just joined and DM'd you.

For now I will try to reproduce the error in a colab.

If it is of any use, I created this Colab to reproduce it.

SawyerCzupka commented 8 months ago

I am running into a similar issue when running evaluate() in Jupyter.

penguine-ip commented 8 months ago

I couldn't reproduce it. I'm running the Colab from the quickstart repo: https://colab.research.google.com/drive/1PPxYEBa6eu__LquGoFFJZkhYgWVYE6kh?usp=sharing. What version are you using?
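For anyone answering the version question from inside a notebook, one way to check is the standard-library `importlib.metadata` module. This is only a minimal sketch: the helper name `installed_version` is made up for illustration, and it reports whatever distribution the current kernel sees, which may differ from what a later `pip install` downloads until the runtime is restarted.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not visible to this interpreter.
        return None


# Prints the deepeval version seen by the running kernel (None if missing).
print("deepeval:", installed_version("deepeval"))
```

Pasting the printed value into the thread avoids guessing which release a Colab runtime actually picked up.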

kubre commented 6 months ago

I'm facing the same issue. It does not always happen, but I've noticed it in the following scenario:

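from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase

answer_relevancy_metric = AnswerRelevancyMetric()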
faithfulness_metric = FaithfulnessMetric()
answer="""Indian Railways is a statutory body under the ownership of the Ministry of Railways of the Government of India.

Q: How many operational zones is Indian Railways divided into?
A: Indian Railways is divided into 17 operational zones geographically.

Q: What is the running track length of Indian Railways?
A: The running track length of Indian Railways is 104,647 km (65,025 mi).

Q: When was the first electric train run in India?
A: The first electric train ran in Bombay on DC traction in 1925.

Q: How many employees does Indian Railways have?
A: Indian Railways has more than 1.2 million employees, making it the world's ninth-largest employer and India's second largest employer.

Q: When was Indian Railway Catering and Tourism Corporation (IRCTC) established?
A: Indian Railway Catering and Tourism Corporation (IRCTC) was established on 27 September 1999."""
contexts=[
  "Indian Railways is a statutory body under the ownership of the Ministry of Railways of the Government of India that operates India's national railway system. It is headed by a Railway Board whose chairman reports to the Ministry of Railways. It is organized into separate functional groups or verticals while divided into 17 operational zones geographically. Each zone, headed by a General Manager, is semi-autonomous thus creating a matrix organization where the functional branches are under dual control.",
  "Indian Railways (IR) is a statutory body under the ownership of the Ministry of Railways of the Government of India that operates India's national railway system. As of 2023, it manages the fourth largest national railway system by size with a running track length of 104,647 km (65,025 mi) and route length of 68,426 km (42,518 mi) of which 60,451 km (37,563 mi) is electrified. With more than 1.2 million employees, it is the world's ninth-largest employer and India's second largest employer.\nThe first steam operated railway operated in 1837 in Madras with the first passenger operating in 1853 between Bombay and Thane. In 1925, the first electric train ran in Bombay on DC traction. The first locomotive manufacturing unit was commissioned in 1950 at Chittaranjan with the first coach manufacturing unit set-up at Madras in 1955. Various companies operating railways across the country were re-organised into six regional zones in 1951, which were gradually expanded to 19 zones.",
  "The Ministry of Railways is a ministry in the Government of India, responsible for the country's rail transport. Indian Railways is a statutory body managed by the railway board under the ownership of the ministry that operates the national railway system. The ministry along with the railway board is housed inside Rail Bhawan in New Delhi.\nWith more than 1.2 million employees, it is one of the world's largest employers.",
  "== History ==\nIndian Railway Catering and Tourism Corporation (IRCTC) was established on 27 September 1999, as a public sector undertaking completely owned by the Government of India through the Indian Railways. In May 2008, it was classed as a Miniratna public corporation, which allowed it a certain degree of financial autonomy. The company was listed on the National Stock Exchange in 2019, following which the Government of India's holding was reduced to 87%, with the remaining shares being publicly traded. In December 2020, the Government of India divested another 20%, reducing its holding in the IRCTC to 67%. In December 2022, the government dis-invested further 5% of its share, reducing its ownership to 62.4%.\n\n\n== Services ==",
]

test_case = LLMTestCase(
  input="Who owns indian Railways?",
  actual_output=answer,
  expected_output="We are a statutory body under the ownership of the Ministry of Railways of the Government of India.",
  retrieval_context=contexts,
)
dataset = EvaluationDataset(test_cases=[test_case])
evaluate(dataset, metrics=[answer_relevancy_metric, faithfulness_metric])

Output:


Metrics Summary

  - ❌ Answer Relevancy (score: 0.16666666666666666, threshold: 0.5, evaluation model: gpt-4-0125-preview, reason: 
The score is 0.17 because the response provided a lot of detailed information about Indian Railways, such as its 
operational zones, track length, history of electric trains, employment statistics, and the establishment of IRCTC,
which, while informative, did not directly address the specific question about the ownership of Indian Railways. 
This resulted in a low relevancy score as the actual question was not directly answered amidst the abundance of 
unrelated facts.)

  - ✅ Faithfulness (score: 1.0, threshold: 0.5, evaluation model: gpt-4-0125-preview, reason: The score is 1.00 
because the actual output is perfectly aligned with the information presented in the retrieval context, showcasing 
a high level of faithfulness without any contradictions. Great job on maintaining accuracy!)

For test case:

  - input: Who owns indian Railways?

  - actual output: Indian Railways is a statutory body under the ownership of the Ministry of Railways of the 
Government of India.

Q: How many operational zones is Indian Railways divided into?
A: Indian Railways is divided into 17 operational zones geographically.

Q: What is the running track length of Indian Railways?
A: The running track length of Indian Railways is 104,647 km (65,025 mi).

Q: When was the first electric train run in India?
A: The first electric train ran in Bombay on DC traction in 1925.

Q: How many employees does Indian Railways have?
A: Indian Railways has more than 1.2 million employees, making it the world's ninth-largest employer and India's 
second largest employer.

Q: When was Indian Railway Catering and Tourism Corporation (IRCTC) established?
A: Indian Railway Catering and Tourism Corporation (IRCTC) was established on 27 September 1999.
A: Indian Railway Catering and Tourism Corporation (IRCTC) was established on 27 September 1999.
A: Indian Railway Catering and Tourism Corporation (IRCTC) was established on 27 September 1999.
A: Indian Railway Catering and Tourism Corporation (IRCTC) was established on 27 September 1999.
A: Indian Railway Catering and Tourism Corporation (IRCTC) was established on 27 September 1999.
A: Indian Railway Catering and Tourism Corporation (IRCTC) was established on 27 September 1999.
A: Indian Railway Catering and Tourism Corporation (IRCTC) was established on 27 September 1999.
A: Indian Railway Catering and Tourism Corporation (IRCTC) was established on 27 September 1999.
(... and it goes on like this.)

penguine-ip commented 6 months ago

@kubre what version of deepeval is this?

kubre commented 6 months ago

@penguine-ip It should be the latest, as I'm running it in Colab. Also, if you suppress the output using %%capture, it works fine.
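`%%capture` is an IPython cell magic, so it has to be the first line of the notebook cell that calls evaluate(). Outside of a cell magic, a roughly similar effect might be had with `contextlib.redirect_stdout`. The sketch below is an assumption-laden illustration, not a confirmed fix: `noisy_evaluate` is a hypothetical stand-in for the real call, `run_quietly` is a made-up helper, and redirecting `sys.stdout` will not catch output from code that holds its own reference to the original stream.

```python
import contextlib
import io


def noisy_evaluate() -> str:
    # Hypothetical stand-in for a call such as evaluate(dataset, metrics=[...])
    # that prints a lot of separator lines.
    for _ in range(3):
        print("-" * 70)
    return "done"


def run_quietly(func, *args, **kwargs):
    """Call `func` while redirecting anything written to sys.stdout into a buffer."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    # Return both the result and the captured text so nothing is lost.
    return result, buffer.getvalue()


result, captured = run_quietly(noisy_evaluate)
print(result, len(captured.splitlines()))  # prints: done 3
```

Keeping the captured text around, rather than discarding it, means the metrics summary can still be inspected or printed in truncated form afterwards.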

Collecting deepeval
  Downloading deepeval-0.20.81-py3-none-any.whl (128 kB)