[2024-02-13, 17:31:11 EST] {taskinstance.py:2699} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 433, in _execute_task
result = execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/airflow/decorators/base.py", line 242, in execute
return_value = super().execute(context)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/airflow/operators/python.py", line 199, in execute
return_value = self.execute_callable()
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/airflow/operators/python.py", line 216, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/dags/monitor/test_retrieval.py", line 210, in generate_test_answers
questions_df[["askastro_answer", "askastro_references", "langsmith_link"]] = questions_df.question.apply(
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4079, in __setitem__
self._setitem_array(key, value)
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4138, in _setitem_array
self._iset_not_inplace(key, value)
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4157, in _iset_not_inplace
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
Describe the bug
Debug logging shows that questions_df is empty at the point of failure. This only occurs when a subset of question ids is passed in through the question_number_subset param.
The question_number_subset param isn't parsed correctly: the code calls json.loads() to turn the string into a list of ints, but the result is not what the filter expects, so no questions are selected and the column assignment in the traceback above fails.
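One possible way to make the parsing more tolerant is sketched below. The helper name parse_question_subset is hypothetical and not taken from the DAG, and the sketch assumes the param may arrive either as a JSON string such as "[1,2,3]" or as an already-parsed list, depending on how the DAG param is declared. Note that json.loads() only accepts str, bytes or bytearray, so passing it a list raises a TypeError.

```python
import json


def parse_question_subset(raw):
    """Return a list of int question ids, or None when no subset was given.

    Hypothetical helper: the input shapes handled here are assumptions
    about how question_number_subset reaches the task.
    """
    if raw is None:
        return None
    if isinstance(raw, str):
        raw = raw.strip()
        if not raw:
            return None
        # Only decode when the value is still a JSON string.
        raw = json.loads(raw)
    if isinstance(raw, (int, str)):
        # A single id such as 3 or "3" becomes a one-element list.
        raw = [raw]
    # Normalise every element to int so it can match an integer id column.
    return [int(item) for item in raw]
```

Normalising to int matters if the question ids in the dataframe are integers, since a filter against string ids would silently match nothing.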
To Reproduce
Steps to reproduce the behavior:
Configure the environment variables required by the test_retrieval DAG
Trigger the DAG
Enter a list of question ids in the question_number_subset parameter prompt, such as [1,2,3]
The DAG run fails with the ValueError shown above
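The same ValueError can be reproduced outside Airflow with a small pandas sketch. The fake_answer function is a stand-in invented for illustration, not the real per-question call. With rows present, apply returns a three-column DataFrame and the assignment works. With an empty dataframe, apply returns an empty Series, which cannot be spread over three columns.

```python
import pandas as pd


def fake_answer(question):
    # Stand-in for the real per-question call; returns three values per row.
    return pd.Series(["answer", "references", "link"])


cols = ["askastro_answer", "askastro_references", "langsmith_link"]

# With rows present, apply() yields a 2x3 DataFrame and the assignment works.
full_df = pd.DataFrame({"question": ["q1", "q2"]})
full_df[cols] = full_df.question.apply(fake_answer)

# With no rows, apply() yields an empty Series and the assignment fails.
empty_df = full_df.iloc[0:0][["question"]].copy()
try:
    empty_df[cols] = empty_df.question.apply(fake_answer)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

A guard such as checking questions_df.empty before the assignment and failing with a message that names question_number_subset would likely make this easier to diagnose.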
Expected behavior
The DAG run completes without errors when a subset of question ids is given.
Improvements
The references saved in the csv are in a random, incorrect order. This is probably because they are put into a set using {} somewhere.
The multi-query references and the weaviate search references are not relevant. They don't provide useful info, but they delay the pipeline and incur cost.
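On the reference ordering point: if a set is indeed being used to remove duplicate references, an order-preserving alternative is sketched below. The function name dedupe_keep_order is hypothetical, and it assumes the references are hashable values such as URL strings.

```python
def dedupe_keep_order(references):
    """Drop duplicate references while keeping first-seen order.

    dict keys preserve insertion order (Python 3.7+), unlike a set,
    whose iteration order is arbitrary.
    """
    return list(dict.fromkeys(references))


refs = ["b", "a", "b", "c", "a"]
print(dedupe_keep_order(refs))
```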