Closed sameeryadav closed 1 year ago
It was working fine around one week ago -
Hi @sameeryadav,
Could you share a bit more about the structure of your data: where are the target and prediction columns, how are they named, and what type are they?
You might need to pass the column_mapping
object when you run the Report (line 3 on your screenshot). If you do not pass the column mapping, Evidently will try to parse the data automatically expecting a standard schema (e.g. target to be called "target").
Here are the details on column mapping: https://docs.evidentlyai.com/user-guide/input-data/column-mapping#prediction-column-s-in-classification
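As a quick alternative to passing a mapping, the columns can be renamed to the default names Evidently auto-detects. A minimal sketch, assuming hypothetical column names `label` and `score`:

```python
import pandas as pd

# Hypothetical frame whose columns do not follow the default schema:
df = pd.DataFrame({"label": [0, 1, 1, 0], "score": [0, 1, 1, 0]})

# Rename to the names auto-detection expects ("target" / "prediction")
# when no column_mapping is supplied:
df = df.rename(columns={"label": "target", "score": "prediction"})
print(list(df.columns))  # ['target', 'prediction']
```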
@elenasamuylova, thank you for your response. Here are the details of the data and their types:
I also dropped the extra columns from current_df before passing it to the classification preset. I also tried this on the older version 0.3.3, but the issue was not resolved.
Hi @sameeryadav,
Could you also check the following:
1. Evidently version

```python
import evidently
print(evidently.__version__)
```

2. Unique value counts in target and prediction

The error might happen if the unique values in the target and prediction columns do not match.

```python
current_df.target.value_counts()
```

and

```python
current_ref.prediction.value_counts()
```
If this is not the source of the issue - please send the complete error trace (it appears that some part of it is missing from the screenshot). Are there any known changes to the dataset between last and this week on your side?
Everything appears to work correctly on our simple test datasets, so we'd need some more information to know how to reproduce it.
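The check suggested above can be scripted. A sketch, assuming a hypothetical `current_df` with `target` and `prediction` columns:

```python
import pandas as pd

# Hypothetical current dataframe:
current_df = pd.DataFrame({"target": [0, 1, 1, 0], "prediction": [0, 1, 1, 1]})

target_labels = set(current_df["target"].unique())
pred_labels = set(current_df["prediction"].unique())

# Labels that differ in value *or* in type ("1" vs 1) between the two
# columns are a common cause of KeyErrors in classification metrics.
print(target_labels == pred_labels)
print(current_df["target"].dtype == current_df["prediction"].dtype)
```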
Hi, @elenasamuylova
```
KeyError                                  Traceback (most recent call last)
File

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-e3f46fea-f94c-4027-85d3-8c78354699ba/lib/python3.9/site-packages/evidently/suite/base_suite.py:169, in Display.show(self, mode)
    168 def show(self, mode="auto"):
--> 169     dashboard_id, dashboard_info, graphs = self._build_dashboard_info()
    170     template_params = TemplateParams(
    171         dashboard_id=dashboard_id,
    172         dashboard_info=dashboard_info,
    173         additional_graphs=graphs,
    174     )
    175     # pylint: disable=import-outside-toplevel

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-e3f46fea-f94c-4027-85d3-8c78354699ba/lib/python3.9/site-packages/evidently/report/report.py:171, in Report._build_dashboard_info(self)
    169     # set the color scheme from the report for each render
    170     renderer.color_options = color_options
--> 171     html_info = renderer.render_html(test)
    173     for info_item in html_info:
    174         for additional_graph in info_item.get_additional_graphs():

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-e3f46fea-f94c-4027-85d3-8c78354699ba/lib/python3.9/site-packages/evidently/metrics/classification_performance/classification_quality_metric.py:74, in ClassificationQualityMetricRenderer.render_html(self, obj)
     73 def render_html(self, obj: ClassificationQualityMetric) -> List[BaseWidgetInfo]:
---> 74     metric_result = obj.get_result()
     75     target_name = metric_result.target_name
     76     result = []

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-e3f46fea-f94c-4027-85d3-8c78354699ba/lib/python3.9/site-packages/evidently/base_metric.py:184, in Metric.get_result(self)
    182 result = self._context.metric_results.get(self, None)
    183 if isinstance(result, ErrorResult):
--> 184     raise result.exception
    185 if result is None:
    186     raise ValueError(f"No result found for metric {self} of type {type(self).name}")

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-e3f46fea-f94c-4027-85d3-8c78354699ba/lib/python3.9/site-packages/evidently/suite/base_suite.py:393, in Suite.run_calculate(self, data)
    391 logging.debug(f"Executing {type(calculation)}...")
    392 try:
--> 393     calculations[calculation] = calculation.calculate(data)
    394 except BaseException as ex:
    395     calculations[calculation] = ErrorResult(ex)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-e3f46fea-f94c-4027-85d3-8c78354699ba/lib/python3.9/site-packages/evidently/metrics/classification_performance/classification_quality_metric.py:46, in ClassificationQualityMetric.calculate(self, data)
     44     raise ValueError("The columns 'target' and 'prediction' columns should be present")
     45 target, prediction = self.get_target_prediction_data(data.current_data, data.column_mapping)
---> 46 current = calculate_metrics(
     47     data.column_mapping,
     48     self._confusion_matrix_metric.get_result().current_matrix,
     49     target,
     50     prediction,
     51 )
     53 reference = None
     54 if data.reference_data is not None:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-e3f46fea-f94c-4027-85d3-8c78354699ba/lib/python3.9/site-packages/evidently/calculations/classification_performance.py:316, in calculate_metrics(column_mapping, confusion_matrix, target, prediction)
    311 if len(prediction.labels) == 2:
    312     confusion_by_classes = calculate_confusion_by_classes(
    313         np.array(confusion_matrix.values),
    314         confusion_matrix.labels,
    315     )
--> 316 conf_by_pos_label = confusion_by_classes[pos_label]
    317 precision = metrics.precision_score(target, prediction.predictions, pos_label=pos_label)
    318 recall = metrics.recall_score(target, prediction.predictions, pos_label=pos_label)

KeyError: 1
```

`performance_report.as_dict()` also raises `KeyError: 1`.
Here the value_counts is different for reference_df and current_df, but that should not be causing the error.
Thanks @sameeryadav, could you share the prediction value counts (not only target) to double-check?
cc @mike0sv to help figure out what might be wrong here.
Sure @elenasamuylova @mike0sv
Hi @elenasamuylova, I have been stuck with this error for several days, and I am using it in a live project. Could you please help me resolve this error?
Hi @sameeryadav! The error itself is probably caused by the target value being the wrong type (str instead of int, or vice versa). It's probably our internal bug; however, you can try casting the target column explicitly to one of those types (try both). I will investigate further in the meantime.
Also, what version of evidently are you using?
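The suggested cast can look like this. A sketch over a hypothetical frame; try both `int` and `str` and keep whichever matches the prediction column:

```python
import pandas as pd

# Hypothetical frame where target arrived as strings but prediction as ints:
df = pd.DataFrame({"target": ["0", "1"], "prediction": [0, 1]})

# Cast both columns to one consistent dtype before running the Report:
df["target"] = df["target"].astype(int)        # or .astype(str) on both
df["prediction"] = df["prediction"].astype(int)

print(df["target"].dtype == df["prediction"].dtype)  # True
```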
Hi, @mike0sv
I tried both versions, 0.4.0 and 0.3.3.
I also changed the dtype to str, but the issue is still there.
I also found this while trying to solve the error - maybe it can help you: the KeyError you are encountering is likely related to the pos_label variable when calculating conf_by_pos_label. To fix this issue, you should ensure that pos_label is correctly set when calculating the classification metrics.
In the ClassificationQualityMetric class, where you calculate the current metrics, you should provide a value for the pos_label parameter when calling the calculate_metrics function. The pos_label parameter is the label of the positive class in your classification problem. It is used in metrics like precision and recall.
To do this, you can modify the calculate method in the ClassificationQualityMetric class.
I could not reproduce your issue in my environment.
```python
from evidently.report import Report
from evidently import ColumnMapping
from evidently.metrics.classification_performance.classification_quality_metric import ClassificationQualityMetric
import pandas as pd

ref = cur = pd.DataFrame([
    {"a": 1, "b": 1},
    {"a": 1, "b": 1},
])

report = Report([
    ClassificationQualityMetric()
])
report.run(
    current_data=cur,
    reference_data=ref,
    column_mapping=ColumnMapping(target="a", prediction="b", target_names={0: "aa", 1: "bb"}, pos_label=1)
)
report.show()
```
Can you run this and confirm that it works? If it does, can you modify it a bit with your data so that it fails?
Hey @mike0sv, I tried your above code in my env and it runs fine, but when I tried it with a sample of my data I got the same error again. One thing I also found: when I give pos_label=1 I get KeyError: 1, and when I give pos_label=0 I get KeyError: 0.
Due to some client restrictions, I cannot expose the original data to re-create the error in your example code. I can try to explain how I prepared my dataset:
1. Joins
I also checked the same thing with a sample of my current_df and got the same error.
Can you put your data in my example and my data into yours? I mean, instead of

```python
ref = cur = pd.DataFrame([
    {"a": 1, "b": 1},
    {"a": 1, "b": 1},
])
```

put something like `ref = cur = ref_df[["target", "prediction"]][:2]`
? If this fails, it means something is off with your data (probably types, as I said before).
Also, try to run your example on my example data above - if this fails, it means something is off with the report configuration.
This bug is about zeros existing in the target or prediction columns.
If we try this dataframe:

```python
ref = cur = pd.DataFrame([
    {"a": 1, "b": 1},
    {"a": 1, "b": 1},
])
```

everything works fine,
but if we use this dataframe:

```python
ref = cur = pd.DataFrame([
    {"a": 0, "b": 0},
    {"a": 1, "b": 1},
])
```

the confusion_matrix labels will be the strings ['1', '0'] while pos_label is the int 1.
The function calculate_metrics then raises
`KeyError: 1` at line 319 in evidently\calculations\classification_performance.py:

```python
conf_by_pos_label = confusion_by_classes[pos_label]
```

But when all values are one, this function is not called.
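The mismatch described above reduces to a plain dictionary lookup. A sketch (the nested values are placeholders, not Evidently's real structures):

```python
# Confusion matrix keyed by *string* labels, as described above:
confusion_by_classes = {"0": {"tp": 1, "fp": 0}, "1": {"tp": 1, "fp": 0}}
pos_label = 1  # ...while pos_label is an int

try:
    confusion_by_classes[pos_label]  # int 1 is not the str key "1"
    failed = False
except KeyError as err:
    failed = True
    print(f"KeyError: {err}")  # KeyError: 1
```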
I tried with this data and it works for me :/ @master-pro can you share full code and what version are you on?
@mike0sv that's weird; the problem is importing the mlflow library before evidently:

```python
import mlflow

from evidently.report import Report
from evidently import ColumnMapping
from evidently.metrics.classification_performance.classification_quality_metric import ClassificationQualityMetric
import pandas as pd

ref = cur = pd.DataFrame([
    {"a": 0, "b": 0},
    {"a": 1, "b": 1},
])

report = Report([
    ClassificationQualityMetric()
])
report.run(
    current_data=cur,
    reference_data=ref,
    column_mapping=ColumnMapping(target="a", prediction="b", pos_label=1)
)

print(report.as_dict())
```
evidently==0.4.0
mlflow==2.5.0
If you move `import mlflow` to the end of the import section, the problem is solved.
I have no time to investigate why mlflow causes this error.
Ok, I successfully reproduced this, will investigate
Hi, @mike0sv
Do let me know the solution of the issue.
Hi @sameeryadav, could you confirm if you also use MLflow or import any other additional libraries before running Evidently? What is the Jupyter environment you run it in (e.g. Jupyter notebook, Databricks notebook, AWS Sagemaker notebook)?
Hi @elenasamuylova, I am using Azure Databricks - DBR 12.2 LTS ML, Spark 3.3.2, Scala 2.12.
I am not using mlflow in my notebook, but some additional libraries like the json, spark functions, and datetime modules are there in my notebook.
I think I solved the mystery - it seems that mlflow uses typing annotations, and a wrong annotation was cached by the @lru_cache of typing.List, which in turn broke our code. Details are here: https://github.com/pydantic/pydantic/issues/7022
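The caching mechanism involved can be seen in plain stdlib code. This is only an illustration of how `typing` caches parameterized generics, not a reproduction of the mlflow interaction itself:

```python
import typing

# typing caches subscripted generics behind an internal lru_cache, so
# repeated subscriptions return the very same object; a stale entry in
# that cache is the kind of problem the linked issue describes.
a = typing.List[int]
b = typing.List[int]
print(a is b)  # True: the second lookup is served from the cache
```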
@sameeryadav @master-pro can you install from this PR and confirm that the problem is solved? https://github.com/evidentlyai/evidently/pull/712
Hi @sameeryadav, @master-pro, the fix is now in the new Evidently version (0.4.1). Could you check if this solves the issue for you?
Hi, I've encountered the same issue, and for me upgrading to 0.4.1 worked.
Thanks for sharing @anh-le-profinit!
It worked for me, thank you so much guys @mike0sv & @elenasamuylova. We can close this issue now.