Closed ReinhardSellmair closed 6 months ago
Hey @ReinhardSellmair, thanks for submitting your report!
This looks like a local issue, since it didn't trip any tests in the automated build and I also couldn't reproduce this on a fresh installation.
Are you working in a fresh environment as well, or does this occur after updating NannyML?
Would you happen to have some more logging, so we can check where the issue arises?
Thanks for your quick response. I installed NannyML version 0.10.3 into an existing environment. Here is the full error log:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File <command-3447662672388356>, line 20
17 calc.fit(reference_df)
18 results = calc.calculate(analysis_df)
---> 20 figure = results.filter(column_names=results.categorical_column_names, methods=['chi2']).plot(kind='distribution')
21 figure.show()
File python/lib/python3.10/site-packages/nannyml/usage_logging.py:238, in log_usage.<locals>.logging_decorator.<locals>.logging_wrapper(*args, **kwargs)
236 finally:
237 if runtime_exception is not None:
--> 238 raise runtime_exception
239 else:
240 return res
File python/lib/python3.10/site-packages/nannyml/usage_logging.py:187, in log_usage.<locals>.logging_decorator.<locals>.logging_wrapper(*args, **kwargs)
184 runtime_exception = None
185 try:
186 # run original function
--> 187 res = func(*args, **kwargs)
188 except BaseException as exc:
189 runtime_exception = exc
File python3.10/site-packages/nannyml/drift/univariate/result.py:249, in Result.plot(self, kind, *args, **kwargs)
234 return plot_metrics(
235 self,
236 title='Univariate drift metrics',
(...)
246 metric_name='Method',
247 )
248 elif kind == 'distribution':
--> 249 return plot_distributions(
250 self,
251 reference_data=self.reference_data,
252 analysis_data=self.analysis_data,
253 chunker=self.chunker,
254 )
255 else:
256 raise InvalidArgumentsException(
257 f"unknown plot kind '{kind}'. " f"Please provide on of: ['drift', 'distribution']."
258 )
File python/lib/python3.10/site-packages/nannyml/plots/blueprints/distributions.py:83, in plot_distributions(result, reference_data, analysis_data, chunker, title, figure, x_axis_time_title, x_axis_chunk_title, y_axis_title, figure_args, subplot_title_format, number_of_columns)
80 x_axis_is_time_based = is_time_based_x_axis(analysis_chunk_start_dates, analysis_chunk_end_dates)
82 if column_name in result.categorical_column_names and method in result.categorical_method_names:
---> 83 figure = _plot_stacked_bar(
84 figure=figure,
85 row=row,
86 col=col,
87 chunker=chunker,
88 column_name=column_name,
89 metric_display_name=method,
90 reference_data=reference_data[column_name],
91 reference_data_timestamps=reference_data[result.timestamp_column_name]
92 if x_axis_is_time_based
93 else None,
94 reference_alerts=reference_result.alerts(key),
95 reference_chunk_keys=reference_result.chunk_keys,
96 reference_chunk_periods=reference_result.chunk_periods,
97 reference_chunk_indices=reference_result.chunk_indices,
98 reference_chunk_start_dates=reference_result.chunk_start_dates,
99 reference_chunk_end_dates=reference_result.chunk_end_dates,
100 analysis_data=analysis_data[column_name],
101 analysis_data_timestamps=analysis_data[result.timestamp_column_name] if x_axis_is_time_based else None,
102 analysis_alerts=analysis_result.alerts(key),
103 analysis_chunk_keys=analysis_result.chunk_keys,
104 analysis_chunk_periods=analysis_result.chunk_periods,
105 analysis_chunk_indices=analysis_result.chunk_indices,
106 analysis_chunk_start_dates=analysis_chunk_start_dates,
107 analysis_chunk_end_dates=analysis_chunk_end_dates,
108 )
109 elif column_name in result.continuous_column_names and method in result.continuous_method_names:
110 figure = _plot_joyplot(
111 figure=figure,
112 row=row,
(...)
133 analysis_chunk_end_dates=analysis_chunk_end_dates,
134 )
File python/lib/python3.10/site-packages/nannyml/plots/blueprints/distributions.py:285, in _plot_stacked_bar(figure, column_name, metric_display_name, reference_data, reference_data_timestamps, analysis_data, analysis_data_timestamps, chunker, reference_alerts, reference_chunk_keys, reference_chunk_periods, reference_chunk_indices, reference_chunk_start_dates, reference_chunk_end_dates, analysis_alerts, analysis_chunk_keys, analysis_chunk_periods, analysis_chunk_indices, analysis_chunk_start_dates, analysis_chunk_end_dates, row, col, hover)
276 if has_reference_results:
277 reference_value_counts = calculate_value_counts(
278 data=reference_data,
279 chunker=chunker,
(...)
282 missing_category_label='Missing',
283 )
--> 285 figure = stacked_bar(
286 figure=figure,
287 stacked_bar_table=reference_value_counts,
288 color=Colors.BLUE_SKY_CRAYOLA,
289 chunk_indices=reference_chunk_indices,
290 chunk_start_dates=reference_chunk_start_dates,
291 chunk_end_dates=reference_chunk_end_dates,
292 annotation='Reference',
293 showlegend=True,
294 legendgrouptitle_text=f'<b>{column_name}</b>',
295 legendgroup=column_name,
296 subplot_args=subplot_args,
297 )
299 assert reference_chunk_indices is not None
300 analysis_chunk_indices = analysis_chunk_indices + (max(reference_chunk_indices) + 1)
File python/lib/python3.10/site-packages/nannyml/plots/components/stacked_bar_plot.py:143, in stacked_bar(figure, stacked_bar_table, color, chunk_start_dates, chunk_end_dates, chunk_indices, subplot_args, annotation, **kwargs)
131 hover.add(data['value_counts_normalised'], name='value_counts_normalised')
132 hover.add(data['value_counts'], name='value_counts')
134 figure.add_trace(
135 Bar(
136 name=category,
137 x=x,
138 y=data['value_counts_normalised'],
139 orientation='v',
140 marker=dict(line_color=color, color=category_colors_transparent[i], line_width=0),
141 yperiodalignment="start",
142 offset=0,
--> 143 customdata=hover.get_custom_data(),
144 hovertemplate=hover.get_template(),
145 hoverlabel=dict(bgcolor=category_colors_transparent[i], font=dict(color='white')),
146 **kwargs,
147 ),
148 **subplot_args,
149 )
151 # Shade chunk type
152 x0 = chunk_start_dates.min() if is_time_based_x_axis(chunk_start_dates, chunk_end_dates) else chunk_indices.min()
File python3.10/site-packages/nannyml/plots/components/hover.py:60, in Hover.get_custom_data(self)
57 if not isinstance(self.custom_data[0], (List, np.ndarray)):
58 return np.asarray([self.custom_data, self.custom_data])
---> 60 return np.stack(self.custom_data, axis=-1)
File python3.10/site-packages/numpy/core/shape_base.py:449, in stack(arrays, axis, out, dtype, casting)
447 shapes = {arr.shape for arr in arrays}
448 if len(shapes) != 1:
--> 449 raise ValueError('all input arrays must have the same shape')
451 result_ndim = arrays[0].ndim + 1
452 axis = normalize_axis_index(axis, result_ndim)
ValueError: all input arrays must have the same shape
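For anyone following along, the last frame of the traceback can be reproduced in isolation. The sketch below is only an illustration of how `np.stack` behaves when the arrays it receives differ in length. The array names mirror the hover fields in the traceback, but the values are made up, and it does not show why NannyML ends up with mismatched arrays in the first place.

```python
import numpy as np

# Hypothetical hover columns of different lengths (values are invented,
# not taken from the report).
value_counts = np.array([10, 20, 30])
value_counts_normalised = np.array([0.25, 0.75])

try:
    # Same call shape as Hover.get_custom_data() in the traceback.
    np.stack([value_counts, value_counts_normalised], axis=-1)
except ValueError as exc:
    message = str(exc)
    print(message)

# With equal lengths the same call succeeds and yields one row per bar.
stacked = np.stack([np.array([10, 20, 30]), np.array([0.2, 0.3, 0.5])], axis=-1)
print(stacked.shape)  # (3, 2)
```

So the error suggests that, in the affected environment, the hover columns collected for a category end up with different lengths.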
Hi @ReinhardSellmair, I'm also unable to reproduce this using NannyML 0.10.3 in my environment. There's probably a different dependency version somewhere that is causing this.
Would you be able to share the output of pip freeze in the environment where you're seeing this issue?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm experiencing exactly the same problem:
This code
figure = results.filter(column_names=results.categorical_column_names, methods=['jensen_shannon']).plot(kind='distribution')
leads to this error, which comes from NannyML's Hover.get_custom_data() and its call to np.stack, resulting in this message:
ValueError: all input arrays must have the same shape
My versions:
{'nannyml': '0.12.1', 'pandas': '2.2.3', 'polars': '0.20.31', 'pyarrow': '14.0.2', 'numpy': '1.24.4'}
python:3.10.12
Btw, for debugging purposes (and better communication with our users), maybe we could implement .show_version() and ask for its output in the "report a bug" type of New Issue?
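A minimal sketch of what such a helper might look like, using only the standard library. The function name `show_version` and the default package list are assumptions for illustration. Nothing here is an existing NannyML API.

```python
import platform
from importlib import metadata


def show_version(packages=("nannyml", "pandas", "numpy", "pyarrow", "polars")):
    """Collect installed versions (hypothetical helper, not part of NannyML)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


print(show_version())
```

The output could be pasted into a bug report as-is, in roughly the same shape as the version dict above.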
EDIT: Fixed it by using the right environment: {'nannyml': '0.12.1', 'pandas': '1.5.3', 'pyarrow': '14.0.2', 'numpy': '1.24.4'}. pandas > 2 could be the culprit.
Describe the bug: A ValueError is thrown when plotting the distribution of a categorical feature.
To Reproduce: I'm using version 0.10.3. Running the following code:
raises the following error: ValueError: all input arrays must have the same shape