Closed andrewelamb closed 1 year ago
manifest: synapse_storage_manifest.csv
Related to inRange rule.
Upon investigation, the observed_value was missing from the result dictionary in the variable result_dict, but there was a dictionary value for the exception_info key, indicating that an exception was raised while running the expectation suite.
I've added functionality to GE_Helpers.generate_errors to parse and raise any exceptions raised during GE validation. In this case the trace is as displayed below:
Exception has occurred: GreatExpectationsError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Traceback (most recent call last):
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 650, in _process_direct_and_bundled_metric_computation_configurations
] = metric_computation_configuration.metric_fn(
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\expectations\metrics\metric_provider.py", line 90, in inner_func
return metric_fn(*args, **kwargs)
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\expectations\metrics\map_metric_provider.py", line 371, in inner_func
meets_expectation_series = metric_fn(
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\expectations\metrics\column_map_metrics\column_values_between.py", line 205, in _pandas
return temp_column.map(is_between)
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\pandas\core\series.py", line 4539, in map
new_values = self._map_values(arg, na_action=na_action)
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\pandas\core\base.py", line 890, in _map_values
new_values = map_f(values, mapper)
File "pandas\_libs\lib.pyx", line 2924, in pandas._libs.lib.map_infer
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\expectations\metrics\column_map_metrics\column_values_between.py", line 141, in is_between
raise TypeError(
TypeError: Column values, min_value, and max_value must either be None or of the same type.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\validator\validation_graph.py", line 272, in _resolve
self._execution_engine.resolve_metrics(
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 375, in resolve_metrics
return self._process_direct_and_bundled_metric_computation_configurations(
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 654, in _process_direct_and_bundled_metric_computation_configurations
raise gx_exceptions.MetricResolutionError(
great_expectations.exceptions.exceptions.MetricResolutionError: Column values, min_value, and max_value must either be None or of the same type.
File "C:\Users\gjordan\Documents\GitHub\schematic\schematic\models\GE_Helpers.py", line 416, in generate_errors
raise GreatExpectationsError(result_dict['exception_info']['exception_traceback'])
File "C:\Users\gjordan\Documents\GitHub\schematic\schematic\models\validate_manifest.py", line 158, in validate_manifest_rules
errors, warnings = ge_helpers.generate_errors(
File "C:\Users\gjordan\Documents\GitHub\schematic\schematic\models\validate_manifest.py", line 253, in validate_all
manifest, vmr_errors, vmr_warnings = vm.validate_manifest_rules(manifest, sg, restrict_rules, project_scope)
File "C:\Users\gjordan\Documents\GitHub\schematic\schematic\models\metadata.py", line 254, in validateModelManifest
errors, warnings, manifest = validate_all(self, errors, warnings, manifest, manifestPath, self.sg, jsonSchema, restrict_rules, project_scope)
File "C:\Users\gjordan\Documents\GitHub\schematic\schematic\models\commands.py", line 232, in validate_manifest
errors, warnings = metadata_model.validateModelManifest(
File "C:\Users\gjordan\anaconda3\envs\schematic\Lib\site-packages\click\decorators.py", line 38, in new_func
return f(get_current_context().obj, *args, **kwargs)
File "C:\Users\gjordan\anaconda3\envs\schematic\Lib\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "C:\Users\gjordan\anaconda3\envs\schematic\Lib\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\gjordan\anaconda3\envs\schematic\Lib\site-packages\click\core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\gjordan\anaconda3\envs\schematic\Lib\site-packages\click\core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\gjordan\anaconda3\envs\schematic\Lib\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
File "C:\Users\gjordan\anaconda3\envs\schematic\Lib\site-packages\click\core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "C:\Users\gjordan\Documents\GitHub\schematic\schematic\__main__.py", line 45, in <module>
main()
File "C:\Users\gjordan\anaconda3\envs\schematic\Lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\gjordan\anaconda3\envs\schematic\Lib\runpy.py", line 196, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
great_expectations.exceptions.exceptions.GreatExpectationsError: Traceback (most recent call last):
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 650, in _process_direct_and_bundled_metric_computation_configurations
] = metric_computation_configuration.metric_fn(
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\expectations\metrics\metric_provider.py", line 90, in inner_func
return metric_fn(*args, **kwargs)
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\expectations\metrics\map_metric_provider.py", line 371, in inner_func
meets_expectation_series = metric_fn(
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\expectations\metrics\column_map_metrics\column_values_between.py", line 205, in _pandas
return temp_column.map(is_between)
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\pandas\core\series.py", line 4539, in map
new_values = self._map_values(arg, na_action=na_action)
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\pandas\core\base.py", line 890, in _map_values
new_values = map_f(values, mapper)
File "pandas\_libs\lib.pyx", line 2924, in pandas._libs.lib.map_infer
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\expectations\metrics\column_map_metrics\column_values_between.py", line 141, in is_between
raise TypeError(
TypeError: Column values, min_value, and max_value must either be None or of the same type.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\validator\validation_graph.py", line 272, in _resolve
self._execution_engine.resolve_metrics(
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 375, in resolve_metrics
return self._process_direct_and_bundled_metric_computation_configurations(
File "C:\Users\gjordan\anaconda3\envs\schematic\lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 654, in _process_direct_and_bundled_metric_computation_configurations
raise gx_exceptions.MetricResolutionError(
great_expectations.exceptions.exceptions.MetricResolutionError: Column values, min_value, and max_value must either be None or of the same type.
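For anyone following along, the check described above (look for exception_info before trusting observed_value) could be sketched roughly like this. This is not the actual code in GE_Helpers.generate_errors: the function name, the ValidationRunError class, and the assumption that exception_info carries raised_exception, exception_traceback, and exception_message keys are mine, for illustration only.

```python
class ValidationRunError(Exception):
    """Hypothetical stand-in for the exception raised from generate_errors."""


def raise_if_expectation_errored(result_dict):
    """Return the observed value, or raise if the expectation itself errored.

    Assumes result_dict looks like one entry of a validation result:
    {"result": {...}, "exception_info": {"raised_exception": bool, ...}}.
    """
    exception_info = result_dict.get("exception_info") or {}
    if exception_info.get("raised_exception"):
        # Prefer the full traceback; fall back to the short message.
        detail = (
            exception_info.get("exception_traceback")
            or exception_info.get("exception_message")
            or "unknown error during validation"
        )
        raise ValidationRunError(detail)
    # No exception recorded, so the observed value (if any) can be used.
    return result_dict.get("result", {}).get("observed_value")
```

The point of raising here is that a missing observed_value otherwise fails later with a much less helpful KeyError.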
The issue appears to be related to the types of entries in the manifest. In the manifest provided, there are NA values entered that get converted to empty strings during import. I believe the error arises because the column then holds both string and numerical values, which are compared against the numerical bounds.
As part of the PR I've allowed cross-type comparisons so that this error will not be raised, but the NA values will still be counted as "out of range" and display an error or warning.
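To make the difference concrete, here is a small pure-Python sketch of strict versus cross-type-tolerant range checking. It is an illustration under my own assumptions, not the code from the PR or from Great Expectations; all function names are hypothetical, and row positions are zero-based list indices rather than manifest row numbers.

```python
from numbers import Number


def is_number(value):
    """True for int/float values, excluding bool."""
    return isinstance(value, Number) and not isinstance(value, bool)


def strict_in_range(value, min_value, max_value):
    """Strict behaviour: a non-numeric entry is a hard error."""
    if not is_number(value):
        raise TypeError("Column values and bounds must be of the same type.")
    return min_value <= value <= max_value


def lenient_in_range(value, min_value, max_value):
    """Tolerant behaviour: a non-numeric entry simply counts as out of range."""
    if not is_number(value):
        return False
    return min_value <= value <= max_value


def out_of_range_positions(values, min_value, max_value):
    """Zero-based positions of entries that fail the tolerant check."""
    return [
        i for i, v in enumerate(values)
        if not lenient_in_range(v, min_value, max_value)
    ]


# An NA that was imported as "" sits among numeric ages.
ages = [34, 61, "", 47]
print(out_of_range_positions(ages, 0, 120))  # [2]
```

With the strict version the same column raises TypeError on the empty string, which matches the kind of failure seen in the trace.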
@andrewelamb's manifest seems to be another use case for #980. cc'ing @MiekoHash @milen-sage to prioritize.
@GiaJordan Could you elaborate on 'NA' values? Should they be stored as something else in the CSV?
In your manifest, you have some values specified as NA for an attribute with the inRange rule. They're converted to empty strings "" when imported. Ideally they wouldn't be strings; they'd be numbers too, but we can add support for that with #980.
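As a rough illustration of what "not strings" could mean at import time, the sketch below reads a CSV column and maps NA-like entries to None instead of "". This is only an assumption about one possible approach, not how schematic imports manifests or what #980 will implement; the NA_TOKENS set, the function name, and the sample rows are made up.

```python
import csv
import io

# Assumed spellings of a missing value; adjust as needed.
NA_TOKENS = {"", "NA"}


def read_numeric_column(csv_text, column):
    """Return the column as floats, with NA-like entries as None.

    Raises ValueError if an entry is neither NA-like nor numeric.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        raw = (row[column] or "").strip()
        values.append(None if raw in NA_TOKENS else float(raw))
    return values


sample = "patient,age_at_diagnosis\nA,34\nB,NA\nC,61\n"
print(read_numeric_column(sample, "age_at_diagnosis"))  # [34.0, None, 61.0]
```

A range rule can then skip None entries explicitly rather than comparing a string against numeric bounds.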
@GiaJordan I'm now seeing the error below. This is what you were expecting with NAs in columns with the inRange rule until #980 is addressed, correct?
schematic model -c config.yml validate -mp synapse_storage_manifest.csv -dt Patients
WARNING: [2023-02-16 08:06:19] py.warnings - /home/alamb/miniconda3/lib/python3.9/inspect.py:351: FutureWarning: pandas.Float64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
value = getattr(object, key)
WARNING: [2023-02-16 08:06:19] py.warnings - /home/alamb/miniconda3/lib/python3.9/inspect.py:351: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
value = getattr(object, key)
WARNING: [2023-02-16 08:06:19] py.warnings - /home/alamb/miniconda3/lib/python3.9/inspect.py:351: FutureWarning: pandas.UInt64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
value = getattr(object, key)
Starting schematic...
The (model > input > location) argument with value '../iAtlasSchema/iatlas_schema.jsonld' is being read from the config file.
The (model > input > file_type) argument with value 'local' is being read from the config file.
JSON schema successfully generated from schema.org schema!
JSON schema file log stored as ../iAtlasSchema/iatlas_schema.Patients.schema.json
FileDataContext loading zep config
GxConfig.parse_yaml() failed with errors - [{'loc': ('xdatasources',), 'msg': 'field required', 'type': 'value_error.missing'}]
GxConfig.parse_yaml() returning empty `xdatasources`
Loading 'datasources' ->
{}
Loaded 'datasources' ->
{}
EphemeralDataContext has not implemented `_load_zep_config()` returning empty `GxConfig`
Loading 'datasources' ->
{}
Loaded 'datasources' ->
{}
5 expectation(s) included in expectation_suite.
Calculating Metrics: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 36/36 [00:00<00:00, 630.39it/s]
warning: On row 95 the attribute age_at_diagnosis does not contain the proper value type int.
error: age_at_diagnosis values in rows [95] are out of the specified range.
[[[95], 'age_at_diagnosis', 'age_at_diagnosis values in rows [95] are out of the specified range.', {''}]]
@andrewelamb yes, the "error: age_at_diagnosis values in rows [95] are out of the specified range." error is expected. The other warning should be addressed as well.
Describe the bug
schematic model -c ../schematic/config.yml validate -mp synapse_storage_manifest.csv -dt Patients
Causes the error below.
Expected behavior
Either for the manifest to validate, or for the error to clearly describe what is wrong with the manifest.