MHKiT-Software / MHKiT-Python

MHKiT-Python provides the marine renewable energy (MRE) community tools for data processing, visualization, quality control, resource assessment, and device performance.
https://mhkit-software.github.io/MHKiT/
BSD 3-Clause "New" or "Revised" License

QC Module "Check for Data Outside Expected Range" Error issue #90

Closed Rjan821163 closed 3 years ago

Rjan821163 commented 3 years ago

Hi,

I am using the qc.check_range function, and after I define my expected bounds, I type:

"results = qc.check_range(results['cleaned_data'], expected_bounds)" to run the expected range quality control test.

However I get this error: "TypeError: '<' not supported between instances of 'str' and 'float' "

I was successful in all other qc tests before using this function. How should I fix this issue? Thank you!

rpauly18 commented 3 years ago

Hi @Rjan821163, can you provide a more detailed error print out? Does it provide a line number for where the error is being thrown? Also, can you provide the line of code where you define your expected_bounds? Thanks!

Rjan821163 commented 3 years ago

Hi, thank you for your reply! I copied the code below. After I successfully complete the timestamp and corrupt data qc tests, I define the expected bounds and try to get the results. Then I get this error...

Check for Data Outside Expected Range

expected_bounds=[0,18]
results=qc.check_range(results['cleaned_data'],expected_bounds)
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pecos/monitoring.py", line 746, in check_range
    pm.check_range(bound, key, min_failures)
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pecos/monitoring.py", line 412, in check_range
    self._generate_test_results(df, bound, min_failures, error_prefix)
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pecos/monitoring.py", line 125, in _generate_test_results
    mask = (df < bound[0])
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pandas/core/ops/__init__.py", line 704, in f
    new_data = dispatch_to_series(self, other, op, axis=axis)
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pandas/core/ops/__init__.py", line 265, in dispatch_to_series
    bm = left._mgr.apply(array_op, right=right)
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 404, in apply
    applied = b.apply(f, **kwargs)
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 346, in apply
    result = func(self.values, **kwargs)
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 244, in comparison_op
    res_values = comp_method_OBJECT_ARRAY(op, lvalues, rvalues)
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 56, in comp_method_OBJECT_ARRAY
    result = libops.scalar_compare(x.ravel(), y, op)
  File "pandas/_libs/ops.pyx", line 103, in pandas._libs.ops.scalar_compare
TypeError: '<' not supported between instances of 'str' and 'int'

rpauly18 commented 3 years ago

Is there any data in your dataset that is non-numeric, such as strings?
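One way to answer this question is to ask pandas which columns it does not treat as numeric. The sketch below uses a small made-up frame as a stand-in for results['cleaned_data']; the column names and values are illustrative assumptions, not the poster's actual data.

```python
import pandas as pd

# Hypothetical stand-in for results['cleaned_data']; all values are made up.
data = pd.DataFrame(
    {
        # A time column stored as text (object dtype), next to a numeric column.
        "Time (YYYY-MM-DD-HH)": pd.Series(
            ["1979-01-01-00", "1979-05-31-00"], dtype=object
        ),
        "Peak period (s)": [6.3657, 6.0119],
    },
    index=pd.to_datetime(["1979-01-01 00:00", "1979-01-01 01:00"]),
)

# List every column whose dtype is not numeric; these cannot be
# compared against numeric bounds.
non_numeric = data.select_dtypes(exclude="number").columns.tolist()

print(data.dtypes)
print("Non-numeric columns:", non_numeric)
```

Column headers are always strings and do not matter here; only the dtype of the values in each column does.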

Rjan821163 commented 3 years ago

The only non-numeric data are the titles of each column. Would they affect the analysis?

print(results['cleaned_data'].head())
                    Time (YYYY-MM-DD-HH)  ...  Peak period (s)
1979-01-01 00:00:00        1979-01-01-00  ...           6.3657
1979-01-01 01:00:00        1979-05-31-00  ...           6.0119
1979-01-01 02:00:00        1979-10-28-00  ...          10.9205
1979-01-01 03:00:00        1980-03-26-00  ...          10.6359
1979-01-01 04:00:00        1980-08-23-00  ...           5.9550

rpauly18 commented 3 years ago

Based on what I can see of your data, it looks like you have a second time column, presumably an end time. If that is a column in your dataset and not part of the index, it is likely the cause of your problem. Can you try removing that column from your data for this QC test?

Rjan821163 commented 3 years ago

Thank you! I removed the column, and the qc.check_range worked.

I am also running into an error when I check for stagnant data (code below). Does the error mean that one of the bounds needs to be "None"?

expected_bound = [0, 15]

window=10800
  File "", line 1
    window=10800
    ^
SyntaxError: invalid syntax
expected_bound = [0,15]
window=10800
results=qc.check_delta(results['cleaned_data'],expected_bound,window)
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pecos/monitoring.py", line 769, in check_delta
    pm.check_delta(bound, key, window, direction, min_failures)
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pecos/monitoring.py", line 504, in check_delta
    assert isinstance(key, (NoneType, str)), 'key must be None or of type string'
AssertionError: key must be None or of type string

ssolson commented 3 years ago

Hello @Rjan821163! Thank you for your interest in MHKiT. To assist you properly, I need to ask that you create a minimum workable example (MWE) of the error you are experiencing so that the team here can best help you. There are many resources online about creating MWEs. This one is a good starting place: https://stackoverflow.com/help/minimal-reproducible-example
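As a rough idea of the shape such an example might take, here is a minimal sketch: a few rows of synthetic data that anyone can recreate, the parameters from the session, and the environment details. The data values are invented, and the commented-out import and call at the end are assumptions based on the code quoted in this thread; they are left as comments so the scaffold runs without MHKiT installed.

```python
import pandas as pd

# Small synthetic dataset: six hourly samples of one numeric column.
# Nothing here comes from the poster's dataset.
index = pd.to_datetime("1979-01-01") + pd.to_timedelta(list(range(6)), unit="h")
data = pd.DataFrame(
    {"Peak period (s)": [6.4, 6.0, 6.0, 6.0, 10.9, 5.9]},
    index=index,
)

# Parameters as given in the thread.
expected_bound = [0, 15]
window = 10800  # seconds

# Report the environment and the exact input alongside the error.
print("pandas version:", pd.__version__)
print(data)

# The failing call would follow here, copied exactly from the session.
# Assumed import and call, taken from the code quoted above:
# from mhkit import qc
# results = qc.check_delta(data, expected_bound, window)
```

Posting the full, unedited traceback together with a script like this lets others reproduce the problem without access to the original files.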

As we have solved the original issue, and to keep issues specific to their titles for future users searching the issue logs, I am going to close this issue. Please feel free to open a new issue once you have an MWE for us to review. Thank you again for your interest in MHKiT, and we look forward to hearing from you in the future.