MHKiT-Software / MHKiT-Python

MHKiT-Python provides the marine renewable energy (MRE) community tools for data processing, visualization, quality control, resource assessment, and device performance.
https://mhkit-software.github.io/MHKiT/
BSD 3-Clause "New" or "Revised" License

QC Module - Stagnant Data Check Error #92

Closed Rjan821163 closed 3 years ago

Rjan821163 commented 3 years ago

I am running QC Module tests on hindcast data, and so far I have successfully completed the timestamp, corrupt data, and expected range tests, all of which returned empty DataFrames.

I am currently running the stagnant data test and want to check whether any data change by more than 15 within a 3-hour window. Here is my input:

expected_bound=[0,15]
window=10800
results=qc.check_delta(results['cleaned_data'],expected_bound,window)

However, after this line I get an AssertionError: "key must be None or of type string". After looking into what "key" is in the source code for pecos.monitoring, I found that defining "key" is optional, so I am not sure why this error appears. The traceback that follows is:

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pecos/monitoring.py", line 769, in check_delta
    pm.check_delta(bound, key, window, direction, min_failures)
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pecos/monitoring.py", line 504, in check_delta
    assert isinstance(key, (NoneType, str)), 'key must be None or of type string'
AssertionError: key must be None or of type string

Thank you for your help!

ssolson commented 3 years ago

Hey Rjan821163, thanks for your interest in MHKiT. To assist you best, I need you to provide a minimum working example of your code, as requested in #91. Looking at what you did provide, I believe your results['cleaned_data'] is not returning a Series. If this is not the issue, please provide a minimum working example so that I can assist you further.

Rjan821163 commented 3 years ago

Thank you for being patient with me! I hope the code below is clearer. I keep running into an AssertionError when I try to run the qc.check_delta function. I am not sure what to change in order to fix this issue. Thank you!

os.chdir('/Users/RachelAn/Documents')
data=pd.read_csv('Pacwave2.csv')
print(data.head())
   Significant wave height (m)  Energy period (s)  Peak period (s)
0                       2.2084             9.1326           6.3657
1                       2.2269             9.0765           6.3556
2                       2.2597             8.9651           6.3142
3                       2.2858             8.8898           6.2969
4                       2.2976             8.8660           6.3063
data.index=utils.index_to_datetime(data.index, origin='1979-01-01-00')
print(data.head())
                     Significant wave height (m)  Energy period (s)  Peak period (s)
1979-01-01 00:00:00                       2.2084             9.1326           6.3657
1979-01-01 00:00:01                       2.2269             9.0765           6.3556
1979-01-01 00:00:02                       2.2597             8.9651           6.3142
1979-01-01 00:00:03                       2.2858             8.8898           6.2969
1979-01-01 00:00:04                       2.2976             8.8660           6.3063

Check Timestamp

frequency=3600
results = qc.check_timestamp(data, frequency)
print(results['cleaned_data'].head())
                     Significant wave height (m)  Energy period (s)  Peak period (s)
1979-01-01 00:00:00                       2.2084             9.1326           6.3657
1979-01-01 01:00:00                       2.5983             7.3404           6.0119
1979-01-01 02:00:00                       4.1733            12.5130          10.9205
1979-01-01 03:00:00                       3.6922            12.6348          10.6359
1979-01-01 04:00:00                       2.1243             7.1407           5.9550
print(results['test_results'].T)
Empty DataFrame
Columns: []
Index: [Variable Name, Start Time, End Time, Timesteps, Error Flag]

Check Corrupt Data

corrupt_values = [-999]
results = qc.check_corrupt(results['cleaned_data'], corrupt_values)
print(results['test_results'].T)
Empty DataFrame
Columns: []
Index: [Variable Name, Start Time, End Time, Timesteps, Error Flag]

Check for Data Outside Expected Range

expected_bounds = [0, 18]
results = qc.check_range(results['cleaned_data'], expected_bounds)
print(results['test_results'].T)
Empty DataFrame
Columns: []
Index: [Variable Name, Start Time, End Time, Timesteps, Error Flag]

Check for Stagnant Data

expected_bound=[0,15]
window=10800
results=qc.check_delta(results['cleaned_data'],expected_bound,window)
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pecos/monitoring.py", line 769, in check_delta
    pm.check_delta(bound, key, window, direction, min_failures)
  File "/Users/RachelAn/opt/anaconda3/envs/MHKiT/lib/python3.8/site-packages/pecos/monitoring.py", line 504, in check_delta
    assert isinstance(key, (NoneType, str)), 'key must be None or of type string'
AssertionError: key must be None or of type string

ssolson commented 3 years ago

Hey Rachel, I think that if you created a minimum working example (i.e. the smallest amount of code that reproduces the error, thereby simplifying the problem), you might be able to determine what is going wrong. If you upload your pacwave.csv and Jupyter notebook, I will take a look at it when I get a chance.

rpauly18 commented 3 years ago

Hi Rachel,

Can you try updating your Pecos package? I believe you may have an older version. You can do this with "pip install --upgrade pecos"
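
To confirm which version is actually installed in the active environment after upgrading, the version can also be queried from Python. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical, and the package names are simply the ones mentioned in this thread.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the versions visible to the interpreter that runs this script.
for name in ("pecos", "mhkit", "pandas"):
    print(name, installed_version(name))
```

Running this with the same interpreter that raises the error shows whether the upgrade reached that environment.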

Rjan821163 commented 3 years ago

I updated my pecos package, but unfortunately could not get rid of the error. Also a .csv file type was not supported, so I attached a .txt file just so you can see the data I am using.

Pacwave2.txt

I tried to use Jupyter Notebook, but I kept running into "no module named 'mhkit' " errors. So I hope the code below will be enough!
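
A common cause of this kind of import error, though not confirmed in this thread, is that the notebook kernel runs a different Python interpreter from the Anaconda environment where mhkit is installed. The sketch below, run inside a notebook cell, prints which interpreter is in use and whether it can locate the module; the helper name `module_available` is hypothetical.

```python
import importlib.util
import sys


def module_available(module_name):
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter path shows which environment the kernel is using.
print("Interpreter:", sys.executable)
print("mhkit importable:", module_available("mhkit"))
```

If the interpreter path does not point into the MHKiT environment, the kernel is looking at a different set of installed packages.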

The code below should be the smallest amount of code that reproduces the error. I'm not sure how to simplify it any further, so please let me know if there is still information that needs to be omitted or added in order to reproduce the error. This code was typed into Python on a Mac using the MHKiT environment in Anaconda.

import mhkit
import pandas as pd
from mhkit import qc, utils
data = pd.read_csv('Pacwave2.csv')
data.index = utils.index_to_datetime(data.index, origin='1979-01-01-00')

Check for Stagnant Data

expected_bound=[0,15]
window=10800
results=qc.check_delta(results['cleaned_data'],expected_bound,window)

After this last line, I get an assertion error: "AssertionError: key must be None or of type string"

rpauly18 commented 3 years ago

Can you send us the print out from typing "pip list" in your terminal/command window?

Rjan821163 commented 3 years ago

Here is the list that appeared after I typed "pip list"

Package Version
alabaster 0.7.12
anaconda-client 1.7.2
anaconda-navigator 1.9.12
anaconda-project 0.8.3
applaunchservices 0.2.1
appnope 0.1.0
appscript 1.1.1
argh 0.26.2
asn1crypto 1.3.0
astroid 2.4.2
astropy 4.0.1.post1
atomicwrites 1.4.0
attrs 19.3.0
autopep8 1.5.3
Babel 2.8.0
backcall 0.2.0
backports.functools-lru-cache 1.6.1
backports.shutil-get-terminal-size 1.0.0
backports.tempfile 1.0
backports.weakref 1.0.post1
beautifulsoup4 4.9.1
bitarray 1.4.0
bkcharts 0.2
bleach 3.1.5
bokeh 2.1.1
boto 2.49.0
Bottleneck 1.3.2
brotlipy 0.7.0
certifi 2020.6.20
cffi 1.14.0
chardet 3.0.4
click 7.1.2
cloudpickle 1.5.0
clyent 1.2.2
colorama 0.4.3
conda 4.9.2
conda-build 3.18.11
conda-package-handling 1.7.2
conda-verify 3.4.2
contextlib2 0.6.0.post1
cryptography 2.9.2
cycler 0.10.0
Cython 0.29.21
cytoolz 0.10.1
dask 2.20.0
decorator 4.4.2
defusedxml 0.6.0
diff-match-patch 20200713
distributed 2.20.0
docutils 0.16
entrypoints 0.3
et-xmlfile 1.0.1
fastcache 1.1.0
fatpack 0.6.2
filelock 3.0.12
flake8 3.8.3
Flask 1.1.2
fsspec 0.7.4
future 0.18.2
gevent 20.6.2
glob2 0.7
gmpy2 2.0.8
greenlet 0.4.16
h5py 2.10.0
HeapDict 1.0.1
html5lib 1.1
idna 2.10
imageio 2.9.0
imagesize 1.2.0
importlib-metadata 1.7.0
intervaltree 3.0.2
ipykernel 5.3.2
ipython 7.16.1
ipython-genutils 0.2.0
ipywidgets 7.5.1
isort 4.3.21
itsdangerous 1.1.0
jdcal 1.4.1
jedi 0.17.1
Jinja2 2.11.2
joblib 0.16.0
json5 0.9.5
jsonschema 3.2.0
jupyter 1.0.0
jupyter-client 6.1.6
jupyter-console 6.1.0
jupyter-core 4.6.3
jupyterlab 2.1.5
jupyterlab-server 1.2.0
keyring 21.2.1
kiwisolver 1.2.0
lazy-object-proxy 1.4.3
libarchive-c 2.9
llvmlite 0.33.0+1.g022ab0f
locket 0.2.0
lxml 4.5.2
MarkupSafe 1.1.1
matplotlib 3.2.2
mccabe 0.6.1
mhkit 0.3.0
mistune 0.8.4
mkl-fft 1.1.0
mkl-random 1.1.1
mkl-service 2.3.0
mock 4.0.2
more-itertools 8.4.0
mpmath 1.1.0
msgpack 1.0.0
multipledispatch 0.6.0
navigator-updater 0.2.1
nbconvert 5.6.1
nbformat 5.0.7
networkx 2.4
nltk 3.5
nose 1.3.7
notebook 6.0.3
numba 0.50.1
numexpr 2.7.1
numpy 1.18.5
numpydoc 1.1.0
olefile 0.46
openpyxl 3.0.4
packaging 20.4
pandas 1.0.5
pandocfilters 1.4.2
parso 0.7.0
partd 1.1.0
path 13.1.0
pathlib2 2.3.5
pathtools 0.1.2
patsy 0.5.1
pecos 0.1.9
pep8 1.7.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 7.2.0
pip 20.1.1
pkginfo 1.5.0.1
pluggy 0.13.1
ply 3.11
prometheus-client 0.8.0
prompt-toolkit 3.0.5
psutil 5.7.0
ptyprocess 0.6.0
py 1.9.0
pycodestyle 2.6.0
pycosat 0.6.3
pycparser 2.20
pycurl 7.43.0.5
pydocstyle 5.0.2
pyflakes 2.2.0
Pygments 2.6.1
pylint 2.5.3
pyodbc 4.0.0-unsupported
pyOpenSSL 19.1.0
pyparsing 2.4.7
pyrsistent 0.16.0
PySocks 1.7.1
pytest 5.4.3
python-dateutil 2.8.1
python-jsonrpc-server 0.3.4
python-language-server 0.34.1
pytz 2020.1
PyWavelets 1.1.1
PyYAML 5.3.1
pyzmq 19.0.1
QDarkStyle 2.8.1
QtAwesome 0.7.2
qtconsole 4.7.5
QtPy 1.9.0
regex 2020.6.8
requests 2.24.0
rope 0.17.0
Rtree 0.9.4
ruamel-yaml 0.15.87
scikit-image 0.16.2
scikit-learn 0.23.1
scipy 1.5.0
seaborn 0.10.1
Send2Trash 1.5.0
setuptools 49.2.0.post20200714
simplegeneric 0.8.1
singledispatch 3.4.0.3
six 1.15.0
snowballstemmer 2.0.0
sortedcollections 1.2.1
sortedcontainers 2.2.2
soupsieve 2.0.1
Sphinx 3.1.2
sphinxcontrib-applehelp 1.0.2
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 1.0.3
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.4
sphinxcontrib-websupport 1.2.3
spyder 4.1.4
spyder-kernels 1.9.2
SQLAlchemy 1.3.18
statsmodels 0.11.1
sympy 1.6.1
tables 3.6.1
tblib 1.6.0
terminado 0.8.3
testpath 0.4.4
threadpoolctl 2.1.0
toml 0.10.1
toolz 0.10.0
tornado 6.0.4
tqdm 4.47.0
traitlets 4.3.3
typing-extensions 3.7.4.2
ujson 1.35
unicodecsv 0.14.1
urllib3 1.25.9
watchdog 0.10.3
wcwidth 0.2.5
webencodings 0.5.1
Werkzeug 1.0.1
wheel 0.34.2
widgetsnbextension 3.5.1
wrapt 1.11.2
wurlitzer 2.0.1
xlrd 1.2.0
XlsxWriter 1.2.9
xlwings 0.19.5
xlwt 1.3.0
xmltodict 0.12.0
yapf 0.30.0
zict 2.0.0
zipp 3.1.0
zope.event 4.4
zope.interface 4.7.1

Thank you so much!

ssolson commented 3 years ago

Rachel I made a couple modifications to your code in order to get a minimum working example. Here is the code I used:

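from mhkit import qc, utils
import pandas as pd

# Read the whitespace-delimited file (raw string avoids an invalid-escape warning)
data = pd.read_csv('Pacwave2.txt', sep=r'\s+')
# Convert the integer index to timestamps counted from the given origin
data.index = utils.index_to_datetime(data.index, origin='1979-01-01-00')

# Check for Stagnant Data
expected_bound = [0, 15]
window = 10800
results = qc.check_delta(data, expected_bound, window)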

With this code, everything worked: image

I changed your read_csv call to use whitespace ('\s+') as the delimiter, and I edited the header of your file so that it contains only 3 column names. The edited file is attached.
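
The header change matters because a whitespace delimiter also splits column names that contain spaces. The sketch below illustrates this with the standard library's `re.split` rather than pandas. It assumes the original header line held the three space-containing names shown in the earlier printout; the short names `Hs Te Tp` are hypothetical stand-ins for the edited header, which is not shown in this thread.

```python
import re

# Assumed original header (names taken from the earlier printout) and one data row.
original_header = "Significant wave height (m) Energy period (s) Peak period (s)"
short_header = "Hs Te Tp"  # hypothetical edited header with no internal spaces
row = "2.2084 9.1326 6.3657"

# Split each line on runs of whitespace, as the '\s+' delimiter does.
original_tokens = re.split(r"\s+", original_header.strip())
short_tokens = re.split(r"\s+", short_header.strip())
row_tokens = re.split(r"\s+", row.strip())

print(len(original_tokens), len(short_tokens), len(row_tokens))
```

Under these assumptions the original header produces more fields than a data row, while the shortened header matches the three data columns.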

There does not appear to be any error here. Please let me know if I am missing the issue.

Pacwave2.txt

Rjan821163 commented 3 years ago

I copied your code line by line and even used the same Pacwave2.txt file that was attached, but kept getting the same error.

BUT good news! I was able to fix the issue by typing this instead:

Screen Shot 2021-02-12 at 4 06 36 PM

For some reason, when I defined the expected bound and window before using the check_delta function, I got strange errors. Thank you so much for your help, and I'm glad this issue is resolved!
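
The screenshot with the working call is not reproduced here, so the exact fix is unknown. One reading consistent with the traceback is that the installed check_delta declared `key` ahead of `window`, so the third positional argument (10800) was bound to `key` and failed the type assertion. The sketch below uses a hypothetical stand-in, `check_delta_sketch`, whose parameter order is an assumption and not the pecos source; it shows how passing `window` by keyword would avoid the assertion under that assumption.

```python
# Hypothetical stand-in for the installed function. The parameter order
# (key before window) is an assumption inferred from the traceback.
def check_delta_sketch(data, bound, key=None, window=3600):
    assert isinstance(key, (type(None), str)), 'key must be None or of type string'
    return {'bound': bound, 'key': key, 'window': window}


data = [2.2084, 2.5983, 4.1733]
expected_bound = [0, 15]
window = 10800

# Positional call: 10800 lands in `key`, which triggers the assertion.
try:
    check_delta_sketch(data, expected_bound, window)
    positional_error = None
except AssertionError as err:
    positional_error = str(err)

# Keyword call: the value reaches `window` regardless of parameter order.
keyword_result = check_delta_sketch(data, expected_bound, window=window)

print(positional_error)
print(keyword_result)
```

Naming the argument makes the call independent of parameter order, which can differ between library versions.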