Clean Nulls Option. Skip Thornton if all NaN.

drewf7 commented 4 years ago

Hi!

I was running this script against some station data from Wyoming's WACNet (http://www.wrds.uwyo.edu/WACNet/Stations.html).

Included here are two proposed fixes I needed to make locally in order to complete a full run.

Fix #1 is the addition of a configuration option "clean_nulls". If set to 1, the input file is read and every null character (\x00) is removed before parsing.

An alternative solution would be to pass engine='c' to the pandas parser (this also worked in testing). However, the pandas documentation at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html notes that "The C engine is faster while the python engine is currently more feature-complete."

I don't know which parsing features are being used, so manually removing null bytes seemed like the safer option.
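For anyone reading along, the idea behind clean_nulls can be sketched roughly as below. This is only an illustration, not the code in the pull request: the helper names strip_null_bytes and open_cleaned are hypothetical, and the UTF-8 default encoding is an assumption.

```python
import io


def strip_null_bytes(raw: bytes) -> bytes:
    """Remove every null byte (\\x00) from the raw file contents."""
    return raw.replace(b"\x00", b"")


def open_cleaned(path, clean_nulls=True, encoding="utf-8"):
    """Read a file and return a text buffer, optionally with null bytes removed.

    The function name, the clean_nulls flag as a keyword argument, and the
    default encoding are hypothetical choices made for this sketch.
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    if clean_nulls:
        raw = strip_null_bytes(raw)
    # The returned buffer is file-like, so a CSV parser can read from it directly.
    return io.StringIO(raw.decode(encoding))
```

Because the result is a file-like object, it could be handed to pandas.read_csv in place of the original path, which leaves the choice of parser engine untouched.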

Fix #2 concerns the Thornton-Running step in the "calc_org_and_opt_rs_tr" function. In my case it was failing because every single value in the "mc_rmse" array was NaN (numpy ValueError: all-NaN slice).

Feel free to tell me if this one was just a misconfiguration on my part.

I don't really have a fix for this. What I did was wrap the call in a try/except so that the script can continue. This allowed me to finish generating my before_graphs, but the bottom two RH graphs were blank because of the skipped step.
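As a rough sketch of an alternative to the try/except, the all-NaN case could be checked explicitly before picking the best value. This assumes the failing step selects the smallest RMSE from mc_rmse, which is a guess based on the error message and not something confirmed in this thread; best_rmse_index is a hypothetical helper, written with the standard library only.

```python
import math


def best_rmse_index(mc_rmse):
    """Return the index of the smallest non-NaN value, or None if all are NaN.

    Hypothetical helper: it assumes the caller wants the position of the
    lowest RMSE and can skip the optimization step when None comes back.
    """
    # Keep (value, index) pairs only for entries that are real numbers.
    usable = [(value, i) for i, value in enumerate(mc_rmse) if not math.isnan(value)]
    if not usable:
        # Every entry was NaN, so there is nothing to optimize against.
        return None
    # min() compares the value first, then the index, so ties pick the earliest entry.
    return min(usable)[1]
```

With a numpy array the same check could be written as np.isnan(mc_rmse).all(). Returning None lets the caller skip the step deliberately instead of catching the ValueError after the fact.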

Happy to make any changes if you'd like. Mostly thought I'd just offer back the things I needed to change to get this to run for me.

Thanks!

cwdunkerly commented 3 years ago

My apologies, my notifications were not set up correctly and I am just now seeing this. Thank you for taking the time and effort to do this, and please allow me a few days to review it.

cwdunkerly commented 3 years ago

I am not able to replicate your first issue with test datasets containing \x00. Would you mind telling me which versions of pandas and Python you were using?

I have tested with Python 3.8.10, pandas 1.2.5.

Your second issue sounds like you may have misconfigured your dataset, or at least not used the full suite of weather variables it expects (maximum and minimum temperature, wind speed, solar radiation, and some form of humidity variable). Were you using a dataset that was missing one of these?

drewf7 commented 3 years ago

Hi @cwdunkerly,

No worries!

The null issue I was seeing was as of commit d45ba6d4c122f445cd94377c56903886ea38d793.

PS C:\Users\drewf_000> python --version
Python 3.6.2
PS C:\Users\drewf_000> pip show pandas
Name: pandas
Version: 0.25.1

(Seems my pandas might be way out of date)

It's definitely possible the issue no longer exists in a newer version of this tool/Python/pandas.

I've attached the CSV and .ini config I was using in my testing. I pulled the CSV from http://www.wrds.uwyo.edu/WACNet/Lyman_1SW_Week.html

CSV Download: error_with_nulls.zip

Re: the second issue, it's definitely possible that I just had my configuration wrong. I don't work much with Python or data processing in my day job, and I was just running a test of this to help my girlfriend, who worked for the state of Wyoming at the time.

Apologies if these are now non-issues. I'll happily close this request if that's the case :)

cwdunkerly commented 3 years ago

Thanks for getting back to me with that info.

I would lay the blame on your pandas version for this one. However, I will update my requirements to a more recent version of pandas so that others don't experience the same issue.

I'll go ahead and close this, but I do want to say again that I sincerely appreciate you getting back to me, especially considering how long you had to wait.

WSWUP / agweather-qaqc

Clean Nulls Option. Skip Thornton if all NaN. #12