NCAR / NEON-visualization

Repository to include all neon-related visualization scripts.
GNU Affero General Public License v3.0

mask out bad data from NEON eval files #4

Open wwieder opened 2 years ago

wwieder commented 2 years ago

To mask out absurd measurements from NEON data, @ddurden recommended using these min and max thresholds, which are used in AmeriFlux data processing.

@negin513, it's not urgent, but can you bring these thresholds into the scripts that plot NEON observations?

Flags used for AmeriFlux data:

Rng$Min <- data.frame(
  "FC" = -100,          # [umol m-2 s-1]
  "SC" = -100,          # [umol m-2 s-1]
  "NEE" = -100,         # [umol m-2 s-1]
  "LE" = -500,          # [W m-2]
  "H" = -500,           # [W m-2]
  "USTAR" = 0,          # [m s-1]
  "CO2" = 200,          # [umol mol-1]
  "H2O" = 0,            # [mmol mol-1]
  "WS_1_1_1" = 0,       # [m s-1]
  "WS_MAX_1_1_1" = 0,   # [m s-1]
  "WD_1_1_1" = -0.1,    # [deg]
  "T_SONIC" = -55.0     # [C]
)

Rng$Max <- data.frame(
  "FC" = 100,           # [umol m-2 s-1]
  "SC" = 100,           # [umol m-2 s-1]
  "NEE" = 100,          # [umol m-2 s-1]
  "LE" = 1000,          # [W m-2]
  "H" = 1000,           # [W m-2]
  "USTAR" = 5,          # [m s-1]
  "CO2" = 800,          # [umol mol-1]
  "H2O" = 100,          # [mmol mol-1]
  "WS_1_1_1" = 50,      # [m s-1]
  "WS_MAX_1_1_1" = 50,  # [m s-1]
  "WD_1_1_1" = 360,     # [deg]
  "T_SONIC" = 45.0      # [C]
)

wwieder commented 2 years ago

@negin513 not critical, but did you ever try applying these masks to the plots of NEON data?

negin513 commented 2 years ago

Thanks @wwieder for the reminder. I actually did not see this before. I will work on applying these filters. I am wondering what the best way to do this would be. I think we eventually want these filters for both the Bokeh and matplotlib plots, so writing a function remove_outliers (or something like that) and calling it during pre-processing probably makes the most sense.
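
A minimal sketch of what such a function could look like, assuming the observations are loaded into a pandas DataFrame whose column names match the AmeriFlux variable names in the thresholds above (the NEON eval files may use different names, so the keys would need mapping). The name remove_outliers comes from this comment; the VALID_RANGE dictionary and the function signature are hypothetical.

```python
import pandas as pd

# (min, max) valid ranges copied from the AmeriFlux thresholds quoted above.
# Keys are AmeriFlux variable names; this mapping is an assumption and may
# need renaming to match the columns in the NEON eval files.
VALID_RANGE = {
    "FC": (-100, 100),         # [umol m-2 s-1]
    "SC": (-100, 100),         # [umol m-2 s-1]
    "NEE": (-100, 100),        # [umol m-2 s-1]
    "LE": (-500, 1000),        # [W m-2]
    "H": (-500, 1000),         # [W m-2]
    "USTAR": (0, 5),           # [m s-1]
    "CO2": (200, 800),         # [umol mol-1]
    "H2O": (0, 100),           # [mmol mol-1]
    "WS_1_1_1": (0, 50),       # [m s-1]
    "WS_MAX_1_1_1": (0, 50),   # [m s-1]
    "WD_1_1_1": (-0.1, 360),   # [deg]
    "T_SONIC": (-55.0, 45.0),  # [C]
}


def remove_outliers(df, valid_range=VALID_RANGE):
    """Return a copy of df with out-of-range values replaced by NaN.

    Columns that have no entry in valid_range are left untouched.
    """
    out = df.copy()
    for var, (vmin, vmax) in valid_range.items():
        if var in out.columns:
            # between() is inclusive on both ends by default;
            # where() keeps values where the condition holds, NaN elsewhere.
            out[var] = out[var].where(out[var].between(vmin, vmax))
    return out
```

Because the function only touches the DataFrame, it could be called once during pre-processing and the cleaned result passed to either plotting backend.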

negin513 commented 2 years ago

What I originally had in mind for filtering outliers was to use the standard deviation rather than fixed values. I am not sure which method (fixed values for each variable vs. automatic outlier detection) works better and is easier to implement.

For automatic outlier detection, there are other options available as well:

An example of using one-class classification for outlier detection: https://blogs.sap.com/2020/12/29/outlier-detection-with-one-class-classification-using-python-machine-learning-client-for-sap-hana/

wwieder commented 2 years ago

I like the idea of a remove_outliers function. At this stage I'd keep it simple and make it really obvious what we're doing. Using fixed values or a 3-sigma threshold will hopefully catch the bulk of the crazy spikes in the measurements.
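
For reference, the 3-sigma option could be sketched as follows, assuming each variable is available as a pandas Series. The function name mask_three_sigma and the n_sigma parameter are hypothetical and not part of any existing script in this repository.

```python
import pandas as pd


def mask_three_sigma(series, n_sigma=3.0):
    """Replace values farther than n_sigma standard deviations from the mean with NaN."""
    mean = series.mean()  # NaN values are skipped by default
    std = series.std()    # sample standard deviation (ddof=1)
    # Keep values within the band; everything else becomes NaN.
    return series.where((series - mean).abs() <= n_sigma * std)
```

One thing to keep in mind with this approach is that the mean and standard deviation are themselves inflated by large spikes, so applying the fixed-range limits first and the 3-sigma mask afterwards may work better than the 3-sigma mask alone.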