erdl / monitor

Python notebooks and Tableau files used to monitor whether anything is wrong with the data being collected.

Create a tableau workbook to identify outliers #3

Open carlosparadis opened 7 years ago

carlosparadis commented 7 years ago

A two-line plot (one line for the min and one for the max) should be created for every sensor. Prototype against the readings table in the dhhl database, and then test it in frog_uhm.

Since this impacts our ability to see whether something is wrong, this issue is high priority.
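As a rough sketch of the data behind the min/max line plots described above, the per-sensor daily extremes could be computed as follows. The row layout `(sensor_id, timestamp, value)` is an assumption made for illustration; the actual columns of the readings table are not given in this issue, and the sample rows are made up.

```python
from datetime import datetime

def daily_min_max(rows):
    """Map (sensor_id, date) to the (min, max) reading seen on that day.

    rows: iterable of (sensor_id, timestamp, value) tuples (hypothetical layout).
    """
    extremes = {}
    for sensor_id, timestamp, value in rows:
        key = (sensor_id, timestamp.date())
        # Start from the current value the first time a key is seen.
        low, high = extremes.get(key, (value, value))
        extremes[key] = (min(low, value), max(high, value))
    return extremes

# Made-up rows standing in for the readings table.
sample_rows = [
    (1, datetime(2018, 7, 1, 0, 0), 20.0),
    (1, datetime(2018, 7, 1, 12, 0), 25.0),
    (1, datetime(2018, 7, 2, 0, 0), 21.0),
    (2, datetime(2018, 7, 1, 0, 0), 0.0),
]
daily = daily_min_max(sample_rows)
for (sensor_id, day), (low, high) in sorted(daily.items()):
    print(sensor_id, day, low, high)
```

Each sensor then yields two series over time (the daily min and the daily max), which is what the two lines would trace.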

carlosparadis commented 7 years ago

Create a workbook for a boxplot for the entire data.

The X axis should be the house id. The Y axis should be a sensor type (e.g. relative humidity or temperature). We should create one boxplot for every type of sensor to identify outliers. Send a screenshot of the plot to the Slack tableau channel with a link to this issue, rather than posting it here.
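For reference, the numbers behind one box per house and sensor type can be sketched outside Tableau. This is a minimal illustration, assuming rows of `(house_id, sensor_type, value)` (a hypothetical layout) and the common 1.5 × IQR rule for whiskers and outliers; the Tableau workbook may be configured differently.

```python
from collections import defaultdict
from statistics import quantiles

def box_stats(values, k=1.5):
    """Quartiles, whisker ends, and outliers under the k * IQR rule."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers reach the most extreme readings still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

def box_stats_by_house(rows):
    """Group (house_id, sensor_type, value) rows and summarize each group."""
    groups = defaultdict(list)
    for house_id, sensor_type, value in rows:
        groups[(house_id, sensor_type)].append(value)
    # quantiles() needs at least two readings on older Python versions.
    return {key: box_stats(vals) for key, vals in groups.items() if len(vals) >= 2}

# Made-up temperature readings for a hypothetical house "A".
sample_rows = [("A", "temperature", v) for v in (20, 21, 22, 23, 24, 80)]
stats = box_stats_by_house(sample_rows)
print(stats[("A", "temperature")])
```

In this reading, the box spans the first to third quartile, the whiskers end at the last readings inside the fences, and anything beyond them is a candidate outlier.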

carlosparadis commented 7 years ago

@kathrynparadis

I just got word from Eileen that the second requested timestamp (see the two clear sections of the requests at the bottom) is suspected to contain outliers in July and August. You should double-check the precise time frame with her and use that time range to showcase the boxplots you have been working on for this issue, hopefully as a dashboard.

Please post here once you get the precise time range.

kathrynparadis commented 7 years ago

We decided to create boxplots for each PurposeID by "type" (power, luminosity, humidity, temperature), and create a Dashboard containing the 4 plots for each house.

kathrynparadis commented 6 years ago

This is the most current version:

[image: outliers2]

I am also working on a time-series version of this that displays one month at a time and is easy to change, which is important because the original boxplot shows all of the data at once:

[image: outlier_timeseries]

However, I'm having trouble getting one filter to apply independently to different dashboards in the same file. The building ID filter on image one is also connected to the second image, meaning I can't look at 2 different buildings on separate dashboards at the same time: changing one changes both.

I will post an update once I figure that out.

carlosparadis commented 6 years ago

@eileenpeppard @ryantanaka @jygh98

Unlike the missing-data plot, this is an outlier plot. It is supposed to help us pick out abnormal values. The main inspiration for this plot is the infamous egauge e792, and in particular this comment: https://github.com/erdl/legacy-scrape-util/issues/15#issuecomment-342301029

We didn't realize there was something wrong with the PV always being 0 in this eGauge until 2 months later.

In the dashboard above, this could have been easily spotted in the bottom-right corner, in Power, when shuffling through the houses (the box would be squeezed at 0 forever, while the box you see there varies between 0 and 5 because of the day and night cycle).
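The "box squeezed at 0" symptom can also be checked programmatically. The sketch below is only an illustration: the function name, the tolerance parameter, and the sample values are assumptions, not part of any existing erdl tool.

```python
from statistics import quantiles

def is_squeezed_at_zero(values, tolerance=0.0):
    """True when the middle half of the readings sits at (or near) zero.

    This mirrors a boxplot whose box collapses onto 0, i.e. both the
    first and third quartiles are within `tolerance` of zero.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return abs(q1) <= tolerance and abs(q3) <= tolerance

# Made-up PV readings: a day/night cycle versus a sensor stuck at 0.
healthy_pv = [0, 0, 0, 1, 3, 5, 4, 2, 0, 0]
stuck_pv = [0] * 10

print(is_squeezed_at_zero(healthy_pv))
print(is_squeezed_at_zero(stuck_pv))
```

A check like this only complements the visual inspection: a sensor that is legitimately idle for a long stretch would trip it as well.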

I wish I could show the boxplot of egauge e792 so you could actually see it, but at the time we not only had an error in it, but also had a lock on it that prevented it from being accessed from the URL, hence the missing data, which is a problem for the #6 workbook to solve. In this case, though, we would also notice the absence of the PV column if all of the data were missing (but not if only some were).

Note that the plots are intended not to require another table. In Power, both appliances and purpose ids are included.

Future Work

@kathrynparadis will be adding the room type for the purpose ids of the other plots in the dashboard (temperature, light, and humidity).

@kathrynparadis p.s.: Remember to change the multiple choice box to a mutually exclusive choice box (aka radio buttons), as multiple choice here makes no sense.

kathrynparadis commented 6 years ago

Here's an updated picture including the room types, and single choice box:

[image: outliers_rooms]

Still working on the time-series issue.

jygh98 commented 6 years ago

So here are my initial impressions of the plot:

  1. Which egauge are these plots monitoring? I do see the purpose id above each column, but if I had to check which egauge these ids are associated with, I would need to go onto the server and look at the config file.

  2. What is the time frame for these plots? If we are trying to look for outliers, I think it would be important to label the time ranges for these plots.

  3. What do the color bars represent compared to the error bars? If I am trying to look for outliers, for example, would that mean nearly all of the data in the Power plot for dryer are outliers? Mainly, what exactly am I trying to look for?

Also, would it be possible to make the labels horizontal for the x-axis? Otherwise, I think overall it looks pretty solid.

ryantanaka commented 6 years ago