Open carlosparadis opened 7 years ago
Create a workbook with a boxplot of the entire dataset.
The X axis should be the house ID. The Y axis should be a sensor type (e.g. relative humidity or temperature). We should create one boxplot per sensor type to identify outliers. Rather than posting here, send a screenshot of the plot to the Slack Tableau channel with a link to this issue.
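For anyone who wants to cross-check what the boxplot flags outside of Tableau, here is a minimal Python sketch of the usual 1.5 × IQR whisker rule, grouped by house. The function name, the `(house_id, value)` row shape, and the sample readings are all made up for illustration and are not taken from our database. Tableau's quartile calculation may also differ slightly from the `inclusive` method used here.

```python
from collections import defaultdict
from statistics import quantiles


def boxplot_outliers(readings):
    """Group (house_id, value) pairs by house and return, per house,
    the values that fall outside the 1.5 * IQR boxplot whiskers."""
    by_house = defaultdict(list)
    for house_id, value in readings:
        by_house[house_id].append(value)

    outliers = {}
    for house_id, values in by_house.items():
        # quantiles() needs at least two values per house.
        q1, _median, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        outliers[house_id] = [v for v in values if v < low or v > high]
    return outliers


# Hypothetical temperature readings (degrees C) for two houses.
sample = [
    ("house_1", 21.0), ("house_1", 22.0), ("house_1", 21.5),
    ("house_1", 22.5), ("house_1", 95.0),
    ("house_2", 20.0), ("house_2", 20.5), ("house_2", 21.0),
    ("house_2", 21.5),
]

flagged = boxplot_outliers(sample)
print(flagged)
```

Running it once per sensor type would mirror the "one boxplot per sensor type" layout.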
@kathrynparadis
I just got word from Eileen that the second requested timestamp (see, at the bottom, the two clear sections of the requests) is suspected to contain outliers in July and August. You should double-check the precise time frame with her and use that time range to showcase the boxplots you have been working on for this issue, hopefully as a dashboard.
Please post here once you get the precise time range.
We decided to create boxplots for each PurposeID by "type" (power, luminosity, humidity, temperature), and create a Dashboard containing the 4 plots for each house.
This is the most current version:
I am also working on a time-series version of this that displays one month at a time, with the month easy to change. This matters because the original boxplot shows all of the data at once:
However, I'm having trouble getting one filter to apply independently to different dashboards in the same file. The building ID filter on the first image is also connected to the second image, so I can't look at two different buildings on separate dashboards at the same time: changing one changes both.
I will post an update once I figure that out.
@eileenpeppard @ryantanaka @jygh98
Unlike the missing-data plot, this is an outlier plot. It is meant to help us pick out abnormal values. The main inspiration for this plot is the infamous eGauge e792, and in particular this comment: https://github.com/erdl/legacy-scrape-util/issues/15#issuecomment-342301029
We didn't realize there was something wrong with the PV always being 0 in this eGauge until 2 months later.
In the dashboard above, this could have been easily spotted in the bottom-right corner of the Power plot when shuffling through the houses: the box would be squeezed at 0 forever, while the box you see there varies between 0 and 5 because of the day and night cycle.
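As a rough illustration of what "the box squeezed at 0" means numerically, here is a small Python sketch that flags a sensor whose interquartile range collapses to (almost) nothing. The function name, the `tolerance` parameter, and both sample series are hypothetical. Note that a sensor that legitimately reads 0 more than three quarters of the time would also be flagged, so this is only a first-pass check, not a replacement for looking at the plot.

```python
from statistics import quantiles


def is_flatlined(values, tolerance=0.0):
    """Return True when the box of the boxplot (Q1 to Q3) has
    collapsed to a width of at most `tolerance`."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) <= tolerance


# Hypothetical PV readings: a healthy day/night cycle vs. a stuck sensor.
healthy_pv = [0.0, 0.0, 0.0, 1.2, 3.5, 5.0, 4.1, 0.8, 0.0, 0.0]
stuck_pv = [0.0] * 10

print(is_flatlined(healthy_pv))
print(is_flatlined(stuck_pv))
```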
I wish I could show the boxplot of eGauge e792 so you could actually see it, but at the time we not only had an error in it, but also had a lock on it that prevented it from being accessed from the URL. That means missing data, and therefore a problem for the #6 workbook to solve. In this case, though, we would also notice the absence of the PV column if all of the data were missing (but not if only some were).
Notice the plots are intended not to require another table. In Power, both appliances and purpose IDs are included.
@kathrynparadis will be adding the room type for the purpose ids of the other plots in the dashboard (temperature, light, and humidity).
@kathrynparadis p.s.: Remember to change the multiple choice box to a mutually exclusive choice box (aka radio buttons), as multiple choice here makes no sense.
Here's an updated picture including the room types, and single choice box:
Still working on the time-series issue.
So here are my initial impressions of the plot:
Which eGauge are these plots monitoring? I do see the purpose ID above each column, but if I had to check which eGauge these IDs are associated with, I would need to go onto the server and look at the config file.
What is the time frame for these plots? If we are trying to look for outliers, I think it would be important to label the time ranges for these plots.
What do the color bars represent compared to the error bars? If I am trying to look for outliers, for example, would that mean nearly all of the data in the Power plot for the dryer are outliers? Mainly, what exactly am I trying to look for?
Also, would it be possible to make the x-axis labels horizontal? Otherwise, I think it looks pretty solid overall.
Two line plots, one for min and one for max, should be created for every sensor. Use the dhhl database to prototype using the `readings` table, and then test it in `frog_uhm`. Since this impacts our ability to see if something is wrong, this issue is of high priority.
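To make the min/max request concrete, here is a minimal Python sketch of the aggregation behind the two lines: one min and one max value per sensor per day. The row shape `(sensor_id, timestamp, value)`, the daily granularity, and the sample rows are assumptions for illustration only; the actual columns of the `readings` table may be named and typed differently.

```python
from collections import defaultdict
from datetime import datetime


def daily_min_max(rows):
    """Return {sensor_id: [(day, min_value, max_value), ...]} with each
    sensor's list ordered by day, ready to plot as two lines."""
    buckets = defaultdict(list)
    for sensor_id, timestamp, value in rows:
        buckets[(sensor_id, timestamp.date())].append(value)

    series = defaultdict(list)
    # Sorting the (sensor_id, day) keys keeps each sensor's days in order.
    for (sensor_id, day), values in sorted(buckets.items()):
        series[sensor_id].append((day, min(values), max(values)))
    return dict(series)


# Hypothetical rows; timestamps and values are invented.
rows = [
    ("pv", datetime(2018, 7, 1, 3, 0), 0.0),
    ("pv", datetime(2018, 7, 1, 12, 0), 4.8),
    ("pv", datetime(2018, 7, 2, 12, 0), 5.1),
    ("pv", datetime(2018, 7, 2, 23, 0), 0.0),
]

result = daily_min_max(rows)
for day, low, high in result["pv"]:
    print(day, low, high)
```

A sensor whose max line sits on top of its min line for days at a time would be worth a closer look.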