antonroman / smart_meter_data_analysis

This repository contains all the code developed to analyze the smart meter data with HTM and LSTM

Plot showing hourly meter load for the time-series duration #4

Closed antonroman closed 3 years ago

antonroman commented 3 years ago

A plot like this could be useful to identify different consumption patterns:

image

Could we get this graph for a random sample of 30 meters? Open to discussing better approaches.

antonroman commented 3 years ago

Please, use the 10 CSV files with aggregated S02 values to generate the plot, instead of 30 random meters.

gbarreiro commented 3 years ago

Two questions:

  1. Which variable should I plot in the Z axis? R1, R2, R3, R4 or the average or sum of all?
  2. For each date, we have data from a myriad of meters. What should I do: for each date and time, plot the sum of all the meters or the average?

antonroman commented 3 years ago

On Fri, 4 Jun 2021 at 12:54, Guillermo Barreiro @.***> wrote:

Two questions:

  1. Which variable should I plot in the Z axis? R1, R2, R3, R4 or the average or sum of all?

Just R1, the real power.

  2. For each date, we have data from a myriad of meters. What should I do: for each date and time, plot the sum of all the meters or the average?

The sum of them. Thanks!
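The aggregation agreed on here (for each date and time, the sum of R1 over all meters) could be sketched roughly as below. This is only an illustration in plain Python: the `(meter_id, timestamp, r1)` record layout and the function name `sum_r1_by_timestamp` are assumptions, not the repository's actual schema or code.

```python
from collections import defaultdict
from datetime import datetime


def sum_r1_by_timestamp(records):
    """Sum R1 over all meters for each timestamp.

    `records` is an iterable of (meter_id, timestamp, r1) tuples;
    this layout is hypothetical, not the real S02 CSV schema.
    """
    totals = defaultdict(float)
    for _meter_id, timestamp, r1 in records:
        totals[timestamp] += r1
    # Return a plain dict ordered by time so it is easy to plot or export.
    return dict(sorted(totals.items()))


records = [
    ("meter_a", datetime(2020, 7, 1, 12), 1.5),
    ("meter_b", datetime(2020, 7, 1, 12), 2.0),
    ("meter_a", datetime(2020, 7, 1, 13), 0.5),
]
totals = sum_r1_by_timestamp(records)
```

Switching to the average instead would only require dividing each total by the number of meters that reported at that timestamp.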

gbarreiro commented 3 years ago

Before creating the graph for the aggregated meters, I tried creating a graph like the one you suggested, but using only the data from one calendar year (2020 in this example) from a random meter:

Screen Shot 2021-06-08 at 15 44 07

The graph looks quite accurate, since the consumption peak is in the middle of the X-axis, that is, in summer, when AC use is highest in the south of Spain. I'm less sure about the accuracy of the Y-axis (the hour). Do you think it's all right?
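One way to check the hour axis is to build the surface data explicitly and confirm that a known reading lands in the expected row. The sketch below is a minimal, dependency-free illustration, not the code used for the screenshot: the `(timestamp, value)` layout and the name `yearly_hour_grid` are hypothetical, and the readings are synthetic.

```python
from datetime import datetime, timedelta


def yearly_hour_grid(readings, year):
    """Build grid[hour][day_of_year - 1] of summed load for one calendar year.

    `readings` is an iterable of (timestamp, value) pairs (hypothetical layout).
    """
    days = (datetime(year + 1, 1, 1) - datetime(year, 1, 1)).days
    grid = [[0.0] * days for _ in range(24)]
    for timestamp, value in readings:
        if timestamp.year != year:
            continue  # ignore readings outside the requested year
        day_index = timestamp.timetuple().tm_yday - 1  # 0-based day of year
        grid[timestamp.hour][day_index] += value
    return grid


# Synthetic example: one reading per hour for 2020, with a higher load at 15:00.
start = datetime(2020, 1, 1)
readings = [
    (start + timedelta(hours=h), 2.0 if h % 24 == 15 else 1.0)
    for h in range(366 * 24)
]
grid = yearly_hour_grid(readings, 2020)
```

Each row of `grid` is an hour of the day (Y-axis) and each column a day of the year (X-axis), so the nested list can be handed to a surface or heatmap plotting routine as the Z values. If the source timestamps are stored in UTC, they would need converting to local time first, otherwise the hour axis appears shifted.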

antonroman commented 3 years ago

It looks great @gbarreiro, good job, it is very meaningful :-) We should expect this pattern to repeat for most individual users (not for schools, for example) and also for the aggregated values.

gbarreiro commented 3 years ago

Out of curiosity, and since it took minimal effort with the code used to plot the graph above, I've made another interesting graph. It shows the total power consumption over a calendar year, but with the day of the week (0 = Monday, 6 = Sunday) on the X-axis instead of the day of the year:

Screen Shot 2021-06-09 at 10 01 43

I think a graph like this one could also be very useful for identifying consumption patterns, so maybe we could generate both types of charts for each aggregation.
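The weekly view can be derived from the same readings by bucketing on the day of the week rather than the day of the year. The following is a rough sketch under assumed names (`weekly_hour_grid`, a `(timestamp, value)` pair layout) with synthetic data. It relies on Python's `datetime.weekday()`, which uses the same 0 = Monday, 6 = Sunday convention as the plot.

```python
from datetime import datetime, timedelta


def weekly_hour_grid(readings):
    """Return grid[hour][weekday] of summed load, with 0 = Monday and 6 = Sunday."""
    grid = [[0.0] * 7 for _ in range(24)]
    for timestamp, value in readings:
        # datetime.weekday() matches the plot's convention: Monday is 0.
        grid[timestamp.hour][timestamp.weekday()] += value
    return grid


# Two synthetic weeks of hourly readings starting on Monday, 6 January 2020.
start = datetime(2020, 1, 6)
readings = [(start + timedelta(hours=h), 1.0) for h in range(14 * 24)]
weekly = weekly_hour_grid(readings)
```

Because a year does not contain a whole number of weeks, some weekdays occur one more time than others. Dividing each cell by the number of occurrences of that weekday would give an average instead of a total.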

antonroman commented 3 years ago

Yes, it would be great. Good job :-)

gbarreiro commented 3 years ago

I have made the two types of plots (weekly and yearly consumption) for the aggregations between 10% and 90% (inclusive). You can see them in the data_processing/consumption_plots folder. Unfortunately, there are some outliers that have spoilt the graphs: after 60% in the weekly plots and after 20% in the yearly ones (becoming much more severe after 60%).

I could try to remove those outliers, but I don't know whether it will be computationally feasible, given the high volume of data we're processing, especially at the highest aggregation levels. I'll experiment with this over the course of this week.

gbarreiro commented 3 years ago

I have created a script called search_outliers.py that saves all the outliers detected in the S02 records to a CSV file called outliers_S02.csv. Using this file, the aggregate_reactive_values.py script excludes from the aggregations every meter that has at least one outlier. As a result, the generated plots no longer contain outliers and are valid.
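The two-step idea (first list the meters with at least one outlier, then leave those meters out of the aggregation) could look roughly like the sketch below. The outlier rule used here, a reading above a fixed multiple of the overall median, is purely an assumption for illustration. The thread does not state the criterion used by search_outliers.py, and the function names and record layout are hypothetical too.

```python
from collections import defaultdict
from statistics import median


def meters_with_outliers(records, factor=50.0):
    """Return the set of meter ids having at least one suspicious reading.

    A reading is flagged when it exceeds `factor` times the median of all
    readings. Both the rule and the default factor are assumptions made for
    this sketch, not the criterion used by search_outliers.py.
    """
    records = list(records)
    threshold = factor * median(value for _meter, _ts, value in records)
    return {meter for meter, _ts, value in records if value > threshold}


def aggregate_without(records, excluded):
    """Sum readings per timestamp, skipping every meter in `excluded`."""
    totals = defaultdict(float)
    for meter, timestamp, value in records:
        if meter not in excluded:
            totals[timestamp] += value
    return dict(totals)


records = [
    ("m1", 0, 1.0), ("m1", 1, 1.2),
    ("m2", 0, 0.9), ("m2", 1, 9999.0),  # m2 has one absurd reading
    ("m3", 0, 1.1), ("m3", 1, 1.0),
]
bad = meters_with_outliers(records)
clean = aggregate_without(records, bad)
```

Dropping the whole meter also discards its valid readings (here the 0.9 from `m2`), which is the trade-off described above: it is simpler than patching individual records, at the cost of a slightly smaller sample per aggregation.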

Since looking for outliers is really slow with Pandas (it took ~8 h on my computer), I have created a Spark version of the script (search_outliers_spark.py). Aggregating the values with Pandas can also be very slow, so I have created a Spark script for that as well (aggregate_reactive_values_spark.py).