antonroman closed this issue 3 years ago
Please use the 10 CSV files with aggregated S02 values to generate the plot, instead of 30 random meters.
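For reference, loading and concatenating a set of aggregated CSV files could look like the sketch below. This is not the repo's actual code: the filename pattern `aggregated_S02_*.csv` and the `timestamp`/`R1` column names are assumptions (the sketch writes two tiny stand-in files so it is self-contained).

```python
import glob
import os
import tempfile

import pandas as pd

# Create two tiny stand-in CSV files; in real use these would be the
# repo's aggregated S02 files (names and columns are assumptions).
tmp = tempfile.mkdtemp()
for i in (1, 2):
    pd.DataFrame({"timestamp": ["2020-01-01 00:00"], "R1": [float(i)]}).to_csv(
        os.path.join(tmp, f"aggregated_S02_{i}.csv"), index=False
    )

# Load every aggregated file and stack them into one DataFrame for plotting
paths = sorted(glob.glob(os.path.join(tmp, "aggregated_S02_*.csv")))
df = pd.concat(
    (pd.read_csv(p, parse_dates=["timestamp"]) for p in paths),
    ignore_index=True,
)
print(len(df))  # one row per stand-in file
```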
At 12:54 on Fri, Jun 4, 2021, Guillermo Barreiro @.***> wrote:
Two questions:
- Which variable should I plot on the Z axis: R1, R2, R3, R4, or the average or sum of all of them?
Just R1, the real power.
- For each date, we have data from a myriad of meters. What should I do: for each date and time, plot the sum of all the meters or the average?
The sum of them. Thanks!
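Putting both answers together (plot R1 only, summed over all meters at each timestamp), a minimal Pandas sketch could look like this; the column names `meter_id`, `timestamp`, and `R1` are assumptions about the S02 CSV layout, and the inline sample just stands in for a real file:

```python
import io

import pandas as pd

# Tiny inline sample standing in for an S02 CSV file (layout assumed)
csv_data = io.StringIO(
    "meter_id,timestamp,R1\n"
    "A,2020-06-01 00:00,1.5\n"
    "B,2020-06-01 00:00,2.0\n"
    "A,2020-06-01 01:00,1.0\n"
    "B,2020-06-01 01:00,0.5\n"
)
df = pd.read_csv(csv_data, parse_dates=["timestamp"])

# For each timestamp, keep only R1 and sum it over all the meters
total_r1 = df.groupby("timestamp")["R1"].sum()
print(total_r1)  # 00:00 -> 3.5, 01:00 -> 1.5
```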
Before creating the graph for the aggregated meters, I've tried to create a graph like the one you suggested, but only with the data from a calendar year (2020 in this example) for a single random meter:
The graph looks quite accurate, since the consumption peak is in the middle of the X axis, that is, in summer, when AC usage is highest in the south of Spain. I'm not so sure about the accuracy of the Y axis (the hour); do you think it is alright?
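For context, the day-of-year × hour grid behind such a plot can be sketched as below with synthetic data; the column names are illustrative and the random values merely stand in for real readings:

```python
import numpy as np
import pandas as pd

# Synthetic year of hourly R1 readings for one meter (2020 is a leap year)
idx = pd.date_range("2020-01-01", "2020-12-31 23:00", freq="h")
rng = np.random.default_rng(0)
df = pd.DataFrame({"timestamp": idx, "R1": rng.random(len(idx))})

# Rows = hour of day (Y axis), columns = day of year (X axis),
# cell value = power consumption (Z axis)
hours = df["timestamp"].dt.hour.rename("hour")
days = df["timestamp"].dt.dayofyear.rename("day_of_year")
grid = df.pivot_table(index=hours, columns=days, values="R1", aggfunc="sum")
print(grid.shape)  # 24 hours x 366 days
```

The resulting `grid` can then be handed to e.g. matplotlib's `imshow` or `plot_surface` to draw the chart.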
It looks great @gbarreiro, good job, it is very meaningful :-) We should expect this pattern to be repeated for most individual users (not in schools, for example) and also for the aggregated values.
Just for the sake of curiosity, and since the required effort was minimal with the code used to plot the graph above, I've made another interesting graph, where we plot the total power consumption in a calendar year, but with the day of the week (0 = Monday, 6 = Sunday) on the X axis instead of the day of the year:
I think a graph like this one could also be very useful for identifying consumption patterns, so maybe we could generate both types of charts for each aggregation.
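A sketch of this weekday variant, again with synthetic stand-in data (the real column names may differ): the same hourly readings, but grouped by day of the week instead of day of the year.

```python
import numpy as np
import pandas as pd

# Synthetic year of hourly R1 readings (placeholder for real meter data)
idx = pd.date_range("2020-01-01", "2020-12-31 23:00", freq="h")
rng = np.random.default_rng(1)
df = pd.DataFrame({"timestamp": idx, "R1": rng.random(len(idx))})

# Total consumption per (weekday, hour) cell over the whole year:
# columns = day of week (0 = Monday, 6 = Sunday), rows = hour of day
weekday = df["timestamp"].dt.dayofweek.rename("weekday")
hour = df["timestamp"].dt.hour.rename("hour")
weekly = df.groupby([weekday, hour])["R1"].sum().unstack(level=0)
print(weekly.shape)  # 24 hours x 7 weekdays
```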
Yes, it would be great. Good job :-)
On Wed, Jun 9, 2021 at 10:05 AM Guillermo Barreiro @.***> wrote:
[image: Screen Shot 2021-06-09 at 10 01 43] https://user-images.githubusercontent.com/33569982/121316818-feb38a80-c909-11eb-855a-952d16d3a53f.png
I have made the two types of plots (weekly and yearly consumption) for the aggregations between 10% and 90% (inclusive). You can see them in the data_processing/consumption_plots folder. Unfortunately, some outliers have spoiled the graphs: above 60% aggregation in the weekly plots and above 20% in the yearly ones (becoming much more severe above 60%).
I could try to remove those outliers, but I don't know whether it will be computationally feasible, given the high volume of data we're processing, especially at the highest aggregation levels. I'll experiment with this over the course of the week.
I have created a script called search_outliers.py that saves all the outliers detected in the S02 records to a CSV file called outliers_S02.csv. Using this file, the aggregate_reactive_values.py script excludes from the aggregations any meter with at least one outlier. Therefore, the generated plots no longer contain outliers and are valid.
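The thread doesn't show which outlier criterion search_outliers.py actually applies, so the sketch below uses a common interquartile-range (IQR) rule purely as an illustration of the "flag meters with at least one outlier, then exclude them" idea; the data and column names are made up:

```python
import pandas as pd

# Toy per-meter readings: meter A has one huge spike, meter B looks normal
df = pd.DataFrame({
    "meter_id": ["A"] * 6 + ["B"] * 6,
    "R1": [1.0, 1.2, 0.9, 1.1, 1.0, 50.0,
           2.0, 2.1, 1.9, 2.2, 2.0, 2.1],
})

def iqr_outliers(group: pd.DataFrame) -> pd.DataFrame:
    """Flag readings outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for one meter."""
    q1, q3 = group["R1"].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (group["R1"] < q1 - 1.5 * iqr) | (group["R1"] > q3 + 1.5 * iqr)
    return group[mask]

# Collect all outlier rows (this is what would go into a CSV of outliers)
outliers = pd.concat(iqr_outliers(g) for _, g in df.groupby("meter_id"))

# Meters with at least one outlier would then be excluded from aggregation
bad_meters = set(outliers["meter_id"])
print(bad_meters)  # {'A'}
```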
Since the process of looking for outliers is really slow with Pandas (it took my computer ~8 h), I have created a Spark version of the script (search_outliers_spark.py). Aggregating the values with Pandas can also be very slow, so I have created a Spark script for that as well (aggregate_reactive_values_spark.py).
A plot like this could be useful to identify different consumption patterns:
Could we get this graph for a random sample of 30 meters? Open to discussing better approaches.