TulipaEnergy / TulipaClustering.jl

Apache License 2.0

Add information about the "original period" corresponding to the selected representative #55

Closed DillonJ closed 1 month ago

DillonJ commented 1 month ago

Description

Hi from the Spine project @abelsiqueira @datejada and, in particular, SpineOpt.

I have written a script to run the clustering algorithm on a SpineOpt datastore and it works very nicely. I'm now looking at the implementation in SpineOpt. However, I'm struggling to make sense of the clustering_result data. I can see the weights from original periods to representative periods, but I can't see where it tells me which original periods were actually selected as representatives. For example, if I ask for 12 representative periods, I can see how each original period maps to periods 1 through 12, but I don't see where it tells me which original period representative period 1 corresponds to.

Perhaps I have misunderstood something, but I would appreciate some guidance.

Thanks in advance!

datejada commented 1 month ago

Hi @DillonJ, we are working to improve the documentation so that it is clearer how to use the package. You can see one example in this Pluto notebook we are working on; please look at section 2, which shows how, starting from hourly data, you can use the function split_into_periods! to group timesteps in blocks of 24 (i.e., period_duration) to get daily periods in your data. When you call the function find_representative_periods, the argument n_rp sets the number of representatives to find among those periods.

https://github.com/TulipaEnergy/pipeline-example/blob/main/pipeline.jl

I hope these indications help you use the package. Please stay in touch for updates that will include more improvements.
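The grouping by period_duration described above comes down to integer arithmetic on the timestep index. A minimal sketch, assuming 1-based hourly timesteps; period_and_step is a hypothetical helper written for illustration and is not part of TulipaClustering, whose split_into_periods! may work differently internally:

```julia
# Hypothetical helper (NOT part of TulipaClustering): map a 1-based timestep
# to the period it falls in and its position inside that period.
function period_and_step(timestep::Integer, period_duration::Integer)
    # fldmod1 is the 1-based counterpart of fldmod: both results start at 1.
    period, step = fldmod1(timestep, period_duration)
    return (period = period, step = step)
end

# With period_duration = 24, hours 1..24 form period 1, hours 25..48 period 2, and so on.
first_hour = period_and_step(1, 24)
last_hour_of_day_one = period_and_step(24, 24)
first_hour_of_day_two = period_and_step(25, 24)
```

With a full year of hourly data (8760 timesteps) and period_duration = 24, this yields 365 daily periods of 24 timesteps each.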

datejada commented 1 month ago

Action: Include an example of using the package from the input data and getting the results, including their interpretation.

DillonJ commented 1 month ago

Thanks for that @datejada

Say I have requested 12 representative periods, where in the result does it tell you the correspondence between the representative periods and the original periods? I.e. what original period does rp1 correspond to?

I believe rp_matrix tells us for each original period, the representative period it is represented by - but how do we get the information about the representative periods themselves and the periods of time they relate to in the original data?

Sorry if I have missed something really simple!

datejada commented 1 month ago

Oh! I see; I was the one who didn't understand at the beginning; sorry about that. We don't store that information. We print the information of each representative (i.e., time series like the availability and demand profile) and the mapping with the weights. This is because we can also cluster with k-means, which means that there is no correspondence to an original period. That correspondence only makes sense when we have k-medoids as a clustering method. We support both options.
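The difference between the two methods can be seen on toy data. The sketch below does not call TulipaClustering; it treats three made-up periods as a single cluster and computes both kinds of representative by hand, using Euclidean distance as an assumption:

```julia
using Statistics: mean

# Toy data (made up): each column is one original period with 2 timesteps.
periods = [1.0 2.0 9.0;
           1.0 4.0 9.0]
n = size(periods, 2)

# k-means style representative: the element-wise mean of the cluster members.
centroid = vec(mean(periods; dims = 2))

# k-medoids style representative: the member with the smallest total
# distance to all members of the cluster.
distance(a, b) = sqrt(sum(abs2, a .- b))
total_distance = [sum(distance(periods[:, i], periods[:, j]) for j in 1:n) for i in 1:n]
medoid_index = argmin(total_distance)
medoid = periods[:, medoid_index]

# The centroid is a synthetic profile; the medoid is an original period.
centroid_is_original = any(periods[:, j] == centroid for j in 1:n)
```

Here the medoid is original period 2, so there is an index to report, while the centroid equals none of the three periods, so no such index exists.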

We have a convenience function, write_clustering_result_to_tables, that exports the results to DuckDB tables with the profiles, the representative period's data and the mapping (for the Tulipa format).

https://github.com/TulipaEnergy/TulipaClustering.jl/blob/63803cc7c9f81535ec69324009a17c8b1a9a804c/src/io.jl#L24

However, this convenience function only extracts information from the results returned by the function find_representative_periods. So, it is possible to cluster using that function and then export its results to any other format. These are the values we store in the results to create the files in Tulipa:

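import TulipaClustering as TC  # the TC alias used below

# tc_df and num_rep_periods are defined earlier (the input data and the requested number of representatives)
clustering_result = TC.find_representative_periods(tc_df, num_rep_periods)
@show clustering_result.weight_matrix
@show clustering_result.profiles
@show clustering_result.auxiliary_data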

I will discuss with the team whether we can add the correspondence to the original period to the clustering_result.auxiliary_data structure when the method is k-medoids. I think it is easy, and it might make sense for SpineOpt.

Please let me know if I got your question right this time or if there are even more questions after this explanation 🤣

DillonJ commented 1 month ago

OK, I see now... you are regenerating the time series data so that the time series data itself is representative.

In SpineOpt we don't touch the timeseries data and instead map the timeslices. It would indeed be useful to have as an output the rep-period to original period mapping so we can implement this in Spine.

Thanks again for all the help and explanations

datejada commented 1 month ago

Action 🖥️:

Add information about the "original period" corresponding to the selected representative. For example, if the periods are days and we want two representatives, then we would like to know which original period corresponds to each representative (e.g., representative 1 is day 23 and representative 2 is day 304 in the original data). This information will only be available if the method is k-medoids (for k-means, nothing may be the suitable output). Please check where best to store this information in the clustering_result structure (maybe in auxiliary_data).
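Until such a field exists, the correspondence could in principle be recovered after the fact by matching each representative profile against the original periods. The sketch below is only an illustration under stated assumptions: original_period_indices is a hypothetical helper, not part of TulipaClustering, and both inputs are assumed to be plain matrices with one period per column, which is not necessarily how clustering_result stores its profiles:

```julia
# Hypothetical helper (NOT part of TulipaClustering). Columns of `original`
# are the original periods; columns of `representatives` are the
# representative profiles. Returns, per representative, the index of the
# first matching original period, or `nothing` if none matches.
function original_period_indices(
    original::AbstractMatrix,
    representatives::AbstractMatrix;
    atol::Real = 1e-8,
)
    n_original = size(original, 2)
    matches(j, r) = all(isapprox.(original[:, j], representatives[:, r]; atol = atol))
    return [
        findfirst(j -> matches(j, r), 1:n_original) for
        r in 1:size(representatives, 2)
    ]
end

# Toy data (made up): 3 original periods with 2 timesteps each.
original = [1.0 2.0 9.0;
            1.0 4.0 9.0]

# Medoid-like representatives are copies of original periods 3 and 1.
medoid_like = original[:, [3, 1]]
# A centroid-like representative is the mean of all three periods.
centroid_like = reshape([4.0, 14 / 3], 2, 1)

medoid_origins = original_period_indices(original, medoid_like)
centroid_origins = original_period_indices(original, centroid_like)
```

For the medoid-like input the helper returns the original indices 3 and 1; for the centroid-like input it returns nothing, which mirrors the k-medoids versus k-means behaviour described in the action. If two original periods are identical, only the first is reported, so taking the index directly from the clustering method would be more reliable than this post-hoc matching.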

Happy coding!