franrruiz / iFDM

Code for the "infinite factorial dynamical model"

Output appliance labels along with power stream #1

Closed: nipunbatra closed this issue 8 years ago

nipunbatra commented 8 years ago

Hi, thanks for making your code publicly available.

I have been running your code and noticed that the accuracy function goes through the permutations of the disaggregated streams and picks the one that maximises accuracy in order to associate a name with each disaggregated stream. I wonder if this function could be split into two:

  1. The first function outputs a name along with each disaggregated data stream.
  2. The second function takes the output of the first, compares it with the ground truth by label, and computes the metrics.
franrruiz commented 8 years ago

Hi,

Thanks for your question. I'm not sure what the goal of splitting it into two functions would be. Since our algorithm is fully unsupervised, to assign a label (i.e., a device label) to each inferred chain, our approach is to try all possible permutations and choose the one that maximizes accuracy. Hence, computing the accuracy is a necessary first step in producing a label. I don't quite see how we could decouple this procedure so as to first assign a label and then compute accuracy.
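To make the permutation idea concrete, here is a minimal Python sketch, not the repository's code. It assumes both ground truth and inferred chains are given as per-time-step state sequences in which state 0 means "off", and it scores a pairing by on/off agreement, a stand-in for whatever accuracy definition iFDM actually uses. The names `on_off_accuracy` and `best_assignment` are hypothetical.

```python
from itertools import permutations


def on_off_accuracy(truth_row, inferred_row):
    """Fraction of time steps where both sequences agree on on/off status.

    Assumes state 0 (or zero power) means the device is off.
    """
    agree = sum((a > 0) == (b > 0) for a, b in zip(truth_row, inferred_row))
    return agree / len(truth_row)


def best_assignment(truth, inferred):
    """Brute-force search over injective device-to-chain mappings.

    truth:    K sequences, one per labelled device.
    inferred: M >= K sequences, one per inferred chain.
    Returns (mean accuracy, assignment), where assignment[k] is the index of
    the inferred chain paired with ground-truth device k.
    """
    k = len(truth)
    best_acc, best_perm = -1.0, None
    for perm in permutations(range(len(inferred)), k):
        acc = sum(on_off_accuracy(truth[i], inferred[j])
                  for i, j in enumerate(perm)) / k
        if acc > best_acc:
            best_acc, best_perm = acc, perm
    return best_acc, best_perm


truth = [[0, 0, 5, 5], [3, 3, 0, 0]]                  # 2 labelled devices
inferred = [[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 2, 2]]  # 3 inferred chains
print(best_assignment(truth, inferred))
# → (1.0, (2, 0))
```

The number of candidate mappings is M!/(M-K)!, so this brute force is only practical for a handful of devices. Because the score here is a sum of independent per-pair terms, an assignment solver such as the Hungarian algorithm would find the same optimum with far less work.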

nipunbatra commented 8 years ago

Hi, I understand that you need to compute accuracy as a first step. What I meant is that one may sometimes want to compute and report other metrics as well. So you would keep doing what you currently do, maximising accuracy to find the optimal pairing of disaggregated streams to ground-truth streams, and additionally return the labels mapping each disaggregated stream to its ground-truth device. Any metric that takes ground truth and paired disaggregated streams as input can then be used.
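Once the pairing is known, any per-device metric reduces to comparing row k of the ground truth with row k of the paired disaggregated output. A minimal sketch of that second step, assuming power streams in watts stored as equal-length lists; `evaluate`, the metric functions, and the device names are hypothetical:

```python
def mean_absolute_error(truth_row, est_row):
    """Average absolute difference per time step (same units as the input)."""
    return sum(abs(a - b) for a, b in zip(truth_row, est_row)) / len(truth_row)


def relative_energy_error(truth_row, est_row):
    """|estimated energy - true energy| / true energy; assumes true energy > 0."""
    true_energy = sum(truth_row)
    return abs(sum(est_row) - true_energy) / true_energy


def evaluate(labels, truth, paired, metrics):
    """Apply every metric to each (ground truth, paired stream) couple.

    paired[k] is the disaggregated stream matched to truth[k]; extra rows
    beyond len(labels) (unmatched chains) are ignored.
    """
    return {
        label: {name: fn(truth[k], paired[k]) for name, fn in metrics.items()}
        for k, label in enumerate(labels)
    }


labels = ["fridge", "kettle"]                      # hypothetical device names
truth = [[100, 100, 0, 0], [0, 0, 2000, 2000]]
paired = [[90, 110, 0, 0], [0, 0, 1800, 2000], [5, 5, 5, 5]]
scores = evaluate(labels, truth, paired,
                  {"mae": mean_absolute_error, "energy_err": relative_energy_error})
print(scores)
```

Passing metrics as a dictionary of functions keeps the pairing step untouched: adding a new metric only means adding one more entry.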

franrruiz commented 8 years ago

Oh, I see. This should be an easy fix. In fact, the function that computes accuracy already returns two variables: the accuracy itself and "cadenas_ord". The latter is a matrix containing the disaggregated streams, sorted to match the ground truth. (If the current estimated number of devices is greater than the ground-truth number, cadenas_ord also contains the additional "extra" devices in its last positions.) This matrix should make it possible to compute other metrics as well.
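The layout described for cadenas_ord (matched chains in ground-truth order, extra chains appended at the end) can be reproduced from any device-to-chain assignment. A sketch with a hypothetical `order_chains` helper; it mirrors the described layout and is not the repository's implementation:

```python
def order_chains(inferred, assignment):
    """Reorder inferred chains to follow the ground-truth device order.

    assignment[k] is the index of the inferred chain matched to ground-truth
    device k. Unmatched ("extra") chains keep their relative order and go last.
    """
    used = set(assignment)
    matched = [inferred[j] for j in assignment]
    extras = [row for j, row in enumerate(inferred) if j not in used]
    return matched + extras


inferred = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"]]   # 3 inferred chains
print(order_chains(inferred, (2, 0)))
# → [['c0', 'c1'], ['a0', 'a1'], ['b0', 'b1']]
```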

nipunbatra commented 8 years ago

I looked at cad_ord in the example script. It seems that it only takes integer values between 0 and 4. Are these states or power values?

franrruiz commented 8 years ago

Good catch! These were indeed the states, not power values. I've changed the main script to fix that bug; it should now produce power values.
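The bug amounted to reporting state indices instead of the power each state corresponds to. Assuming each chain k has a table of per-state power levels (the hypothetical `state_power[k][s]` below, with state 0 as "off" at 0 W), converting an inferred state matrix into power streams is a lookup:

```python
def states_to_power(states, state_power):
    """Map state indices to watts.

    states[k][t]:      state of chain k at time t (an integer index).
    state_power[k][s]: power drawn by chain k in state s (hypothetical table).
    """
    return [[state_power[k][s] for s in row] for k, row in enumerate(states)]


states = [[0, 2, 4, 0], [1, 1, 0, 3]]
state_power = [
    [0.0, 50.0, 120.0, 300.0, 1500.0],
    [0.0, 80.0, 200.0, 900.0],
]
print(states_to_power(states, state_power))
# → [[0.0, 120.0, 1500.0, 0.0], [80.0, 80.0, 0.0, 900.0]]
```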

nipunbatra commented 8 years ago

Thanks for the edits @franrruiz! I think the changes I'd suggested over email will make it even easier to interact with your code.

Aside: Is it possible to speed up the disaggregation? Even 100 iterations take a long time!

franrruiz commented 8 years ago

Agreed, thanks for your suggestions!

Regarding speeding up the code, please note that this is research code and we didn't focus on optimizing it. The bottleneck should be the function that implements PGAS (particle Gibbs with ancestor sampling), so implementing that part in C should provide a significant speed-up. An easier alternative may be to reduce the number of particles used for PGAS (variables param.pgas.N_PF and param.pgas.N_PG), which would give faster updates but poorer mixing. Still, I think their values could safely be reduced by at least a factor of 3.
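To see why the particle count trades speed for mixing, below is a generic PGAS sweep on a toy model with a single on/off chain and Gaussian observations. It is an illustrative sketch, not the iFDM implementation, and `num_particles` is a single hypothetical knob standing in for the role played by param.pgas.N_PF and param.pgas.N_PG. Per-sweep work grows roughly linearly with the number of particles, so cutting it by 3 cuts that work by roughly 3; with fewer particles, however, the free particles are less likely to propose alternatives to the pinned reference trajectory, so consecutive sweeps tend to be more correlated.

```python
import math
import random


def pgas_sweep(y, x_ref, num_particles, rng, p_stay=0.9,
               means=(0.0, 100.0), std=10.0):
    """One particle Gibbs with ancestor sampling (PGAS) sweep.

    Toy model: a single on/off chain (states 0/1) that keeps its state with
    probability p_stay, observed as y[t] ~ Normal(means[x[t]], std**2).
    The last particle is pinned to the reference trajectory x_ref.
    """
    T, N = len(y), num_particles

    def weights(states, obs):
        # Gaussian log-likelihoods up to a constant, rescaled so max weight is 1.
        logw = [-0.5 * ((obs - means[s]) / std) ** 2 for s in states]
        top = max(logw)
        return [math.exp(lw - top) for lw in logw]

    def trans(prev, cur):
        return p_stay if prev == cur else 1.0 - p_stay

    x = [[0] * N for _ in range(T)]    # x[t][i]: state of particle i at time t
    anc = [[0] * N for _ in range(T)]  # anc[t][i]: ancestor index at time t-1
    x[0] = [rng.randint(0, 1) for _ in range(N - 1)] + [x_ref[0]]
    w = weights(x[0], y[0])

    for t in range(1, T):
        # Resample ancestors and propagate the N-1 free particles.
        parents = rng.choices(range(N), weights=w, k=N - 1)
        for i, a in enumerate(parents):
            prev = x[t - 1][a]
            anc[t][i] = a
            x[t][i] = prev if rng.random() < p_stay else 1 - prev
        # Ancestor sampling: reconnect the reference particle to any history.
        as_w = [w[j] * trans(x[t - 1][j], x_ref[t]) for j in range(N)]
        anc[t][N - 1] = rng.choices(range(N), weights=as_w)[0]
        x[t][N - 1] = x_ref[t]
        w = weights(x[t], y[t])

    # Pick one final particle and trace its lineage back to t = 0.
    k = rng.choices(range(N), weights=w)[0]
    traj = [0] * T
    for t in range(T - 1, -1, -1):
        traj[t] = x[t][k]
        k = anc[t][k]
    return traj


def run_pgas(y, num_particles, num_sweeps, seed=0):
    """Run several PGAS sweeps starting from an all-off reference trajectory."""
    rng = random.Random(seed)
    x = [0] * len(y)
    for _ in range(num_sweeps):
        x = pgas_sweep(y, x, num_particles, rng)
    return x


truth = [0] * 10 + [1] * 10 + [0] * 10
data_rng = random.Random(1)
y = [100.0 * s + data_rng.gauss(0.0, 10.0) for s in truth]
print(run_pgas(y, num_particles=10, num_sweeps=20))
```

Timing `run_pgas` with, say, 30 versus 10 particles on a longer `y` is a quick way to check the roughly linear cost on your own machine.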

nipunbatra commented 8 years ago

@franrruiz: Thanks! I'll close the issue now and open new issues for any other questions.