Closed by nipunbatra 8 years ago
Hi,
Thanks for your question. I'm not sure what the goal of splitting it into two functions would be. Since our algorithm is fully unsupervised, in order to assign a label (i.e., a device label) to each inferred chain, our idea is to try out all possible permutations and choose the one that maximizes accuracy. Hence, computing the accuracy is a necessary first step to produce a label. I don't quite see how we could decouple this procedure so as to first assign a label and then compute accuracy.
Hi, I understand that you need to compute accuracy as a first step. What I meant was that at times one may want to compute and report other metrics as well. So, you would proceed as you currently do, maximising accuracy to find the optimal pairing of disaggregated streams to ground-truth streams. Next, you would also return the labels mapping each disaggregated stream to its ground-truth stream. Then, any metric that takes as input the ground truth and the paired disaggregated streams can work.
Oh, I see. This should be an easy fix. Indeed, the function that computes accuracy returns two variables: the accuracy itself, and "cadenas_ord". The latter is a matrix containing the disaggregated streams, sorted to match the ground truth. (If the current estimated number of devices is greater than the ground truth, then cadenas_ord also contains the additional "extra" devices in its last positions.) This matrix should help compute other metrics as well.
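To make the decoupling concrete, here is a minimal Python sketch (the names `accuracy` and `match_streams` are hypothetical and not the repo's API) of splitting the procedure into two steps: first find the accuracy-maximizing permutation, then return the label mapping and the reordered streams so any other metric can reuse them:

```python
from itertools import permutations

def accuracy(est, truth):
    """Fraction of time steps where the estimated value equals the truth."""
    hits = sum(1 for e, t in zip(est, truth) if e == t)
    return hits / len(truth)

def match_streams(estimated, ground_truth):
    """Try all pairings of estimated streams to ground-truth streams and
    return (best accuracy, label mapping, reordered estimated streams).
    Handles the case where there are more estimated streams than
    ground-truth streams by leaving the extras unmatched."""
    n = len(ground_truth)
    best_acc, best_perm = -1.0, None
    for perm in permutations(range(len(estimated)), n):
        acc = sum(accuracy(estimated[p], ground_truth[i])
                  for i, p in enumerate(perm)) / n
        if acc > best_acc:
            best_acc, best_perm = acc, perm
    reordered = [estimated[p] for p in best_perm]
    return best_acc, best_perm, reordered

# Once the mapping is fixed, any other metric can reuse it:
est = [[0, 1, 1, 0], [1, 1, 0, 0]]
truth = [[1, 1, 0, 0], [0, 1, 1, 0]]
acc, mapping, reordered = match_streams(est, truth)
# mapping == (1, 0): estimated stream 1 pairs with ground truth 0, and 0 with 1
```

This brute-force search is exponential in the number of devices, so it only works for small device counts; the same idea scales with an optimal-assignment solver (e.g. the Hungarian algorithm) instead of enumerating permutations.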
I looked at cad_ord in the example script. It seems that these only take integer values between 0 and 4. Are these states, or power values?
Good catch! These were indeed the states, not power values. I've changed the main script to fix that bug; it should produce power values now.
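The distinction is just a lookup: each per-device state index maps to a power level. A tiny illustrative sketch (the table values below are made up, not the model's actual inferred levels):

```python
# Hypothetical power table: state index -> power in watts.
power_levels = {
    0: 0.0,     # off
    1: 15.0,    # e.g. standby
    2: 60.0,
    3: 120.0,
    4: 250.0,
}

states = [0, 0, 2, 4, 1]                    # a chain of state indices
powers = [power_levels[s] for s in states]  # the corresponding power trace
# powers == [0.0, 0.0, 60.0, 250.0, 15.0]
```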
Thanks for the edits @franrruiz! I think the changes I'd suggested over email will make it even easier to interact with your code.
Aside: Is it possible to speed up the disaggregation? Even 100 iterations take a lot of time!
Agreed, thanks for your suggestions!
Regarding speeding up the code, please note that this is research code and we didn't focus on optimizing it. The bottleneck should be the function that implements PGAS, so implementing that part in C should provide a significant speed-up. An easier alternative may be simply to reduce the number of particles used for PGAS (the variables param.pgas.N_PF and param.pgas.N_PG), which would give faster updates but poorer mixing. Still, I feel their values could safely be reduced by at least a factor of 3.
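The speed/mixing trade-off comes from the fact that particle-filter cost grows roughly linearly in the number of particles. A toy bootstrap particle filter in Python (NOT the repo's PGAS kernel, just an illustration of the role the particle count plays):

```python
import math
import random

def particle_filter(observations, n_particles, seed=0):
    """Toy bootstrap particle filter for a Gaussian random-walk model.
    Runtime scales roughly linearly with n_particles; fewer particles
    means faster updates but a noisier (worse-mixing) approximation."""
    rng = random.Random(seed)
    particles = [0.0] * n_particles
    estimates = []
    for y in observations:
        # Propagate each particle through a random-walk transition.
        particles = [x + rng.gauss(0.0, 1.0) for x in particles]
        # Weight by a Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * (y - x) ** 2) for x in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Posterior-mean estimate, then multinomial resampling.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

obs = [0.5, 1.0, 1.5, 2.0]
est_small = particle_filter(obs, n_particles=50)    # faster, noisier
est_large = particle_filter(obs, n_particles=500)   # slower, smoother
```

Lowering param.pgas.N_PF and param.pgas.N_PG plays the same role as `n_particles` here: each Gibbs sweep gets cheaper, at the cost of a coarser approximation inside the sampler.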
@franrruiz : Thanks! I'll close the issue now and open other newer issues for other doubts.
Hi, Thanks for making your code available publicly.
I have been running your code and found that the accuracy function looks at the permutations of disaggregated streams and maximises the accuracy to associate a name with the disaggregated stream. I wonder if this function can be broken down into two: