BrandonSmithJ / MDN

Mixture Density Network for water constituent estimation
GNU General Public License v3.0

MDN #13

Open james-d-h opened 2 years ago

james-d-h commented 2 years ago

Hi Brandon,

What methods are there to precompute the weights for something like this – is it possible to just pass the targets through an MDN as an input initially, to get good starting weights for the priors/means/variances?

Also for full covariance, does care need to be given to which targets you regress together?

James

james-d-h commented 2 years ago

The error bars from _get_avg_confidence() are still large; see the attached diagram (I scaled them down by a factor of 50 to make the plot readable). The documentation there does say these errors get large, given rho = sqrt(2) * erfinv(p ** (1/d)). Could you comment on this?
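To get a feel for how that scaling factor behaves, here is a minimal sketch that evaluates rho = sqrt(2) * erfinv(p ** (1/d)) for a few values of d. It assumes p is the overall confidence level and d is the number of targets, which is my reading of the formula and not something confirmed by the repository. The function name `rho` is hypothetical. It uses the identity sqrt(2) * erfinv(y) = Phi^-1((1 + y) / 2), where Phi^-1 is the standard normal quantile function, so only the Python standard library is needed.

```python
from statistics import NormalDist


def rho(p: float, d: int) -> float:
    """Evaluate sqrt(2) * erfinv(p ** (1/d)) via the standard normal quantile.

    Assumption: p is the joint confidence level (0 < p < 1) and d is the
    number of target dimensions.
    """
    q = p ** (1.0 / d)  # per-dimension coverage needed for joint coverage p
    # sqrt(2) * erfinv(q) equals the standard normal quantile at (1 + q) / 2
    return NormalDist().inv_cdf((1.0 + q) / 2.0)


if __name__ == "__main__":
    for d in (1, 2, 5, 10, 50):
        print(f"d={d:>2}  rho={rho(0.9, d):.3f}")
```

Under these assumptions rho increases with d, so the same predicted covariance gives wider bounds as more targets are regressed together.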

The line which kept returning errors was 437:

avg_sigma = tf.reduce_sum(tf.expand_dims(tf.expand_dims(prior, -1), -1) * (sigma + tf.matmul(tf.transpose(mu - tf.expand_dims(avg_estim, 1), (0,2,1)), mu - tf.expand_dims(avg_estim, 1))), axis=1)

which I changed to:

avg_sigma = tf.reduce_sum(tf.expand_dims(tf.expand_dims(prior, -1), -1) * ...(sigma + tf.expand_dims(tf.matmul(tf.transpose(mu - tf.expand_dims(avg_estim, 1), (0,2,1)), ...mu - tf.expand_dims(avg_estim, 1)), 1)), axis=1)
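For comparison, the textbook covariance of a Gaussian mixture (law of total covariance) is the prior-weighted sum over components of sigma_k + (mu_k - mean)(mu_k - mean)^T. The sketch below is not the repository's code: it is written in plain Python for a single sample so that the indexing is explicit, and the name `mixture_mean_and_cov` and the argument layouts (prior of length K, mu as K x D, sigma as K x D x D) are assumptions. One thing it makes visible: the outer product is formed per component and weighted by that component's prior. If mu has shape (N, K, D), the tf.matmul in the lines above contracts over the component axis instead, which sums the outer products over all components before the prior weighting is applied, so it may be worth checking whether that is intended.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mixture_mean_and_cov(
    prior: Vector, mu: List[Vector], sigma: List[Matrix]
) -> Tuple[Vector, Matrix]:
    """Mean and covariance of a Gaussian mixture for one sample.

    Assumed layouts (hypothetical, not taken from the MDN code):
      prior: K mixing weights summing to 1
      mu:    K component means, each of length D
      sigma: K component covariances, each D x D
    """
    n_comp, n_dim = len(prior), len(mu[0])

    # Mixture mean: prior-weighted average of the component means.
    mean = [sum(prior[k] * mu[k][i] for k in range(n_comp)) for i in range(n_dim)]

    # Mixture covariance: sum_k prior_k * (sigma_k + diff_k diff_k^T).
    cov = [[0.0] * n_dim for _ in range(n_dim)]
    for k in range(n_comp):
        diff = [mu[k][i] - mean[i] for i in range(n_dim)]
        for i in range(n_dim):
            for j in range(n_dim):
                cov[i][j] += prior[k] * (sigma[k][i][j] + diff[i] * diff[j])
    return mean, cov


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    mean, cov = mixture_mean_and_cov(
        prior=[0.5, 0.5],
        mu=[[0.0, 0.0], [2.0, 0.0]],
        sigma=[identity, identity],
    )
    print(mean, cov)
```

Comparing the output of a reference like this against the TensorFlow expression on one sample is a quick way to tell a shape fix apart from a change in the quantity being computed.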

[attached image]

BrandonSmithJ commented 1 year ago

Hey James,

Sorry that this slipped through the cracks, and I'm only now getting back to you.

To answer your first comment: if I understand your question correctly, you're essentially asking if you could do something like an unsupervised pretraining - albeit with targets instead of inputs. This isn't possible with the current implementation, but I could imagine there being a way to more intelligently initialize the weights. Unfortunately, a naïve approach based on unsupervised pretraining would require you to remove the input layer when going from target->target (as is done normally with the output layer when pretraining with input->input) - and I'm unsure the benefit would persist when the input layer is then scrambling the initial layer activations compared to what subsequent layers expect. It's certainly an interesting idea to explore, though.

As for multiple targets: yes, it could theoretically be the case that certain combinations of targets interfere with each other and cause worse outcomes overall. While I don't have any evidence to back this up at the moment, my intuition is that this is unlikely in practice, and that you would need to artificially generate an adversarial dataset to observe it. I could certainly be wrong though; either way it's another interesting research avenue.

As to your second comment, I would actually suggest using the confidence_interval parameter of MDN.extract_predictions (which you can pass to the MDN.predict function) - this will give you confidence bounds appropriate to the prediction being generated (average vs top). Unless you're explicitly using averaged predictions (avg_est=True), _get_top_confidence is the function you'd want to use.

Regardless of that though, the confidence intervals being provided with those methods are untested at best right now. Certainly for more than a couple targets, you'll likely get wildly inaccurate intervals due to the curse of dimensionality. Our group is working on another confidence method though, which should provide a much more practical uncertainty for a given prediction. The paper on that is currently under review, and should be published before the end of the year I'm guessing.