I'm confused about the equation $\sum_j c_{ij}\, p(y_j) = \mu(\hat y_i)$ and the definition of the confusion matrix $C$ above.
As I understand it, the equation is based on the law of total probability, $$\sum_j P(\hat y = y_i \mid y = y_j)\, P(y = y_j) = P(\hat y = y_i),$$ where $\hat{y}$ denotes the predicted label of $x$ and $y$ denotes the true label of $x$. Matching the two equations term by term, $P(\hat y = y_i)$ corresponds to $\mu(\hat y_i)$ and $P(y = y_j)$ corresponds to $p(y_j)$. So the confusion matrix element $c_{ij}$ would need to be the conditional probability $P(\hat y = y_i \mid y = y_j)$, whereas, according to the definition above, $c_{ij}$ is actually a joint probability estimated on the training distribution. My questions are:
Am I misunderstanding something?
Or are we using the joint probability to estimate the target label distribution only approximately, never exactly?
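To make my point concrete, here is a small numerical sketch (toy, made-up 2-class numbers) showing that the total-probability identity holds when $c_{ij}$ is the conditional probability $P(\hat y = y_i \mid y = y_j)$, but not when it is the joint probability $P(\hat y = y_i, y = y_j)$:

```python
import numpy as np

# Hypothetical joint probabilities J[i, j] = P(yhat = i, y = j)
# on the source (training) distribution.
J = np.array([[0.5, 0.1],
              [0.1, 0.3]])

p = J.sum(axis=0)   # label marginal p(y_j): column sums -> [0.6, 0.4]
mu = J.sum(axis=1)  # prediction marginal mu(yhat_i): row sums -> [0.6, 0.4]

# Conditional confusion matrix C[i, j] = P(yhat = i | y = j) = J[i, j] / p(y_j).
C = J / p           # broadcasting divides each column j by p[j]

# Total probability holds with the *conditional* matrix:
print(np.allclose(C @ p, mu))   # True
# ...but fails with the *joint* matrix:
print(np.allclose(J @ p, mu))   # False
```

This is exactly the mismatch I am asking about: only the column-normalized (conditional) matrix satisfies $\sum_j c_{ij}\, p(y_j) = \mu(\hat y_i)$.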
Looking forward to your reply!