zhuhaozh opened this issue 5 years ago
@zhuhaozh Hi, Fig. 1 in the paper examines the effectiveness of MINE in higher dimensions. MINE is better than the traditional method in higher dimensions, but it has a small error.
However, I found that the estimated mutual information cannot converge to the real MI; it always stays near 0.6.
It is a little strange that MINE does not work well in higher dimensions. I will write using my note's notation: https://github.com/MasanoriYamada/Mine_pytorch/blob/master/note.pdf
The following part is the key point of MINE, where T is a neural network. MINE optimizes so that Δ → 0, and the KL divergence is a scalar, independent of the dimensions of P and Q. I think deep learning is good at estimating a scalar quantity from high-dimensional input.
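For reference (my sketch, not the note's exact equations), the Donsker-Varadhan representation that MINE maximizes can be written as:

```latex
I(X;Y) = D_{\mathrm{KL}}\!\left(P_{XY} \,\|\, P_X \otimes P_Y\right)
\;\ge\; \sup_{T}\; \mathbb{E}_{P_{XY}}\!\left[T(x,y)\right]
\;-\; \log \mathbb{E}_{P_X \otimes P_Y}\!\left[e^{T(x,y)}\right]
```

Note that the right-hand side is a scalar no matter how high-dimensional x and y are, which is the point above.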
Hi @MasanoriYamada, I'm not familiar with MI and I'm very confused by the dimension change and the slight code change. Could you kindly answer for me?
```python
def gen_x():
    return np.sign(np.random.normal(0., 1., [data_size, 10]))

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, H)
        self.fc2 = nn.Linear(10, H)
        self.fc3 = nn.Linear(H, 1)

    def forward(self, x, y):
        h1 = F.relu(self.fc1(x) + self.fc2(y))
        h2 = self.fc3(h1)
        return h2
```
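For context, the network above can be trained like this. This is a minimal sketch: `gen_y` (y = x plus Gaussian noise), `H`, the learning rate, and the step count are my assumptions, not from the repo:

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

np.random.seed(0)
torch.manual_seed(0)

H = 64           # hidden width (assumed value)
data_size = 1000

def gen_x():
    # 10-dimensional +/-1 data, as in the question above
    return np.sign(np.random.normal(0., 1., [data_size, 10]))

def gen_y(x):
    # hypothetical dependent variable: y = x + Gaussian noise
    return x + np.random.normal(0., 0.5, x.shape)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, H)
        self.fc2 = nn.Linear(10, H)
        self.fc3 = nn.Linear(H, 1)

    def forward(self, x, y):
        h1 = F.relu(self.fc1(x) + self.fc2(y))
        return self.fc3(h1)

model = Net()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    x = torch.from_numpy(gen_x()).float()
    y = torch.from_numpy(gen_y(x.numpy())).float()
    y_shuffle = y[torch.randperm(y.size(0))]  # samples from the product of marginals
    # Donsker-Varadhan lower bound on I(X; Y)
    mi_lb = model(x, y).mean() - torch.log(torch.exp(model(x, y_shuffle)).mean())
    loss = -mi_lb  # gradient ascent on the bound
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Shuffling `y` along the batch dimension is how the expectation under the product of marginals is approximated.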
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, H)
        self.fc2 = nn.Linear(1, H)
        self.fc3 = nn.Linear(H * 2, 1)  # changed the in_features

    def forward(self, x, y):
        # h1 = F.relu(self.fc1(x) + self.fc2(y))  # changed this line to the following
        h1 = F.relu(torch.cat((self.fc1(x), self.fc2(y)), dim=1))
        h2 = self.fc3(h1)
        return h2
```
The plot changes to this (the estimated MI is almost zero).
@zhuhaozh Sorry for my late reply. I confirmed the same situation. I do not know the cause yet and will investigate (^_^)
Sorry, I'm busy; please wait a few weeks.
@zhuhaozh Hi, is there any progress on this? I am also working on this, and I think the traditional method cannot calculate the correct MI for variables with more than 2 dimensions. You may need a different formula for that. So the "real MI" you mentioned may be wrong, and maybe the neural MI estimator is right, but I am not sure. If you have already obtained the mutual information for multi-dimensional variables, let me know. I am still trying to figure out how to compute it.
@jeong-tae I am still not sure how to calculate MI the traditional way. But I found the code implemented by MINE's authors, and I reimplemented it for my project. As their paper reports, it can estimate MI for multi-dimensional variables...
@zhuhaozh Sure, MINE can handle the multi-dimensional case as well. For the traditional way, you have to use a multivariate normal distribution pdf instead of a univariate normal distribution pdf. I am not 100% sure since I am not a mathematics expert, but it seems reasonable.
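To illustrate the multivariate "traditional way" mentioned above: for jointly Gaussian variables the MI has a closed form, I(X;Y) = ½(log det Σ_X + log det Σ_Y − log det Σ), which can serve as a ground-truth check against MINE. A sketch with a hypothetical helper name, not code from this thread:

```python
import numpy as np

def gaussian_mi(cov, dx):
    """Closed-form MI (in nats) for jointly Gaussian (X, Y) with joint
    covariance `cov`, where X is the first `dx` coordinates."""
    cov = np.asarray(cov, dtype=float)
    _, logdet_joint = np.linalg.slogdet(cov)
    _, logdet_x = np.linalg.slogdet(cov[:dx, :dx])
    _, logdet_y = np.linalg.slogdet(cov[dx:, dx:])
    return 0.5 * (logdet_x + logdet_y - logdet_joint)

# 2-d X and 2-d Y with per-coordinate correlation rho
rho = 0.5
joint = np.eye(4)
joint[0, 2] = joint[2, 0] = rho
joint[1, 3] = joint[3, 1] = rho
mi = gaussian_mi(joint, dx=2)
# the MI factorizes across independent coordinate pairs:
# 2 * (-0.5 * log(1 - rho**2)) = -log(1 - rho**2)
```

Using `slogdet` instead of `det` keeps the computation stable in high dimensions, where determinants under- or overflow.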
> @jeong-tae I am still not sure how to calculate MI the traditional way. But I found the code implemented by MINE's authors, and I reimplemented it for my project. As their paper reports, it can estimate MI for multi-dimensional variables...
Hi! So, using the code from MINE's authors based on the JS divergence, you were able to estimate MI for multi-dimensional variables? I haven't been able to correctly approximate even the one-dimensional case with their implementation. Do you have a repository where I can check your code? Thank you.
> @jeong-tae I am still not sure how to calculate MI the traditional way. But I found the code implemented by MINE's authors, and I reimplemented it for my project. As their paper reports, it can estimate MI for multi-dimensional variables...
Hi @jeong-tae, it is good news that it is working for you. But I noticed that when we applied it to high-dimensional data, it is hard to converge and is heavily affected by the network structure and the learning rate, even across different runs. So I am wondering whether I did anything wrong. Would you mind sharing, or giving a quick reply about how you did it, or whether you have met a similar situation? Thanks very much.
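Regarding the convergence instability: the MINE paper proposes a bias-corrected gradient that replaces the per-batch estimate of E_Q[e^T] in the denominator with an exponential moving average. A minimal sketch (`mine_loss_ema` is a hypothetical helper, not code from this thread):

```python
import torch

def mine_loss_ema(t_joint, t_marginal, ema, decay=0.99):
    """Bias-corrected MINE loss.
    t_joint, t_marginal: statistics-network outputs on joint / shuffled samples.
    ema: running (detached) scalar estimate of E_Q[e^T].
    Returns the loss to minimize and the updated EMA."""
    exp_marginal = torch.exp(t_marginal).mean()
    ema = decay * ema + (1.0 - decay) * exp_marginal.detach()
    # dividing by the detached EMA instead of taking log of the batch mean
    # gives the bias-corrected gradient grad(E[e^T]) / EMA
    loss = -(t_joint.mean() - exp_marginal / ema)
    return loss, ema
```

For reporting, one would still log the plain DV estimate `t_joint.mean() - log(exp_marginal)`; the EMA only stabilizes the gradients.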
Hi, I changed the data dimension from 1 to another shape, for example 3×28×28, and modified the model to be similar to the one given in the MINE appendix.
However, I found that the estimated mutual information cannot converge to the real MI; it always stays near 0.6. Do you know the reason for this?