MasanoriYamada / Mine_pytorch

MINE: Mutual Information Neural Estimation in pytorch (unofficial)

Is the mutual information estimate wrong when the data dimension changes? #2

Open zhuhaozh opened 5 years ago

zhuhaozh commented 5 years ago

Hi, I changed the data dimension from 1 to another shape (for example, 3×28×28) and modified the model along the lines of the one given in the MINE appendix.
However, I found that the estimated mutual information does not converge to the real MI; it always stays near 0.6. Do you know the reason for this?

MasanoriYamada commented 5 years ago

@zhuhaozh Hi, Fig. 1 in the paper tests the effectiveness of MINE in higher dimensions. MINE is better than the traditional method in higher dimensions, but it has a small error.

However, I found that the estimated mutual information does not converge to the real MI; it always stays near 0.6.

It is a little strange that MINE does not work well in higher dimensions. I will explain using the notation from my note: https://github.com/MasanoriYamada/Mine_pytorch/blob/master/note.pdf

The following is the key point of MINE:

I(X; Y) = D_KL( P(X,Y) ‖ P(X)P(Y) ) ≥ sup_T ( E_{P(X,Y)}[T] − log E_{P(X)P(Y)}[e^T] )

where T is a neural network. MINE optimizes T so that the gap Δ → 0, and the KL divergence is a scalar, independent of the dimensions of P and Q. I think deep learning is good at estimating a scalar quantity from high-dimensional inputs.
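For reference, a minimal PyTorch sketch of that lower bound as a training objective, following this repo's approach; here `net` is the statistics network T, and `y_shuffle` (a shuffled copy of the y batch, standing in for samples from P(X)P(Y)) is an assumed variable name:

import torch

# Donsker-Varadhan lower bound on I(X; Y); x, y, y_shuffle are mini-batches
t_joint = net(x, y)             # T(x, y) on samples from the joint P(X, Y)
t_marginal = net(x, y_shuffle)  # T(x, y') on samples from P(X)P(Y)
mi_lb = torch.mean(t_joint) - torch.log(torch.mean(torch.exp(t_marginal)))
loss = -mi_lb                   # minimize the negative bound to maximize it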

zhuhaozh commented 5 years ago

Hi @MasanoriYamada, I'm not familiar with MI, and I'm very confused by the dimension change and by a slight code change. Could you please kindly answer two questions for me?

1. The following plot shows what happens when I only change the code as follows (changing x's and y's dimension from 1 to 10):

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# data_size and H (the hidden width) are defined earlier in the notebook

def gen_x():
    # x is now 10-dimensional instead of 1-dimensional
    return np.sign(np.random.normal(0., 1., [data_size, 10]))

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, H)
        self.fc2 = nn.Linear(10, H)
        self.fc3 = nn.Linear(H, 1)

    def forward(self, x, y):
        # statistics network T(x, y): sum the two embeddings, then map to a scalar
        h1 = F.relu(self.fc1(x) + self.fc2(y))
        h2 = self.fc3(h1)
        return h2

[plot of the estimated MI for the 10-dimensional case]

2. When I changed the code slightly (dimension still 1):

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, H)
        self.fc2 = nn.Linear(1, H)
        self.fc3 = nn.Linear(H * 2, 1)  # input size changed to match the concatenation

    def forward(self, x, y):
        # h1 = F.relu(self.fc1(x) + self.fc2(y))  # changed this line to the following
        h1 = F.relu(torch.cat((self.fc1(x), self.fc2(y)), dim=1))
        h2 = self.fc3(h1)
        return h2

The plot changed to this (the estimated MI is almost zero): [plot of the estimated MI]

MasanoriYamada commented 5 years ago

@zhuhaozh Sorry for my late reply. I confirmed the same situation. I do not know the cause yet and will investigate (^_^)

Sorry, I'm busy; please wait a few weeks.

jeong-tae commented 5 years ago

@zhuhaozh Hi, is there any progress on this? I am also working on this, and I think the traditional formula cannot compute the correct MI for variables with more than 2 dimensions; you may need a different equation for that. So the "real MI" you mentioned may be wrong, and maybe the neural MI estimator is right... but I am not sure. If you have already computed the mutual information of multi-dimensional variables, let me know. I am still trying to figure out how to compute it.

zhuhaozh commented 5 years ago

@jeong-tae I am still not sure how to calculate MI in the traditional way. But I found the code implemented by MINE's author, and I reimplemented that code for my project. As their paper reports, it can estimate MI for multi-dimensional variables...

jeong-tae commented 5 years ago

@zhuhaozh Sure, MINE can handle the multi-dimensional case as well. For the traditional way, you have to use the multivariate normal distribution PDF instead of the univariate normal PDF. I am not 100% sure, as I am not a mathematics expert, but it seems reasonable.
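For what it's worth, for jointly Gaussian variables the real MI has a closed form, I(X; Y) = 0.5 * log( det(Σ_X) det(Σ_Y) / det(Σ) ), which gives a multi-dimensional ground truth to compare against. A minimal sketch (`gaussian_mi` and the example covariance are illustrative, not code from this repo):

import numpy as np

# Closed-form MI for jointly Gaussian X and Y, given the joint covariance
def gaussian_mi(sigma_joint, dim_x):
    sigma_x = sigma_joint[:dim_x, :dim_x]
    sigma_y = sigma_joint[dim_x:, dim_x:]
    _, logdet_x = np.linalg.slogdet(sigma_x)
    _, logdet_y = np.linalg.slogdet(sigma_y)
    _, logdet_xy = np.linalg.slogdet(sigma_joint)
    return 0.5 * (logdet_x + logdet_y - logdet_xy)

# Example: 2-D X and 2-D Y, each pair of components correlated with rho = 0.9
rho = 0.9
cov = np.eye(4)
cov[0, 2] = cov[2, 0] = rho
cov[1, 3] = cov[3, 1] = rho
print(gaussian_mi(cov, dim_x=2))  # -log(1 - rho^2): twice the 1-D value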

tiagoCuervo commented 5 years ago

@jeong-tae I am still not sure how to calculate MI in the traditional way. But I found the code implemented by MINE's author, and I reimplemented that code for my project. As their paper reports, it can estimate MI for multi-dimensional variables...

Hi! So, using the MINE author's code based on the JS divergence, you were able to estimate MI for multi-dimensional variables? I haven't been able to correctly approximate even the one-dimensional case with their implementation. Do you have a repository where I can check your code? Thank you
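For context, the objective I am referring to looks roughly like the sketch below, assuming the Jensen-Shannon formulation from f-GAN / Deep InfoMax is what the author's code uses; `t_joint` and `t_marginal` are assumed names for the statistics network's outputs on joint and shuffled batches. Note this bounds the JS divergence rather than the KL-based MI, so its value is not directly comparable to the true MI:

import torch.nn.functional as F

# Jensen-Shannon-based objective (as in f-GAN / Deep InfoMax); maximizing it
# tightens a bound on JSD(P(X,Y) || P(X)P(Y)), not on the MI in nats
jsd_obj = (-F.softplus(-t_joint)).mean() - F.softplus(t_marginal).mean()
loss = -jsd_obj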

DorisxinDU commented 4 years ago

@jeong-tae I am still not sure how to calculate MI in the traditional way. But I found the code implemented by MINE's author, and I reimplemented that code for my project. As their paper reports, it can estimate MI for multi-dimensional variables...

Hi @jeong-tae, it is good news that it is working for you. But I noticed that when we applied it to high-dimensional data, it is hard to converge and is heavily affected by the network structure and the learning rate, and it even varies between different runs. So I am wondering whether I did something wrong. Would you mind sharing how you did it, or giving a quick reply on whether you have met a similar situation? Thanks very much.
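One thing I am considering trying for the instability is the moving-average gradient correction suggested in the MINE paper; is this the right idea? A sketch with my own illustrative names (`mine_step`, `ma_et`), not code from this repo:

import torch

def mine_step(net, x, y, y_shuffle, ma_et, ma_rate=0.01):
    # T on joint samples and e^T on (shuffled) marginal samples
    t_joint = net(x, y)
    et = torch.exp(net(x, y_shuffle))
    # exponential moving average of E[e^T], kept as a plain float
    ma_et = (1 - ma_rate) * ma_et + ma_rate * et.mean().item()
    # surrogate loss: divide by the moving average instead of the batch mean,
    # so the gradient of the log term is less biased
    loss = -(t_joint.mean() - et.mean() / ma_et)
    # report the actual Donsker-Varadhan estimate separately
    mi_lb = t_joint.mean() - torch.log(et.mean())
    return loss, mi_lb, ma_et

Smaller learning rates and averaging the estimate over the last iterations also seem to smooth the curve in my runs, but I would like to hear how you handled it.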