Open ArnoVel opened 5 years ago
Hi, The code is actually inspired by the author's code; the convolutions are the embedding layers (they are not Linear layers), see the end of p. 4: 'φ(x_ij, y_ij) is the mean embedding of the empirical distribution (1/m_i) Σ_j δ(x_ij, y_ij)'. But I do understand that it might be a bit cryptic, and I might also be wrong; we've tested a lot of (MLP-based) configurations and this one seems to be the one that actually learns to classify. Your other remark is correct; I should add the composite output to the model in the next release!
Best regards, Diviyan
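For what it's worth, when `kernel_size` is 1, a `Conv1d` slid over the point cloud is exactly a `Linear` layer applied independently to every point, and mean-pooling its output gives the empirical mean embedding from the quoted passage. A minimal NumPy sketch of that equivalence (all shapes and names here are illustrative, not the repository's):

```python
import numpy as np

# Illustrative shapes only: m points per sample, embedding width h.
m, h = 5, 4
rng = np.random.default_rng(0)
points = rng.normal(size=(2, m))     # channels-first (x row, y row), as Conv1d sees it
W = rng.normal(size=(h, 2))          # shared weights of the point-wise embedding
b = rng.normal(size=(h, 1))

# A Conv1d(2, h, kernel_size=1) applies the same affine map at every position:
conv_out = W @ points + b            # shape (h, m)

# ...which is exactly a Linear(2, h) applied to each point independently:
linear_out = (points.T @ W.T).T + b  # shape (h, m)

assert np.allclose(conv_out, linear_out)

# Mean-pooling the per-point embeddings estimates the kernel mean embedding
# phi = (1/m) * sum_j f(x_j, y_j) from the quoted passage of the paper.
mean_embedding = conv_out.mean(axis=1)  # shape (h,)
```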
Hi again, Thanks for the detailed answer. My original question concerning point 1 could be rephrased as follows: why is the embedding network / embedding layer better framed as Conv1d layers rather than the 'basic' combination of linear + nonlinearity? Is this from experiments, or is it taken verbatim from the paper?
I do understand the idea of modelling the Kernel Mean Embedding's feature map through an MLP; it is the specific way of building that MLP which I don't understand!
Thanks again, Arno V.
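The pipeline being discussed (a shared point-wise embedding playing the role of the Conv1d layers, mean-pooling into φ, then a classifier head) can be sketched in NumPy; all names and shapes below are hypothetical, not the CDT implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
h = 8                                  # hypothetical embedding width

def relu(a):
    return np.maximum(a, 0.0)

def ncc_forward(points, W1, b1, W2, b2):
    """Minimal NCC-style forward pass (illustrative, not the CDT code):
    shared point-wise embedding -> mean pooling -> linear classifier head."""
    emb = relu(W1 @ points + b1)       # embed each (x_j, y_j) pair: (h, m)
    phi = emb.mean(axis=1)             # empirical kernel mean embedding: (h,)
    return (W2 @ phi + b2).item()      # a single logit for the causal direction

points = rng.normal(size=(2, 50))      # one synthetic point cloud
logit = ncc_forward(points,
                    rng.normal(size=(h, 2)), rng.normal(size=(h, 1)),
                    rng.normal(size=(1, h)), rng.normal(size=(1,)))
```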
Hi again, I do understand your issue; I actually had the same one before convincing myself that it was the correct thing. I'll get into it next week, and I'll get back to you!
Hi, Would it be possible to have some kind of update? Thanks!
Hi again, I'm reading the NCC code more carefully now, and I noticed something odd. You made the choice to use
criterion = th.nn.BCEWithLogitsLoss()
and therefore your network only outputs logits:
self.conv = th.nn.Sequential(th.nn.Conv1d(2, n_hiddens, kernel_size),
                             th.nn.ReLU(),
                             th.nn.Conv1d(n_hiddens, n_hiddens, kernel_size),
                             th.nn.ReLU())
# self.batch_norm = th.nn.BatchNorm1d(n_hiddens, affine=False)
self.dense = th.nn.Sequential(th.nn.Linear(n_hiddens, n_hiddens),
                              th.nn.ReLU(),
                              th.nn.Linear(n_hiddens, 1))
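(To be clear, `BCEWithLogitsLoss` itself is fine here: it applies the sigmoid internally, so feeding it raw logits is correct. A pure-Python sketch of the per-example loss it computes, in the usual numerically stable form, not the library code:)

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    """Plain binary cross-entropy on a probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_with_logits(z, y):
    """What BCEWithLogitsLoss computes per example: BCE on sigmoid(z),
    written in a numerically stable log-sum-exp form."""
    return max(z, 0.0) - z * y + math.log(1.0 + math.exp(-abs(z)))

# The two agree; the logits version just never forms sigmoid(z) explicitly.
for z in (-2.0, 0.3, 1.7):
    for y in (0.0, 1.0):
        assert abs(bce_with_logits(z, y) - bce(sigmoid(z), y)) < 1e-9
```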
However, in the training loop, the way you compute accuracy is
acc = th.where(th.cat(output, 0).data.cpu() > .5,
th.ones(len(output)),
th.zeros(len(output))) - th.cat(labels, 0).data.cpu()
If I'm correct, this means you're thresholding the logits directly, since output
is obtained as
[self.model(m.t().unsqueeze(0)) for m in batch]
and is therefore a list of logits... Usually we threshold the probabilities, not the logits, don't we? Just my 2 cents!
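To make the mismatch concrete with a toy check (plain Python, not the repository code): a logit corresponds to probability 0.5 at zero, so a 0.5 cut-off on raw logits silently moves the decision boundary:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A logit of 0.2 already means probability > 0.5, i.e. class 1 ...
assert sigmoid(0.2) > 0.5
# ... but thresholding the raw logit at 0.5, as the accuracy code does,
# would count it as class 0:
assert not (0.2 > 0.5)

# Thresholding probabilities at 0.5 is the same as thresholding logits at 0:
for z in (-3.0, -0.1, 0.2, 2.5):
    assert (sigmoid(z) > 0.5) == (z > 0.0)
```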
I've asked the author about it; I hope he gets back to me.
Thanks! Additionally, I've been doing some experiments, and it seems that learning FCM structure from clouds of points is highly nontrivial. Depending on the FCM, it occurred very often that the training accuracy (on 10k-20k synthetic datasets drawn from some cause, mechanism, and noise distribution) could be as high as 0.97, while the validation accuracy (on held-out datasets from the same distribution, as many as 4000 of them) would stay at random chance!
I would be interested in the author's opinion on this, if you ever get the chance.
Hey, Is there any update on this issue?
Hi, I haven't gotten any response from the original author. I will dig into the algorithm next month to try to close this issue. Feel free to make a pull request if you find a solution.
Hi, This could obviously be due to a misunderstanding on my part, but I thought the NCC framework should be built from 2 MLPs, and from my understanding MLPs typically do not use convolutions. However, the NCC code I found in CDT is the following
In addition, the original paper recommends enforcing 1 - NCC(x1, x2) = NCC(x2, x1) (where [x1, x2] is an n-sample of pairs). They do this by having a composite output
0.5 * (1 - NCC(x2, x1) + NCC(x1, x2))
I am not sure which lines are responsible for this, if they exist? Any help would be greatly appreciated. Regards, Arno V.
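If it helps, that constraint can be enforced generically by wrapping any pair scorer; a toy sketch (the lambda is a made-up stand-in for the trained NCC output, not CDT code):

```python
def symmetrize(f):
    """Wrap a pair scorer so that g(a, b) == 1 - g(b, a) holds by construction,
    mirroring the paper's composite output 0.5 * (1 - f(b, a) + f(a, b))."""
    def g(a, b):
        return 0.5 * (1.0 - f(b, a) + f(a, b))
    return g

# Made-up asymmetric scorer standing in for the raw NCC output.
raw = lambda a, b: 0.8 * a + 0.1 * b
ncc = symmetrize(raw)

# The anti-symmetry the paper asks for now holds for any inputs:
assert abs(ncc(1.0, 2.0) + ncc(2.0, 1.0) - 1.0) < 1e-12
```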