Closed ehthiede closed 6 years ago
Incidentally, if someone is wondering what I'm talking about re: kwargs, check out the new visualization code (I just learned how to do this today myself).
That's a great idea! We should do it this way.
And I think yes, we should keep 'metric' and 'metric_params' separate from this.
So that raises one question: If no kwargs list is passed by the user, should we use sklearn's defaults then or should we override them with our own defaults? So far, we've used a different default for k (the default for k in sklearn is k=5, ours is k=64.). The argument in favour of the sklearn default is that it's simpler and we can shift the blame to them, or the user, for wrong parameter choices. The argument against it is that k=5 will very rarely be a good choice for diffusion map applications.
I created a new branch https://github.com/DiffusionMapsAcademics/pyDiffMap/tree/dev_kwargs
for this. However, there is a problem. When the Neigh() function in kernel.fit(...) is called, I get this error:
/Users/ralfbanisch/Documents/git/pyDiffMap/src/pydiffmap/kernel.py in fit(self, X) 59 kwargs = self.neighbor_params 60 self.neigh = NearestNeighbors(metric=self.metric, ---> 61 metric_params=self.metric_params, 62 **kwargs) 63 self.neigh.fit(X)
NameError: global name 'neighbor_params' is not defined
The notebook swiss_roll reproduces this error, if people want to check it out.
I'm not so sure if I like it anyway, as it makes the interface clunkier to handle...
Hmm... Maybe the number of neighbors should be something that is set explicitly, like metric? That would make it less clunkier. I feel like the number of nearest neighbors is something that can require a little fiddling, and because kwargs is a little clunky, maybe we should just use it for obscure things, like nearest-neighbor selection method, ball-tree radius, etc.
In fact, maybe the general rule should be that we only use it when the parameters are very obscure, or when we have a ton of pass-through options and know way of knowing what the user will want to change ahead of time.
-Erik
Also, the code in Swiss roll works for me. Weird.
-Erik
True, that seems to be a problem with my python installation on my macbook, which is fucked up now because I did pip install --uprade /pathtopydiffmap.
It does create problems further down the line though, latest when we call
K = self.neigh.kneighbors_graph(Y, **self.neighbor_params, mode='distance')
which does seem to ask for a k parameter specifically.
Still not getting those bugs. Weird.
In any case, let's make k an explicit parameter. It sounds like that's best all around.
Are you sure? The Swiss roll notebook runs also if you uncomment the fit_transform(...) line?
Didn't try that... Will try it once I'm in the office.
Sent from my ASUS
-------- Original Message -------- From:Ralf Banisch notifications@github.com Sent:Tue, 28 Nov 2017 09:28:22 -0600 To:DiffusionMapsAcademics/pyDiffMap pyDiffMap@noreply.github.com Cc:Erik Henning Thiede ehthiede@gmail.com,Author author@noreply.github.com Subject:Re: [DiffusionMapsAcademics/pyDiffMap] Use kwargs-style method for passing parameters, e.g. for Nearest Neighbors in Kernel Class (#9)
Are you sure? The Swiss roll notebook runs also if you uncomment the fit_transform(...) line?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.
{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/DiffusionMapsAcademics/pyDiffMap","title":"DiffusionMapsAcademics/pyDiffMap","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/DiffusionMapsAcademics/pyDiffMap"}},"updates":{"snippets":[{"icon":"PERSON","message":"@ralfbanisch in #9: Are you sure? The Swiss roll notebook runs also if you uncomment the fit_transform(...) line?"}],"action":{"name":"View Issue","url":"https://github.com/DiffusionMapsAcademics/pyDiffMap/issues/9#issuecomment-347559419"}}}
I get a bug that says:
AttributeError: 'Kernel' object has no attribute 'k0'
But I think that's a coding problem.
My suspicion is that it's going to work if we remove the 'n_neighbors = self.k0' argument from
K = self.neigh.kneighbors_graph(Y, n_neighbors=self.k0, mode='distance')
since the default for n_neighbors has already been set for the neigh object during initialisation. That should work (and it should probably be done in any case also in the main code). The question remains if we want it that way - it is a bit clunkier but also more flexible. I suppose it depends how often the user will want to change the k parameter. I ended up changing it all the time since the 'optimal' k really does seem to scale with the number of data points and depend on the specific data set.
Incidentally, that annoyed me enough that I put a radius search option in a separate development branch.
I think we should have a separate k argument: I think optimal k depends strongly on the dimensionality, particularly for fixed bandwidth methods (variable bandwidth methods avoid this problem somewhat.)
-Erik
ok, so separate k argument and the rest gets passed on as a keyword list? I think that sounds good, let me change that and push it.
making that tweak I mentioned above indeed fixes the problem btw.
Ya... let's say separate k, metric, and metric_params.
pushed, this should work now.
The build #106 fails on travis because all the kernel tests fail. They get unexpected keywords that they don't like.
Specifically, it's the 'n_algorithm' keyword, which is now supposed to be a kwargs list. Hm, not sure how to fix that in interaction with
@pytest.mark.parametrize('n_algorithm', ['auto', 'ball_tree'])
Maybe we could replace it with some parameterization over the neighbor_params?
e.g.
@pytest.mark.parameterize('neighbor_params',[ {'algorithm': 'auto'}, {'algorithm': 'ball_tree'} ]
?
Or we could delete it all together, and just create a separate method for testing the kwargs.
Let me try it.
not happy:
tests/test_kernel.py:13: in
the issue is now completely fixed. See commit 1652cc80fddb101adbc9ecb9d42af62e88556289
I'm going to close this since it's fixed now.
Rather then naming a few of the technical parameters explicitly, e.g. n_neighbors, we should instead pass a dictionary of arguments that gets handed to the NearestNeighbors object. This will give much greater flexibility in customizing the neighborlist, while still maintaining a reasonable level of abstraction.
A point of discussion: should we fold 'metric' and 'metric_params' into this? Theoretically one could, but I feel like this is hiding too much for the user, and we should have 'metric' as an explicit parameter for transparency.