DiffusionMapsAcademics / pyDiffMap

Library for diffusion maps
MIT License
46 stars 14 forks source link

Use kwargs-style method for passing parameters, e.g. for Nearest Neighbors in Kernel Class #9

Closed ehthiede closed 6 years ago

ehthiede commented 6 years ago

Rather then naming a few of the technical parameters explicitly, e.g. n_neighbors, we should instead pass a dictionary of arguments that gets handed to the NearestNeighbors object. This will give much greater flexibility in customizing the neighborlist, while still maintaining a reasonable level of abstraction.

A point of discussion: should we fold 'metric' and 'metric_params' into this? Theoretically one could, but I feel like this is hiding too much for the user, and we should have 'metric' as an explicit parameter for transparency.

ehthiede commented 6 years ago

Incidentally, if someone is wondering what I'm talking about re: kwargs, check out the new visualization code (I just learned how to do this today myself).

ralfbanisch commented 6 years ago

That's a great idea! We should do it this way.

ralfbanisch commented 6 years ago

And I think yes, we should keep 'metric' and 'metric_params' separate from this.

ralfbanisch commented 6 years ago

So that raises one question: If no kwargs list is passed by the user, should we use sklearn's defaults then or should we override them with our own defaults? So far, we've used a different default for k (the default for k in sklearn is k=5, ours is k=64.). The argument in favour of the sklearn default is that it's simpler and we can shift the blame to them, or the user, for wrong parameter choices. The argument against it is that k=5 will very rarely be a good choice for diffusion map applications.

ralfbanisch commented 6 years ago

I created a new branch https://github.com/DiffusionMapsAcademics/pyDiffMap/tree/dev_kwargs

for this. However, there is a problem. When the Neigh() function in kernel.fit(...) is called, I get this error:

/Users/ralfbanisch/Documents/git/pyDiffMap/src/pydiffmap/kernel.py in fit(self, X) 59 kwargs = self.neighbor_params 60 self.neigh = NearestNeighbors(metric=self.metric, ---> 61 metric_params=self.metric_params, 62 **kwargs) 63 self.neigh.fit(X)

NameError: global name 'neighbor_params' is not defined

ralfbanisch commented 6 years ago

The notebook swiss_roll reproduces this error, if people want to check it out.

ralfbanisch commented 6 years ago

I'm not so sure if I like it anyway, as it makes the interface clunkier to handle...

ehthiede commented 6 years ago

Hmm... Maybe the number of neighbors should be something that is set explicitly, like metric? That would make it less clunkier. I feel like the number of nearest neighbors is something that can require a little fiddling, and because kwargs is a little clunky, maybe we should just use it for obscure things, like nearest-neighbor selection method, ball-tree radius, etc.

In fact, maybe the general rule should be that we only use it when the parameters are very obscure, or when we have a ton of pass-through options and know way of knowing what the user will want to change ahead of time.

-Erik

ehthiede commented 6 years ago

Also, the code in Swiss roll works for me. Weird.

-Erik

ralfbanisch commented 6 years ago

True, that seems to be a problem with my python installation on my macbook, which is fucked up now because I did pip install --uprade /pathtopydiffmap.

It does create problems further down the line though, latest when we call

K = self.neigh.kneighbors_graph(Y, **self.neighbor_params, mode='distance')

which does seem to ask for a k parameter specifically.

ehthiede commented 6 years ago

Still not getting those bugs. Weird.

In any case, let's make k an explicit parameter. It sounds like that's best all around.

ralfbanisch commented 6 years ago

Are you sure? The Swiss roll notebook runs also if you uncomment the fit_transform(...) line?

ehthiede commented 6 years ago

Didn't try that... Will try it once I'm in the office.

Sent from my ASUS

-------- Original Message -------- From:Ralf Banisch notifications@github.com Sent:Tue, 28 Nov 2017 09:28:22 -0600 To:DiffusionMapsAcademics/pyDiffMap pyDiffMap@noreply.github.com Cc:Erik Henning Thiede ehthiede@gmail.com,Author author@noreply.github.com Subject:Re: [DiffusionMapsAcademics/pyDiffMap] Use kwargs-style method for passing parameters, e.g. for Nearest Neighbors in Kernel Class (#9)

Are you sure? The Swiss roll notebook runs also if you uncomment the fit_transform(...) line?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/DiffusionMapsAcademics/pyDiffMap","title":"DiffusionMapsAcademics/pyDiffMap","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/DiffusionMapsAcademics/pyDiffMap"}},"updates":{"snippets":[{"icon":"PERSON","message":"@ralfbanisch in #9: Are you sure? The Swiss roll notebook runs also if you uncomment the fit_transform(...) line?"}],"action":{"name":"View Issue","url":"https://github.com/DiffusionMapsAcademics/pyDiffMap/issues/9#issuecomment-347559419"}}}

ehthiede commented 6 years ago

I get a bug that says:

AttributeError: 'Kernel' object has no attribute 'k0'

But I think that's a coding problem.

ralfbanisch commented 6 years ago

My suspicion is that it's going to work if we remove the 'n_neighbors = self.k0' argument from

K = self.neigh.kneighbors_graph(Y, n_neighbors=self.k0, mode='distance')

since the default for n_neighbors has already been set for the neigh object during initialisation. That should work (and it should probably be done in any case also in the main code). The question remains if we want it that way - it is a bit clunkier but also more flexible. I suppose it depends how often the user will want to change the k parameter. I ended up changing it all the time since the 'optimal' k really does seem to scale with the number of data points and depend on the specific data set.

Incidentally, that annoyed me enough that I put a radius search option in a separate development branch.

ehthiede commented 6 years ago

I think we should have a separate k argument: I think optimal k depends strongly on the dimensionality, particularly for fixed bandwidth methods (variable bandwidth methods avoid this problem somewhat.)

-Erik

ralfbanisch commented 6 years ago

ok, so separate k argument and the rest gets passed on as a keyword list? I think that sounds good, let me change that and push it.

making that tweak I mentioned above indeed fixes the problem btw.

ehthiede commented 6 years ago

Ya... let's say separate k, metric, and metric_params.

ralfbanisch commented 6 years ago

pushed, this should work now.

ralfbanisch commented 6 years ago

The build #106 fails on travis because all the kernel tests fail. They get unexpected keywords that they don't like.

Specifically, it's the 'n_algorithm' keyword, which is now supposed to be a kwargs list. Hm, not sure how to fix that in interaction with

@pytest.mark.parametrize('n_algorithm', ['auto', 'ball_tree'])

ehthiede commented 6 years ago

Maybe we could replace it with some parameterization over the neighbor_params?

e.g.

@pytest.mark.parameterize('neighbor_params',[ {'algorithm': 'auto'}, {'algorithm': 'ball_tree'} ]

?

Or we could delete it all together, and just create a separate method for testing the kwargs.

ralfbanisch commented 6 years ago

Let me try it.

ralfbanisch commented 6 years ago

not happy:

tests/test_kernel.py:13: in class TestKernel(object): tests/test_kernel.py:49: in TestKernel @pytest.mark.parameterize('neighbor_params',[{'algorithm': 'auto'},{'algorithm': 'ball_tree'}]) /Applications/anaconda/lib/python3.6/site-packages/_pytest/mark.py:264: in getattr self._check(name) /Applications/anaconda/lib/python3.6/site-packages/_pytest/mark.py:279: in _check raise AttributeError("%r not a registered marker" % (name,)) E AttributeError: 'parameterize' not a registered marker

ralfbanisch commented 6 years ago

the issue is now completely fixed. See commit 1652cc80fddb101adbc9ecb9d42af62e88556289

ralfbanisch commented 6 years ago

I'm going to close this since it's fixed now.