Adding a small learning module to the GSP

nperraud commented 6 years ago

Two functions for semi-supervised learning (regression + classification)
One new graph (A sensor graph that scale linearly with the number of nodes)

coveralls commented 6 years ago

Coverage increased (+0.2%) to 86.599% when pulling 8c28f7f4792ac8b2c84bf624985b493f2c2aa51f on semi-supervised-learning into 199960d6ad2c5dab7310dfc06a1e2e1a74586688 on master.

nperraud commented 6 years ago

@mdeff, could you check that I did everything correctly and accept the pull request? Thanks

mdeff commented 6 years ago

Thanks for your contribution @nperraud.

Two remarks:

Style of docstring. They have a fixed structure: e.g. Parameters, Examples. You did it correctly for the SensorLarge but not for the functions in learning.py.
We don't want to impose the use of FLANN without explicit user choice. For one, they might not have FLANN installed (it's not possible to install it with pip on Windows for example). It's better for users to not have "magic" changes of behavior, like changing from exact to approximate FLANN at arbitrary N=2000. If they find it slow, they should enable FLANN, knowing the downsides.

Thanks for making unit tests! :)

lrq3000 commented 6 years ago

Thank you for this PR @nperraud , buit would it be possible to interface pyGSP with broader machine learning modules such as scikit-learn or theano? I mean, would it require a lot of changes in pyGSP to do that, what do you guys think?

mdeff commented 6 years ago

What do you want to do? Most of the data in the pygsp is numpy arrays, which you can pass around to other libraries. For one, in my research I use TensorFlow/Theano/pyTorch to learn graph filters, which I then plug back in the pygsp for visualization and analysis.

mdeff commented 6 years ago

@lrq3000 I just discovered you were an author of tqdm. Thanks for this, I use it all the time! :)

nperraud commented 6 years ago

Interfacing with scikit-learn would probably be possible. One way for example is that the pygsp function could return a scikit-learn object.

I am more concerned about the algorithm itself since you cannot really implement:

object.fit(Xtrain)
object.predict(Xtest)

You have to do something like:

object.predict(Xtrain, Xtest)

Alternatively, the fit function could only store the training dataset and all computation would be done in the predict function.

Using a kernel (and hence a different algorithm), would solve this problem but would be way slower in general and would require significantly more coding. What do you think?

nperraud commented 6 years ago

@mdeff, so what is the way to go here?

mdeff commented 6 years ago

I was merging SensorLarge and Sensor, which really are the same graph model. Then we should be able to merge.

nperraud commented 5 years ago

I think it is time to finish this pull request. To make it simple, I will not make the .fit, .predict interface. We can always do that later on.

I will try to fix the conflicts an tests soon.

nperraud commented 5 years ago

All tests are probably passing but I still need to add the large sensor graph.

nperraud commented 5 years ago

@mdeff, Everything should be ready to merge. I did not merge sensor and sensorlarge because they still have different functionalities. Instead I renamed sensor as NNSensor. If you tell me, I will just remove the sensor and rename NNSensor as sensor, but I wait for your approval. I happy because I increased the coverage...

nperraud commented 5 years ago

@mdeff, could you tell me what is the problem with the build?

mdeff commented 5 years ago

@nperraud I had this test failure as well. I think it is due to too many open matplotlib figures. No idea why that is a problem with python 3.6 only. I fixed it (merely a hack) in fe31116, but it reappeared here. Are you creating new figures?

nperraud commented 5 years ago

@mdeff Yes, I am creating new figures in the doc of the learning module. Should we merge?

mdeff commented 5 years ago

Now that I've found the build issue I'll fix it in master before merging, ;)

mdeff commented 5 years ago

Is there no way to merge the two sensor graphs?

nperraud commented 5 years ago

Honestly, if it was only up to me, I would just drop the old sensor graph and keep only my new implementation. But that would mean that the sensor function would behave differently with the new version. I do not believe that the options available in the sensor graph are that important.

mdeff commented 5 years ago

The build issue was fixed in baf342a6aa31b54b544db87e30f518303a53f5b3 (#38).

mdeff commented 5 years ago

I much prefer your implementation and agree to drop Sensor and rename NNSensor to Sensor. The missing options regular can be implemented in NNGraph, and connected in Sensor, if somebody needs them later.

My only worry is that the weight distribution is quite different in the two implementations. Is that due to the scaling of the Gaussian, or something else? 2018-12-30-00 33 19

Any reason for k=6 by default? NNGraph sets k=10.

nperraud commented 5 years ago

Indeed the weight distribution is quite different. This is why I did not want to erase the old graph without proper approval. It is due to the number of neighbors, the fact that the new one is not a radius graph anymore and to the weighting of the Gaussian. I cannot say if one is better than the other. It will depend on the application.

k=6 was selected because I thought that a grid had 4 neighbors, so the number should be around that. Then I plotted a few graphs and 6 was kind of more beautiful IMHO. :-) 10 is a bit too connected, I think... But again, it depends on the application.

mdeff commented 5 years ago

Finally merged! Sorry for the delay... Many thanks for your work @nperraud. Especially for your efforts in unit tests and documentation. :)

I've done the following before merging:

Remove the old Sensor in favor of your new implementation.
Cleaned the learning module and its tests.

I think that learning.py and optimization.py should be merged. Maybe optimization.py is such in a poor state that it can be removed entirely...

epfl-lts2 / pygsp

Adding a small learning module to the GSP #19