giacbrd / ShallowLearn

An experiment in re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText), with some additional exclusive features and a nice API. Written in Python and fully compatible with Scikit-learn.

GNU Lesser General Public License v3.0

reproduce the official tutorial #20

Open giacbrd opened 7 years ago

giacbrd commented 7 years ago

A fastText tutorial has been published: https://github.com/facebookresearch/fastText/blob/master/tutorials/supervised-learning.md

Do a Jupyter Notebook!

prakhar2b commented 7 years ago

@giacbrd I would love to do this. We need to reproduce the results for ShallowLearn, right? (for all the features that it supports as of now)

giacbrd commented 7 years ago

Hi, I was already writing a tutorial notebook for ShallowLearn (using Wikipedia data), something to post around the web. Now that there is an "official" tutorial, I would like to change mine to use their data and compare the results. So, thank you, but I plan to do this myself in the next few days!

giacbrd commented 7 years ago

@prakhar2b I am still working on this, and I found some inconsistencies between the Cython and Python versions of the algorithm. The Cython version sometimes produces strange outputs, so I am working on fixing this. In short, the Cython code is unstable!

prakhar2b commented 7 years ago

@giacbrd Thanks for the update. Is there any inconsistency in the Cython file in Labeled w2v (PR#1153 in gensim) as well?

Working on labeled w2v is in the pipeline for July as part of my Google Summer of Code project with Gensim. If the Cython code is unstable, I will have to write the Cython code from scratch for my GSoC project. I would appreciate it if you could guide me there too. Thanks :smile:

giacbrd commented 7 years ago

Yes, there is a bug with the "softmax" loss function, and the code in the PR is the same. I am going to fix this ASAP.

By "unstable" I meant "there's a bug"! The Cython code is mostly iterations over arrays, so it should be easy to refactor, but there may not be much to gain in terms of speed.

Thanks for the acknowledgment!

mdeland commented 7 years ago

Just curious - has this bug been fixed? What impact would it have?

giacbrd commented 7 years ago

Hi, the method that uses softmax outputs was not working properly. Actually, the problem was due to the parameter configuration. I changed a few things and the develop branch is updated (see https://github.com/giacbrd/ShallowLearn/compare/develop#diff-8e5218dd47d140f1c094b54c6f9d1290). I have not released the fix yet.
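
For anyone following along: the kind of Cython/Python inconsistency discussed in this thread can usually be caught by running both implementations on the same inputs and comparing the outputs within a tolerance. The snippet below is only a minimal sketch of that idea in pure Python. It is not ShallowLearn or gensim code, and `softmax_reference`, `softmax_fast` and `outputs_agree` are hypothetical names made up for illustration (`softmax_fast` merely stands in for an optimized version).

```python
import math


def softmax_reference(scores):
    """Straightforward softmax, shifted by the max score for numerical stability."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_fast(scores):
    """Hypothetical stand-in for an optimized version: explicit in-place loops."""
    out = list(scores)
    shift = max(out)
    total = 0.0
    for i in range(len(out)):
        out[i] = math.exp(out[i] - shift)
        total += out[i]
    for i in range(len(out)):
        out[i] /= total
    return out


def outputs_agree(f, g, inputs, tol=1e-9):
    """Return True if f and g give element-wise close results on every input."""
    for scores in inputs:
        a, b = f(scores), g(scores)
        if len(a) != len(b):
            return False
        for x, y in zip(a, b):
            if not math.isclose(x, y, rel_tol=tol, abs_tol=tol):
                return False
    return True


# A few score vectors, including large values that would overflow a naive exp().
test_inputs = [[0.0, 1.0, 2.0], [-5.0, 5.0], [1000.0, 1000.0, 999.0]]
print(outputs_agree(softmax_reference, softmax_fast, test_inputs))
```

In a real project the second function would be the compiled routine, and a check like this would typically live in the test suite so that a divergence shows up before a release.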