Do you know why the official fastText Python wrapper has no API that directly accepts a sentence iterator as a corpus feeder?

giacbrd / ShallowLearn

An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.

GNU Lesser General Public License v3.0

198 stars, 30 forks

Do you know why the official fastText Python wrapper has no API that directly accepts a sentence iterator as a corpus feeder? #27

Open HelloMarcZ opened 5 years ago

HelloMarcZ commented 5 years ago

Your implementation, GensimFastText, can take a sentence iterator to feed and train a fastText model, which is great. We may have a very large text corpus, so it is inconvenient to first convert it to a file and then train the model. Do you know why the official fastText Python wrapper has no API that directly accepts a sentence iterator as a corpus feeder?

giacbrd commented 5 years ago

Hi, yes, my code is based on the Gensim codebase, so the fit method uses the same approach to data ingestion. The original fastText has its own interface, which I suppose is limited by the original C++ codebase. I actually wrote this project in order to have better-designed methods for this model. However, I have stopped improving the code, and it is meant only for text classification, not for word embeddings. Moreover, the integration with scikit-learn may have some problems with multi-label tasks.
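
For readers who want to stay with the official wrapper, one possible workaround is to stream the iterator into a temporary file and pass that path to the trainer. The sketch below is only an illustration under stated assumptions: `stream_to_fasttext_file` and `labelled_sentences` are hypothetical helpers that appear in neither project, and the training call is left commented out so the snippet runs without the `fasttext` package installed. It does not avoid the disk round trip, but it keeps memory use flat because samples are written one at a time.

```python
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple


def stream_to_fasttext_file(
    samples: Iterable[Tuple[List[str], str]],
    label_prefix: str = "__label__",
) -> str:
    """Write (tokens, label) pairs to a temp file, one per line, and return its path.

    Hypothetical helper: the samples are consumed lazily, so the whole
    corpus never has to be held in memory.
    """
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for tokens, label in samples:
            handle.write(f"{label_prefix}{label} {' '.join(tokens)}\n")
    return path


def labelled_sentences() -> Iterator[Tuple[List[str], str]]:
    """Hypothetical corpus feeder; in practice this would read from a database or stream."""
    yield ["i", "am", "tall"], "yes"
    yield ["you", "are", "weak"], "no"


path = stream_to_fasttext_file(labelled_sentences())
try:
    # With the official package installed, training would look like this
    # (assumption: default hyperparameters are acceptable for the task):
    #   import fasttext
    #   model = fasttext.train_supervised(input=path)
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
finally:
    # Remove the temporary file once training (or inspection) is done.
    os.remove(path)

print(lines)
```

The `__label__` prefix matches the default label marker that fastText expects in supervised training files; if a different prefix is configured on the trainer, the same value should be passed as `label_prefix`.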