Open HelloMarcZ opened 5 years ago
Hi, yes, my code is based on the Gensim codebase, so the fit method uses the same approach to data ingestion. The original FastText has its own interface, which I suppose is limited by the original C++ codebase. I actually wrote this project in order to have better-designed methods for this model. However, I have stopped improving the code, and it is meant only for text classification, not for word embeddings. Moreover, the scikit-learn integration may have some problems with multi-label tasks.
Your GensimFastText implementation can use a sentence iterator to feed and train the FastText model, which is great. Because we may have a very large text corpus, it is inconvenient to first convert it to a file and then train the model. Do you know why the official FastText Python wrapper does not have an API that directly accepts a sentence iterator as the corpus feeder?
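
For readers unfamiliar with the "sentence iterator" idea discussed here, below is a minimal sketch in plain Python. It is not taken from this project or from the official wrapper: the class name `SentenceStream` and the helper `write_demo_corpus` are hypothetical, and whitespace tokenization is an assumption. The one point it illustrates is that Gensim-style training typically makes more than one pass over the corpus, so the object should be a restartable iterable (one that reopens its source in `__iter__`) rather than a one-shot generator.

```python
import os
import tempfile


class SentenceStream:
    """Hypothetical restartable iterable: one tokenized sentence per line."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Reopen the file on every pass so the corpus can be iterated again
        # (for example, once to build a vocabulary and once per epoch).
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()  # assumption: whitespace tokenization
                if tokens:  # skip blank lines
                    yield tokens


def write_demo_corpus():
    """Write a tiny illustrative corpus to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write("the quick brown fox\n\njumps over the lazy dog\n")
    return path


if __name__ == "__main__":
    corpus_path = write_demo_corpus()
    stream = SentenceStream(corpus_path)
    # Two passes over the same object both see the full corpus.
    for epoch in range(2):
        print(epoch, sum(1 for _ in stream))
    os.remove(corpus_path)
```

Only one line is held in memory at a time, so the same pattern applies to a corpus that is too large to load whole. Whether a given trainer accepts such an object depends on its own interface.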