Hi, I am new to word2vec. I am preparing a sentence-level corpus from a
Wikipedia dump. However, the dump is pre-split into paragraphs, which would
need to be processed further to get sentences.
My question is:
Is it possible to train directly on paragraphs instead of sentences? Or does
word2vec (the SkipGram model) have to work with sentences?
Since the algorithm trains on a context window, I don't see much difference
in letting the window also span sentence boundaries within the same paragraph.
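
To make the question concrete, here is a minimal sketch of what I mean. It is not word2vec's actual code: `skipgram_pairs` is a hypothetical helper that uses a fixed symmetric window, whereas the real tool randomly shrinks the window and subsamples frequent words. It only shows which extra (center, context) pairs appear when a paragraph is treated as one sequence.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for a fixed symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Toy paragraph made of two sentences (made-up data for illustration).
sentences = [["the", "cat", "sat"], ["dogs", "bark", "loudly"]]
paragraph = [tok for sent in sentences for tok in sent]

# Pairs when each sentence is trained on separately.
per_sentence = {p for sent in sentences for p in skipgram_pairs(sent)}
# Pairs when the whole paragraph is one training sequence.
per_paragraph = set(skipgram_pairs(paragraph))

# Pairs that exist only because the window crossed the sentence boundary.
cross_boundary = per_paragraph - per_sentence
print(sorted(cross_boundary))
```

With this toy input, the only additional pairs are the ones near the boundary (for example "sat" with "dogs"); every within-sentence pair is unchanged.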
Original issue reported on code.google.com by yel...@gmail.com on 24 Feb 2015 at 9:35