Begging for the training data

IntuitionEngineeringTeam / chars2vec

Character-based word embeddings model based on RNN for handling real world texts

Apache License 2.0

171 stars, 37 forks

Begging for the training data #4

Open LuMelon opened 5 years ago

LuMelon commented 5 years ago

Hi authors, I think you have done very interesting work, but training such a model requires a large set of similar-word pairs. How did you construct it? Or could you share your training data with us?

SupervisionT commented 5 years ago

Hi @LuMelon, if you are working with English, you can find many good resources to use as a training dataset. You may check:

And many other datasets. If you are targeting a closed-domain application or a different language, I would recommend using SynNets, where you can build your own dataset of similar words for more than 100 languages.

skt7 commented 4 years ago

If you could elaborate on exactly what problem you are solving, someone may be able to help you find the specific dataset you are looking for.

phongtheha commented 4 years ago

I have the same question. What corpus did you use for training the language model? How did you construct the pairs? Did you manually construct the pairs? Or did you use a context window similar to Word2Vec using Keras' skipgrams function?

mustfkeskin commented 4 years ago

Do you have a data generator that builds training pairs from a given corpus?
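Since the thread has no answer from the authors, here is only a minimal sketch of one way such a generator could look, not the authors' actual method. It assumes that "similar" pairs can be approximated by two randomly misspelled copies of the same word and "not similar" pairs by misspellings of two different words. The names `add_typo` and `make_pairs` are hypothetical, and the 0 = similar / 1 = not similar labelling is an assumption that should be checked against the chars2vec README before training.

```python
import random


def add_typo(word, rng, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Return `word` with one random character deleted, replaced, or inserted."""
    pos = rng.randrange(len(word))
    op = rng.choice(["delete", "replace", "insert"])
    if op == "delete" and len(word) > 1:
        return word[:pos] + word[pos + 1:]
    if op == "replace":
        # Exclude the current character so the word really changes.
        candidates = [c for c in alphabet if c != word[pos]]
        return word[:pos] + rng.choice(candidates) + word[pos + 1:]
    # Insert (also the fallback when a one-character word cannot be shortened).
    return word[:pos] + rng.choice(alphabet) + word[pos:]


def make_pairs(words, n_per_word=2, seed=0):
    """Build (pairs, targets) from a word list.

    Assumed labelling: 0 = similar pair, 1 = not similar pair.
    """
    rng = random.Random(seed)
    words = [w for w in words if w]  # drop empty strings
    if len(set(words)) < 2:
        raise ValueError("need at least two distinct words")

    pairs, targets = [], []
    for word in words:
        for _ in range(n_per_word):
            # Similar: two independent misspellings of the same word.
            pairs.append((add_typo(word, rng), add_typo(word, rng)))
            targets.append(0)
            # Not similar: misspellings of two different words.
            other = rng.choice([w for w in words if w != word])
            pairs.append((add_typo(word, rng), add_typo(other, rng)))
            targets.append(1)
    return pairs, targets
```

Calling `make_pairs(vocabulary)` on a word list extracted from your own corpus yields two parallel lists of word pairs and targets. Whether single-edit typos are a good enough notion of similarity depends on the application.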

rjurney commented 4 years ago

In for the training corpus. Why not share it?

jjustinm4 commented 3 years ago

Thank you for the model, creators. I apologize if my question is misguided; I'm new to this domain. The models I have seen use a "fit" method during training and a "predict" method when they are called, but this model doesn't. Why is that? If I need to match against a particular word, how do I pass it in for prediction? For example, I have the phrase "drawing number" and want to measure its similarity to "drawing reference". How do I do that? Thank you so much.
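As a hedged sketch of the comparison asked about above: an embedding model returns one vector per input string, and two strings are usually compared by the cosine similarity of their vectors. The helpers `cosine_similarity` and `phrase_similarity` below are hypothetical names written for illustration. The commented-out chars2vec calls (`load_model('eng_50')`, `vectorize_words`) follow the usage shown in the project README, but how the pretrained model treats the space inside a two-word phrase is not covered in this thread and should be checked.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def phrase_similarity(vectorize, phrase_a, phrase_b):
    """Embed both strings with `vectorize` (a callable: list of str -> list of vectors)
    and return the cosine similarity of the two embeddings."""
    vec_a, vec_b = vectorize([phrase_a, phrase_b])
    return cosine_similarity(list(vec_a), list(vec_b))


# Assumed usage with chars2vec (requires the package and its pretrained model):
#
#   import chars2vec
#   c2v_model = chars2vec.load_model('eng_50')
#   score = phrase_similarity(c2v_model.vectorize_words,
#                             'drawing number', 'drawing reference')
```

A score close to 1 means the two embeddings point in nearly the same direction. Any threshold for calling two strings a match would have to be tuned on your own data.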