How to train Word2Vec for a new dataset?

VulDetProject / ReVeal

MIT License


Open KagamiBaka opened 3 years ago

KagamiBaka commented 3 years ago

If I want to train a new Word2Vec model for a new dataset, which files should I use, and in what order?

for-just-we commented 3 years ago

I have the same question, but I presume that you should tokenize each piece of code into a token sequence and then use those sequences to train the Word2Vec model.
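For anyone landing here, the tokenization step could look roughly like the sketch below. This is not taken from the ReVeal code; `tokenize_code` and `TOKEN_PATTERN` are hypothetical names, and the regex is a simplified stand-in for a real C/C++ lexer (it ignores comments and string literals, for example).

```python
import re
from typing import List

# Hypothetical, simplified lexer pattern (an assumption, not ReVeal's tokenizer).
TOKEN_PATTERN = re.compile(
    r"[A-Za-z_]\w*"                        # identifiers and keywords
    r"|\d+(?:\.\d+)?"                      # integer or decimal literals
    r"|->|\+\+|--|<<|>>|[<>=!]=|&&|\|\|"   # common two-character operators
    r"|\S"                                 # any other single non-whitespace character
)


def tokenize_code(code: str) -> List[str]:
    """Split a code string into a flat list of tokens, dropping whitespace."""
    return TOKEN_PATTERN.findall(code)
```

Each function (or each line) would go through `tokenize_code`, and the resulting lists of tokens are what Word2Vec expects as "sentences".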

NikolasBielski commented 2 years ago

Each line becomes a 'doc' in a 'corpus'. If you use an AST from Joern, you can benefit from its parsing.
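A minimal sketch of the "each line is a doc" idea, assuming gensim 4.x is installed (in gensim 3.x the `vector_size` argument is called `size`). The helper names `lines_to_corpus` and `train_word2vec` are made up for illustration, the hyperparameters are placeholders, and the input lines are assumed to be already space-separated tokens; this is not the repository's own training script.

```python
from typing import Iterable, List


def lines_to_corpus(lines: Iterable[str]) -> List[List[str]]:
    """Turn each non-empty line into one 'doc' (a list of tokens)."""
    corpus = []
    for line in lines:
        tokens = line.split()  # assumes tokens are already space-separated
        if tokens:             # skip blank lines
            corpus.append(tokens)
    return corpus


def train_word2vec(corpus: List[List[str]], vector_size: int = 100):
    """Train a Word2Vec model on the corpus (hyperparameters are placeholders)."""
    # Imported here so the corpus helper works without gensim installed.
    from gensim.models import Word2Vec

    return Word2Vec(
        sentences=corpus,
        vector_size=vector_size,  # embedding dimension
        window=5,                 # context window size
        min_count=1,              # keep rare tokens; code vocabularies are sparse
        workers=1,
        seed=0,
    )
```

Usage would be something like `model = train_word2vec(lines_to_corpus(open("tokens.txt")))` followed by `model.save("w2v.model")`, where both file names are hypothetical. Individual token vectors are then available as `model.wv["some_token"]`.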