dselivanov / text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
http://text2vec.org

How to pass arguments to tokeniser in itoken #216

Closed alanault closed 6 years ago

alanault commented 6 years ago

Firstly, thanks for the awesome package!

I've built a custom tokenizer, which has lots of options. I typically control these by passing a list object of options (read from a .yml), so it's easy to drive different configurations.

However, I can't work out how to pass options through to the tokeniser as part of the itoken call.

e.g. from examples: it <- text2vec::itoken(x, tokenizer = my_tok_fun)

Assuming opts is a list of options which my_tok_fun uses, I want to pass the opts to the function, something like: it <- text2vec::itoken(x, tokenizer = my_tok_fun(x = x, opts = opts))

Is there a way to do this or similar? Currently, I'm manually defining a wrapper around my_tok_fun with the options set, but this doesn't seem especially elegant.

Many thanks

Alan

dselivanov commented 6 years ago

R is an extremely flexible functional programming language, so you can easily create a closure around your tokenizer:

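# read_opts() is a placeholder for however you load your options (e.g. from the .yml)
opts = read_opts()
# one-argument wrapper; opts is captured from the enclosing environment
my_tok_fun_closure = function(x) {
  my_tok_fun(x = x, opts = opts)
}
it <- text2vec::itoken(x, tokenizer = my_tok_fun_closure)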

or even:

opts = read_opts()
it <- text2vec::itoken(x, tokenizer = function(x) my_tok_fun(x = x, opts = opts))
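
If several configurations are in play, the same closure idea can be wrapped in a small function factory. The following is only a sketch in base R: my_tok_fun here is a hypothetical stand-in for the custom tokenizer from the question, make_tokenizer is an illustrative name, and the option names lowercase and split are assumptions, not part of text2vec.

```r
# Hypothetical tokenizer with options; stands in for my_tok_fun from the question.
my_tok_fun <- function(x, opts) {
  if (isTRUE(opts$lowercase)) x <- tolower(x)
  # returns a list with one character vector of tokens per input document
  strsplit(x, opts$split, fixed = TRUE)
}

# Function factory (illustrative name): returns a one-argument tokenizer
# with opts fixed inside the closure.
make_tokenizer <- function(opts) {
  force(opts)  # evaluate opts now, so later changes to the caller's variable have no effect
  function(x) my_tok_fun(x = x, opts = opts)
}

opts <- list(lowercase = TRUE, split = " ")  # assumed option names
tok <- make_tokenizer(opts)
tokens <- tok(c("Hello World", "Fast Vectorization in R"))
```

The resulting tok takes only x, so it can be passed directly, as in it <- text2vec::itoken(x, tokenizer = tok), and a second configuration just needs another make_tokenizer() call.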

alanault commented 6 years ago

Thanks - I knew there would be an elegant solution!