Open bstewart opened 5 years ago
Hello, I'm getting an error when using textProcessor with the metadata option. I think my problem is related to what you describe, so I'm posting it here instead of opening a new issue.
The error:
Error in `[.data.table`(metadata, , i) :
j (the 2nd argument inside [...]) is a single symbol but column name 'i' is not found. Perhaps you intended DT[, ..i]. This difference to data.frame is deliberate and explained in FAQ 1.1.
My metadata data frame has 11000 rows and 10 columns, one of them with abstracts, around 500 characters each.
I confirm that creating an index works:
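library(stm)    # textProcessor()
library(dplyr)  # left_join()

# Pass only a row index as metadata, then join the full metadata back afterwards.
my_metadata$index <- row.names(my_metadata)
index <- data.frame(index = my_metadata$index, stringsAsFactors = FALSE)
texts <- textProcessor(
  my_metadata$clean_text,
  metadata = index,
  stem = FALSE,
  ucp = TRUE,
  striphtml = TRUE
)
texts$meta <- left_join(texts$meta, my_metadata, by = "index")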
Regards
@gortegasolis Maybe this is too late, but it turns out the error is thrown when your metadata is a data.table. I converted mine to a data.frame with as.data.frame() and now it works fine.
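For anyone who wants to reproduce this outside of textProcessor, here is a minimal sketch of the data.table behaviour. The table `meta_dt` and its columns are made up for illustration, and it assumes the data.table package is installed. The idea is that a data.table treats a bare symbol in the column slot as a column name, while a data.frame treats it as a variable holding a position.

```r
library(data.table)

# Hypothetical metadata, standing in for the real table from the report.
meta_dt <- data.table(id = 1:3, abstract = c("first", "second", "third"))

i <- 2  # column position held in a variable, as in the failing call

# On a data.table, `i` in the column slot is looked up as a column name,
# so this raises the "single symbol but column name 'i' is not found" error.
dt_result <- tryCatch(meta_dt[, i], error = function(e) e)

# Converting to a plain data.frame restores positional column indexing.
meta_df <- as.data.frame(meta_dt)
df_result <- meta_df[, i]
```

After the conversion, `meta_df[, i]` returns the second column as an ordinary vector, which is presumably what the metadata-copying code expects.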
There is a strange error that pops up in textProcessor when copying very long metadata fields. I'm not entirely sure why it happens or how to stop it. Looking into it, it might be possible to simplify how metadata is handled by maintaining only a document index in the metadata.
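A rough sketch of what the index-only idea could look like, in base R and without calling textProcessor at all. The data, the column names, and the rule for which documents get dropped are all assumptions for illustration, not how the package currently works.

```r
# Hypothetical metadata with one very long text field and one empty document.
my_metadata <- data.frame(
  index = 1:4,
  abstract = c("alpha", strrep("long text ", 1000), "", "delta"),
  stringsAsFactors = FALSE
)

# Assumption: documents that end up empty are dropped during processing.
# Document 3 is dropped here by a simple length check to imitate that.
kept <- my_metadata$index[nchar(my_metadata$abstract) > 0]

# Only the index travels through processing, so long fields are never copied.
meta_index <- data.frame(index = kept)

# The full metadata is re-attached afterwards by index (base R merge
# is used here; a dplyr left_join would do the same job).
meta_joined <- merge(meta_index, my_metadata, by = "index", all.x = TRUE)
```

With this layout the long `abstract` field is only touched once, at the final join, and dropped documents simply have no row in `meta_joined`.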
The case we need to test against is one where the entire document is dropped before creating the document term matrix,