Let the sentence "Some unannotated example text with a hashtag #HashtagExample" be the input for corenlp_parse_conll(). If it were actually generated by bignlp, the CoNLL output would look like this:
x <- "1\tSome\t_\t_\t_\t_\t_\n2\tunannotated\t_\t_\t_\t_\t_\n3\texample\t_\t_\t_\t_\t_\n4\ttext\t_\t_\t_\t_\t_\n5\twith\t_\t_\t_\t_\t_\n6\ta\t_\t_\t_\t_\t_\n7\thashtag\t_\t_\t_\t_\t_\n8\t#HashtagExample\t_\t_\t_\t_\t_\n\n"
This causes trouble: "#HashtagExample" is treated as the start of a comment rather than as an ordinary token, so the rest of that line is dropped and the number of columns gets mixed up.
Problem
If there is a literal hash sign ("#") in the input of corenlp_parse_conll(), read.table() will treat it as the start of a comment.
https://github.com/PolMine/bignlp/blob/872ff58c489c994c28395d4347deef6376915245/R/output.R#L117
Potential Solution
I guess the solution is to turn off comment handling altogether by setting comment.char = "" in read.table().
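
A minimal sketch of the difference, using base R only. The arguments sep = "\t", quote = "" and stringsAsFactors = FALSE are assumptions made for this illustration and are not necessarily what bignlp passes to read.table(). The tryCatch() wrapper is only there so the snippet runs to the end whatever the default call does.

```r
# CoNLL-style output with a literal "#" in the token column
x <- "1\tSome\t_\t_\t_\t_\t_\n2\tunannotated\t_\t_\t_\t_\t_\n3\texample\t_\t_\t_\t_\t_\n4\ttext\t_\t_\t_\t_\t_\n5\twith\t_\t_\t_\t_\t_\n6\ta\t_\t_\t_\t_\t_\n7\thashtag\t_\t_\t_\t_\t_\n8\t#HashtagExample\t_\t_\t_\t_\t_\n\n"

# Default behaviour (comment.char = "#"): everything from "#" onwards in
# row 8 is discarded, so that row no longer has seven fields.
# Any error is caught and its message is kept instead of stopping the script.
default_result <- tryCatch(
  read.table(text = x, sep = "\t", quote = "", stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
print(default_result)

# Proposed fix: comment.char = "" switches comment handling off,
# so "#HashtagExample" is read as an ordinary token.
fixed <- read.table(
  text = x, sep = "\t", quote = "", comment.char = "",
  stringsAsFactors = FALSE
)
print(fixed$V2)
```

With comment.char = "", all eight rows keep their seven columns and the token "#HashtagExample" survives intact.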