PolMine / bignlp

Tools to process large corpora line-by-line and in parallel mode

Parsing conll output breaks if input/output includes "#" #33

Open ablaette opened 3 years ago

ablaette commented 3 years ago

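read.table() treats the # sign as a comment character by default (comment.char = "#"). A "#" token in the CoNLL output therefore truncates its line and causes an error here: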

read.table(text = x, blank.lines.skip = TRUE, header = FALSE, sep = "\t", quote = "")

The obvious solution is to add comment.char = "", which turns off comment handling entirely.
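A minimal reproduction may make the problem and the fix concrete. The sample string x is made up for illustration and is not actual bignlp output. The read.table() arguments are the ones quoted above, with comment.char = "" added for the fixed call.

```r
# Hypothetical CoNLL-style string (not real bignlp output): one token per
# line, tab-separated columns; the second token is a literal "#".
x <- paste(
  c("1\tWe\tPRP", "2\t#\tSYM", "3\twin\tVBP"),
  collapse = "\n"
)

# Default comment.char = "#": the rest of line 2 after "#" is discarded, so
# that line has fewer fields than the others. With fill = FALSE (the default
# when blank.lines.skip = TRUE) this typically raises an error.
broken <- tryCatch(
  read.table(text = x, blank.lines.skip = TRUE, header = FALSE,
             sep = "\t", quote = ""),
  error = function(e) conditionMessage(e)
)
print(broken)

# comment.char = "": no character starts a comment, so "#" is kept as a token.
fixed <- read.table(text = x, blank.lines.skip = TRUE, header = FALSE,
                    sep = "\t", quote = "", comment.char = "")
print(fixed)
```

Because quote = "" is already set, disabling comments as well means the parser splits on tabs only and no token content is interpreted specially.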

ablaette commented 2 years ago

This has been implemented where we use read.table() for in-memory processing. Should it also be introduced where data.table::fread() reads CoNLL from temporary output files?