HazyResearch / bazaar

Bazaar/Parser doesn't correctly escape some characters in TSV #20

Open netj opened 8 years ago

netj commented 8 years ago

For example, carriage returns should also be escaped but currently are not, which causes problems like HazyResearch/deepdive#523.

I think this part of the code needs more careful work to conform to Postgres' TSV format or some other stricter standard: https://github.com/HazyResearch/bazaar/blob/c09dce20f16a90c359f804f9e83d6107547d442c/parser/src/main/scala/com/clearcut/nlp/DocumentParser.scala#L98
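To make the target concrete, here is a minimal sketch of per-field escaping, assuming "Postgres' TSV format" means the text format read by PostgreSQL's `COPY` command. The names `TsvEscape`, `escapeField`, and `toTsvLine` are hypothetical and are not taken from DocumentParser.scala; this illustrates the escaping rules only and is not a patch against the linked code.

```scala
// Hypothetical helper, not part of bazaar: escapes fields following the
// backslash sequences documented for PostgreSQL's COPY text format.
object TsvEscape {
  // Escape a single field; a null value becomes \N, COPY's default NULL marker.
  def escapeField(value: String): String =
    if (value == null) "\\N"
    else {
      val sb = new StringBuilder(value.length + 8)
      value.foreach {
        case '\\'     => sb.append("\\\\") // backslash must be escaped first-class
        case '\t'     => sb.append("\\t")  // tab is the column delimiter
        case '\n'     => sb.append("\\n")  // newline is the row delimiter
        case '\r'     => sb.append("\\r")  // carriage return, the case reported here
        case '\b'     => sb.append("\\b")  // backspace
        case '\f'     => sb.append("\\f")  // form feed
        case '\u000B' => sb.append("\\v")  // vertical tab
        case c        => sb.append(c)
      }
      sb.toString
    }

  // Join already-escaped fields with literal tabs to form one output row.
  def toTsvLine(fields: Seq[String]): String =
    fields.map(escapeField).mkString("\t")
}
```

With a helper along these lines, a sentence containing a stray carriage return stays on one row, because the raw character is replaced by the two-character sequence `\r` before the row is written.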

raphaelhoffmann commented 8 years ago

Thanks, @netj! Since bazaar/parser is redundant and doesn't add much value, I'd favor removing it and migrating downstream applications to bazaar/pipe (or to CoreNLP without a wrapper). We're also using JSON to serialize and deserialize NLP annotations, which avoids the charset and TSV issues.

lanphan commented 8 years ago

@raphaelhoffmann I ran into a problem with Bazaar/parser's setup.sh (see HazyResearch/deepdive#530). What should I do to replace bazaar/parser with bazaar/pipe, as you suggested? Is it possible to apply that to the current DeepDive now?