ccronje closed this issue 8 years ago
@ccronje Clinton is going to have a look
Guys, I used the NER demo in the LC two weeks back at UCSD and the sed command ran cleanly. The whole NER demo worked, and I definitely got positive feedback on it as a demo of what the CLI can be used for. I'm wondering if the error above was caused by corrupted data that was fixed in 34ad39344a60273febd1234885cbf8bb71a985d6?
The error above is separate from the NER demo though (it is in the cleaning-up-text part). Either way, a clean copy of http://www.gutenberg.org/files/829/829-0.txt should fix it. I've replaced the file. @ccronje: can you replicate the error?
@jt14den: on the NER demo, I know it is a leap from the rest of the lesson, but I put it there precisely for the reason you describe: it articulates further uses of the shell.
I came across the following error (on Mac OS X) when running:
sed 's/\/O / /g' < gulliver_ner.txt > gulliver_ner-clean.txt
"sed: RE error: illegal byte sequence"
Did others come across this problem? I think I need to use a binary sed but not sure.
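For anyone hitting the same message: a commonly reported workaround (not something confirmed in this thread) is to run sed in the C locale, so it treats the input as raw bytes instead of trying to decode it in the current locale. The sketch below assumes the cause is a byte sequence that is not valid in the current locale's encoding; `sample_ner.txt` and its contents are made up for illustration and stand in for `gulliver_ner.txt`.

```shell
# Hypothetical sample input: three tokens tagged "/O " plus one byte (octal 351)
# that is not valid UTF-8, standing in for whatever is in the real file.
printf 'Gulliver/O caf\351/O travels/O \n' > sample_ner.txt

# LC_ALL=C applies to this one command only. sed then works byte by byte,
# so the stray byte is passed through untouched instead of raising
# "RE error: illegal byte sequence".
LC_ALL=C sed 's/\/O / /g' < sample_ner.txt > sample_ner-clean.txt
```

The same prefix should work on the original command: `LC_ALL=C sed 's/\/O / /g' < gulliver_ner.txt > gulliver_ner-clean.txt`. Note that this only sidesteps the decoding step; any bad bytes stay in the output, so replacing the file with a clean copy is still the better fix if the data itself is corrupted.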
Also, just wondering if it might be easier to demo NER in OpenRefine instead, with the NER extension? (see https://github.com/data-lessons/library-openrefine/issues/2)