data-lessons / library-shell-DEPRECATED

Unix shell lesson for librarians NOW MOVED > https://github.com/LibraryCarpentry/lc-shell

NER demo problem (demo via OpenRefine instead?) #8

Closed: ccronje closed this issue 8 years ago

ccronje commented 8 years ago

I came across the following error (on Mac OS X) when running sed 's/\/O / /g' < gulliver_ner.txt > gulliver_ner-clean.txt:

"sed: RE error: illegal byte sequence"

Did others come across this problem? I think I need to use a binary sed, but I'm not sure.
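
One workaround that is commonly reported for this message is to run sed in the C locale, so that it treats the input as raw bytes instead of multibyte characters. The sketch below assumes the error comes from bytes in the file that are not valid in the current locale's encoding (this thread does not confirm the cause). The file names and sample line are hypothetical stand-ins for gulliver_ner.txt.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: tagged tokens plus one byte (octal 351, hex E9)
# that is not valid UTF-8 on its own.
printf 'Gulliver/PERSON went/O to/O caf\351/O Lilliput/LOCATION\n' > sample_ner.txt

# LC_ALL=C applies only to this one command. It makes sed process the
# input byte by byte, so an invalid multibyte sequence is not an error.
LC_ALL=C sed 's/\/O / /g' < sample_ner.txt > sample_ner-clean.txt

# Show the cleaned line; the stray byte is passed through unchanged.
cat sample_ner-clean.txt
```

Setting the variable on the command line leaves the rest of the shell session in its normal locale. Note that this only stops sed from complaining; it does not repair the bytes themselves.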

Also, just wondering if it might be easier to demo NER in OpenRefine instead, with the NER extension? (see https://github.com/data-lessons/library-openrefine/issues/2)

weaverbel commented 8 years ago

@ccronje Clinton is going to have a look

jt14den commented 8 years ago

Guys, I used the NER demo in the LC two weeks back at UCSD and the sed command ran clean. The whole NER demo worked, and I definitely got positive feedback on it as a demo of what the CLI can be used for. I'm wondering if the error above was caused by corrupted data that was fixed in 34ad39344a60273febd1234885cbf8bb71a985d6?

drjwbaker commented 8 years ago

The error above is separate from the NER demo though (it comes from the text clean-up step). Either way, a clean copy of http://www.gutenberg.org/files/829/829-0.txt should fix it. I've replaced the file. @ccronje: can you replicate the error?
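
A quick way to check whether a given copy of the file is clean is to ask iconv to convert it from UTF-8 to UTF-8 and look at the exit status. This is a rough sketch that assumes the file is meant to be UTF-8. The helper name is_valid_utf8 and the two sample files are hypothetical.

```shell
# Scratch directory with one clean file and one containing a bad byte.
workdir=$(mktemp -d)
printf 'plain ASCII line\n' > "$workdir/good.txt"
printf 'bad byte: \351\n' > "$workdir/bad.txt"

# Hypothetical helper: succeeds only if iconv can read the whole file
# as UTF-8. Converted output and error messages are discarded.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

for f in "$workdir/good.txt" "$workdir/bad.txt"; do
  if is_valid_utf8 "$f"; then
    echo "$(basename "$f"): valid UTF-8"
  else
    echo "$(basename "$f"): contains invalid bytes"
  fi
done
```

Running the same check on the downloaded text before the sed step would show whether the replacement file removes the problem.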

drjwbaker commented 8 years ago

@jt14den: on the NER demo, I know it is a leap from the rest of the lesson, but I put it there precisely for the reason you describe: it articulates further uses of the shell.