CIIR / Proteus

Million Book Project

Spelling error in the parsed title #74

Closed mhjang closed 9 years ago

mhjang commented 9 years ago

I found a paper whose parsed title is "Efcient Keyword Extractionfor Meaningful Document Perception". The original paper has no spelling errors such as "eficient". I don't know how this discrepancy arose, since I assumed the titles were parsed rather than typed manually. Maybe there was an error when the papers were parsed, and the ACM library changed it afterwards?

I'm just leaving a note in case there is a bug in the pipeline since I don't know the details of the system.

mzarozinski commented 9 years ago

This appears to be caused by the pstotext step (https://github.com/CIIR/rexa1-pstotext), which "read" the title as: