WolfgangFahl / ProceedingsTitleParser

Shallow Semantic Parser to extract metadata from scientific proceedings titles
Apache License 2.0

Dealing with newlines in Proceedings titles #37

Closed: WolfgangFahl closed this issue 3 years ago

WolfgangFahl commented 3 years ago

Some Crossref events cannot be stored in the SQL database. For example, storing them fails with

SQL error unrecognized token: "'Proceedings of the 2017 Conference on Empirical Methods in Natural
" in line 1592:
    INSERT INTO "Event_crossref" VALUES(NULL,NULL,NULL,NULL,NULL,'Proceedings of the 2017 Conference on Empirical Methods in Natural

The root cause is newlines embedded in the proceedings titles, as this jq query over the Crossref files shows:

jq . crossref-* | grep "Proceedings of the 2017 Conference on Empirical Methods in Natural"
          "name": "Proceedings of the 2017 Conference on Empirical Methods in Natural\n          Language Processing",
          "Proceedings of the 2017 Conference on Empirical Methods in Natural\n          Language Processing"
          "name": "Proceedings of the 2017 Conference on Empirical Methods in Natural\n          Language Processing: System Demonstrations",
          "Proceedings of the 2017 Conference on Empirical Methods in Natural\n          Language Processing: System Demonstrations"
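One way to deal with this is to normalize the titles before storing them. The following is a minimal sketch, not the project's actual fix: the helper name normalize_title is hypothetical, and it assumes that a newline plus the indentation that follows it (as in the jq output above) should simply become a single space.

```python
import re

# Hypothetical helper (not part of ProceedingsTitleParser): collapse any run of
# whitespace, including "\n" and the indentation after it, into one space.
_WHITESPACE = re.compile(r"\s+")


def normalize_title(title: str) -> str:
    """Return the title with all whitespace runs replaced by a single space."""
    return _WHITESPACE.sub(" ", title).strip()


if __name__ == "__main__":
    raw = ("Proceedings of the 2017 Conference on Empirical Methods in Natural\n"
           "          Language Processing: System Demonstrations")
    print(normalize_title(raw))
```

This changes the stored title relative to the Crossref source, which is usually acceptable for proceedings titles since the line break carries no meaning there.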
WolfgangFahl commented 3 years ago

See e.g. https://stackoverflow.com/questions/47227684/how-to-insert-a-new-line-n-character-in-sqlite?rq=1 on how to insert a newline character in SQLite.
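If the newline should be kept instead, a sketch of the alternative uses Python's standard sqlite3 module with parameter binding, so the title is passed as a value rather than spliced into SQL text. The single-column "Event_crossref" table and the function name store_event are simplified, hypothetical stand-ins for the real schema; the query also shows SQLite's built-in char(10), which produces a newline inside SQL text.

```python
import sqlite3


def store_event(conn: sqlite3.Connection, title: str) -> None:
    # Hypothetical helper: the "?" placeholder binds the title as a value,
    # so a newline inside it needs no quoting or escaping.
    conn.execute('INSERT INTO "Event_crossref"(title) VALUES (?)', (title,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Simplified, hypothetical schema: the real table has more columns.
    conn.execute('CREATE TABLE "Event_crossref" (title TEXT)')
    title = ("Proceedings of the 2017 Conference on Empirical Methods in Natural\n"
             "          Language Processing")
    store_event(conn, title)
    stored = conn.execute('SELECT title FROM "Event_crossref"').fetchone()[0]
    print(stored == title)
    # char(10) builds a newline directly in SQL text.
    joined = conn.execute("SELECT 'Natural' || char(10) || 'Language'").fetchone()[0]
    print(repr(joined))
```

Binding preserves the original Crossref title byte for byte; combining it with the normalization above is also possible if searchable titles matter more than fidelity.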