OpenRefine / OpenRefine

OpenRefine is a free, open source power tool for working with messy data and improving it
https://openrefine.org/
BSD 3-Clause "New" or "Revised" License

wikibase: Error when parsing date from a cell #6774

Closed wetneb closed 1 month ago

wetneb commented 1 month ago

The following exception was thrown (and reported in the server logs) when evaluating Wikibase edits on a project:

java.lang.NumberFormatException: For input string: ""
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
    at java.base/java.lang.Long.parseLong(Long.java:721)
    at java.base/java.lang.Long.parseLong(Long.java:836)
    at java.base/java.text.DigitList.getLong(DigitList.java:195)
    at java.base/java.text.DecimalFormat.parse(DecimalFormat.java:2197)
    at java.base/java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1937)
    at java.base/java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1545)
    at org.openrefine.wikibase.schema.WbDateConstant.parse(WbDateConstant.java:140)
    at org.openrefine.wikibase.schema.WbDateVariable.fromCell(WbDateVariable.java:63)
    at org.openrefine.wikibase.schema.WbDateVariable.fromCell(WbDateVariable.java:44)
    at org.openrefine.wikibase.schema.WbVariableExpr.evaluate(WbVariableExpr.java:113)
    at org.openrefine.wikibase.schema.WbSnakExpr.evaluate(WbSnakExpr.java:95)
    at org.openrefine.wikibase.schema.WbStatementExpr.evaluate(WbStatementExpr.java:176)
    at org.openrefine.wikibase.schema.WbStatementGroupExpr.evaluate(WbStatementGroupExpr.java:103)
    at org.openrefine.wikibase.schema.WbItemEditExpr.evaluate(WbItemEditExpr.java:124)
    at org.openrefine.wikibase.schema.WbItemEditExpr.evaluate(WbItemEditExpr.java:56)
    at org.openrefine.wikibase.schema.WikibaseSchema.evaluateEntityDocuments(WikibaseSchema.java:175)
    at org.openrefine.wikibase.schema.WikibaseSchema$EvaluatingRowVisitor.visit(WikibaseSchema.java:235)
    at com.google.refine.browsing.RowVisitor.visit(RowVisitor.java:84)
    at com.google.refine.browsing.util.ConjunctiveFilteredRows.visitRow(ConjunctiveFilteredRows.java:77)
    at com.google.refine.browsing.util.ConjunctiveFilteredRows.accept(ConjunctiveFilteredRows.java:66)
    at org.openrefine.wikibase.schema.WikibaseSchema.evaluate(WikibaseSchema.java:204)
    at org.openrefine.wikibase.commands.PreviewWikibaseSchemaCommand.doPost(PreviewWikibaseSchemaCommand.java:122)
    at com.google.refine.RefineServlet.service(RefineServlet.java:187)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
    at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1410)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:529)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1570)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:790)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1384)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1543)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1306)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
    at com.google.refine.ValidateHostHandler.handle(ValidateHostHandler.java:93)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
    at org.eclipse.jetty.server.Server.handle(Server.java:563)
    at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
    at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)

Project in which this happened: ORCID-researchers-without-affiliation.openrefine.tar.gz

This seems to be caused by SimpleDateFormat.parse throwing a NumberFormatException, which is unexpected according to the Javadoc of this method: https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html#parse-java.lang.String-java.text.ParsePosition-

This method is supposed to return null if parsing failed. So this could be a bug in the JDK. In the meantime, we could of course catch such exceptions on our side.
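
For illustration, catching the exception on our side might look roughly like the sketch below. The class and method names (DefensiveDateParse, parseOrNull) are hypothetical and this is not the actual WbDateConstant code; it only shows the idea of treating an unexpected runtime exception from SimpleDateFormat.parse as a failed parse. It handles the symptom, not whatever makes the parser throw in the first place.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper for illustration; not part of OpenRefine.
final class DefensiveDateParse {

    // Returns the parsed date, or null if parsing failed for any reason,
    // including an unexpected runtime exception thrown by the parser.
    static Date parseOrNull(SimpleDateFormat format, String input) {
        try {
            // Documented behaviour: returns null when the text cannot be parsed.
            return format.parse(input, new ParsePosition(0));
        } catch (RuntimeException e) {
            // Undocumented behaviour seen in the stack trace above
            // (NumberFormatException): treat it as a failed parse too.
            return null;
        }
    }
}
```

With this, a caller only has to check for null, whichever way the parse failed.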

tfmorris commented 1 month ago

SimpleDateFormat isn't thread-safe. Any chance it's being called from multiple threads? https://stackoverflow.com/questions/21017502/numberformatexception-while-parsing-date-with-simpledateformat-parse

Perhaps it'd be worth switching to the thread-safe DateTimeFormatter?
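
A minimal sketch of what such a switch could look like, assuming a single day-precision pattern. The pattern "uuuu-MM-dd" and the names ThreadSafeDateParse and parseDay are assumptions for illustration; the formats the Wikibase extension actually accepts are not shown in this thread. DateTimeFormatter instances are immutable, so one shared instance can be used from concurrent requests.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper for illustration; not part of OpenRefine.
final class ThreadSafeDateParse {

    // Immutable and thread-safe, so a single shared instance is fine.
    // The pattern is an assumption, not the extension's real format list.
    private static final DateTimeFormatter DAY_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd");

    // Returns the parsed day, or an empty Optional if the text does not match.
    static Optional<LocalDate> parseDay(String input) {
        try {
            return Optional.of(LocalDate.parse(input, DAY_FORMAT));
        } catch (DateTimeParseException e) {
            // Unlike SimpleDateFormat.parse(String, ParsePosition), failure
            // is signalled with a documented exception rather than null.
            return Optional.empty();
        }
    }
}
```

Note the different failure contract: callers that relied on a null return from SimpleDateFormat would need to be adapted to handle DateTimeParseException (or, as here, an empty Optional).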

wetneb commented 1 month ago

Ah that's an excellent point, I didn't notice that at all… That must definitely be the source of the problem, because it is indeed possible that multiple requests evaluate edits concurrently.