Closed. GoogleCodeExporter closed this issue 8 years ago.
This appears to be a bug in Word's sentence splitting: the first sentence of the paragraph comes back as '"', and the second as 'Almost any e-commerce...'.
Notably, if we replace the straight quotes with Word's standard curly quotes, it works fine. This works:
The web is full of “data-driven apps.” Almost any e-commerce application is a data-driven application. There's a database behind a web front end, and middleware that talks to a number of other databases and data services (credit card processing companies, banks, and so on). But merely using data isn't really what we mean by "data science." A data application acquires its value from the data itself, and creates more data as a result. It's not just an application with data; it's a data product. Data science enables the creation of data products.
Original comment by esperte...@gmail.com on 3 Jun 2010 at 9:55
This is worse than we thought: using the curly quotes produces broken JavaScript with random #*$)*$ characters in it.
Original comment by esperte...@gmail.com on 4 Jun 2010 at 12:16
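The "random characters" are consistent with mojibake: UTF-8 bytes decoded with a single-byte Windows code page. The thread does not say which code page was involved; this minimal Java sketch assumes windows-1252 (the ANSI code page on English Windows) and shows how each curly quote or dash, three bytes in UTF-8, turns into three unrelated characters. `MojibakeDemo` and `misdecode` are hypothetical names, and the non-ASCII characters are written as `\u` escapes so the example does not itself depend on the source-file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Soylent or TurKit.
public class MojibakeDemo {
    // Assumption: the writer produced UTF-8 but the reader fell back to
    // windows-1252, the ANSI code page on English Windows.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Encode as UTF-8, then decode those bytes with the wrong charset.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, WINDOWS_1252);
    }

    public static void main(String[] args) {
        // An en dash and a left curly quote, written as escapes.
        String original = "The web \u2013 is full of \u201Cdata";
        String garbled = misdecode(original);
        // Each non-ASCII character became three characters; for example the
        // left quote became a-circumflex, euro sign, and oe ligature.
        System.out.println(garbled.length() - original.length()); // 4
    }
}
```

Plain ASCII passes through unchanged, which is why only the quotes and dashes come out garbled.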
This is a Unicode problem. The paragraph needs to be written out to shortn.1.js as Unicode, specifically using Encoding.UTF8 in TurKit.cs. However, TurKit doesn't seem to accept Unicode-encoded files.
For example, the one-line script below (attached as utf8.js):
print("The web – is full of “data-driven apps.” 汎用=最大公約数幻想に訣別を。");
Gives me:
Retrying Script Evaluation: illegal character (C:\Users\msbernst\Documents\Soylent\code\turkit\utf8.js#1)
org.mozilla.javascript.EvaluatorException: illegal character (C:\Users\msbernst\Documents\Soylent\code\turkit\utf8.js#1)
at org.mozilla.javascript.DefaultErrorReporter.runtimeError(DefaultErrorReporter.java:109)
at org.mozilla.javascript.DefaultErrorReporter.error(DefaultErrorReporter.java:96)
at org.mozilla.javascript.Parser.addError(Parser.java:146)
at org.mozilla.javascript.TokenStream.getToken(TokenStream.java:825)
at org.mozilla.javascript.Parser.peekToken(Parser.java:172)
at org.mozilla.javascript.Parser.primaryExpr(Parser.java:2408)
at org.mozilla.javascript.Parser.memberExpr(Parser.java:1955)
at org.mozilla.javascript.Parser.unaryExpr(Parser.java:1813)
at org.mozilla.javascript.Parser.mulExpr(Parser.java:1742)
at org.mozilla.javascript.Parser.addExpr(Parser.java:1723)
at org.mozilla.javascript.Parser.shiftExpr(Parser.java:1703)
at org.mozilla.javascript.Parser.relExpr(Parser.java:1677)
at org.mozilla.javascript.Parser.eqExpr(Parser.java:1633)
at org.mozilla.javascript.Parser.bitAndExpr(Parser.java:1622)
at org.mozilla.javascript.Parser.bitXorExpr(Parser.java:1611)
at org.mozilla.javascript.Parser.bitOrExpr(Parser.java:1600)
at org.mozilla.javascript.Parser.andExpr(Parser.java:1588)
at org.mozilla.javascript.Parser.orExpr(Parser.java:1576)
at org.mozilla.javascript.Parser.condExpr(Parser.java:1559)
at org.mozilla.javascript.Parser.assignExpr(Parser.java:1544)
at org.mozilla.javascript.Parser.expr(Parser.java:1523)
at org.mozilla.javascript.Parser.statementHelper(Parser.java:1202)
at org.mozilla.javascript.Parser.statement(Parser.java:707)
at org.mozilla.javascript.Parser.parse(Parser.java:401)
at org.mozilla.javascript.Parser.parse(Parser.java:359)
at org.mozilla.javascript.Context.compileImpl(Context.java:2370)
at org.mozilla.javascript.Context.compileReader(Context.java:1321)
at org.mozilla.javascript.Context.compileReader(Context.java:1293)
at org.mozilla.javascript.Context.evaluateReader(Context.java:1132)
at edu.mit.csail.uid.turkit.RhinoUtil$2.func(RhinoUtil.java:108)
at edu.mit.csail.uid.turkit.RhinoUtil.evaluate(RhinoUtil.java:72)
at edu.mit.csail.uid.turkit.RhinoUtil.evaluateFile(RhinoUtil.java:106)
at edu.mit.csail.uid.turkit.TurKit.runOnce(TurKit.java:252)
at edu.mit.csail.uid.turkit.TurKit.runOnce(TurKit.java:287)
at edu.mit.csail.uid.turkit.gui.Main.onRun(Main.java:584)
at edu.mit.csail.uid.turkit.gui.Main.onEvent(Main.java:537)
at edu.mit.csail.uid.turkit.gui.SimpleEventManager.fireEvent(SimpleEventManager.java:30)
at edu.mit.csail.uid.turkit.gui.SimpleEventManager.fireEvent(SimpleEventManager.java:24)
at edu.mit.csail.uid.turkit.gui.Main$6.actionPerformed(Main.java:131)
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
at java.awt.AWTEventMulticaster.mouseReleased(Unknown Source)
at java.awt.Component.processMouseEvent(Unknown Source)
at javax.swing.JComponent.processMouseEvent(Unknown Source)
at java.awt.Component.processEvent(Unknown Source)
at java.awt.Container.processEvent(Unknown Source)
at java.awt.Component.dispatchEventImpl(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Window.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)
Original comment by esperte...@gmail.com on 4 Jun 2010 at 6:55
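The remedy proposed above is to write and read the script with an explicit UTF-8 encoding rather than a platform default. On the Soylent side that means `Encoding.UTF8` in TurKit.cs; since TurKit itself runs on the JVM, here is a minimal Java sketch of the same idea, with a temporary file standing in for shortn.1.js. `Utf8ScriptFile`, `writeScript`, and `readScript` are hypothetical names, not TurKit's API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not TurKit's API.
public class Utf8ScriptFile {
    // Write the script as UTF-8 bytes; getBytes adds no byte-order mark.
    static void writeScript(Path path, String script) throws IOException {
        Files.write(path, script.getBytes(StandardCharsets.UTF_8));
    }

    // Read it back naming UTF-8 explicitly instead of using the platform default.
    static String readScript(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for shortn.1.js.
        Path tmp = Files.createTempFile("shortn", ".js");
        String script = "print(\"The web \u2013 is full of \u201Cdata-driven apps.\u201D\");";
        writeScript(tmp, script);
        System.out.println(readScript(tmp).equals(script)); // true
        Files.delete(tmp);
    }
}
```

Naming the charset on both sides is the point: the platform default differs between machines, so relying on it lets the writer and the reader disagree about the same bytes.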
Attaching utf8.js
Original comment by esperte...@gmail.com on 4 Jun 2010 at 6:55
We may also need to change the S3 web pages to include this <meta> tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Original comment by esperte...@gmail.com on 4 Jun 2010 at 6:58
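The same rule applies to the S3 pages: the charset a page declares has to match the encoding of its bytes. A minimal Java sketch, assuming the page is generated as a string and uploaded as raw bytes; `Utf8Page` and `buildPage` are hypothetical, and the body text is assumed to be HTML-escaped already.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical page builder, not part of Soylent or TurKit.
public class Utf8Page {
    // The meta tag from the comment above, declaring UTF-8.
    static final String META =
            "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">";

    // Wrap already-escaped body text in a minimal page and encode it as UTF-8,
    // so the bytes actually match what the meta tag declares.
    static byte[] buildPage(String escapedBodyText) {
        String html = "<html><head>" + META + "</head><body>"
                + escapedBodyText + "</body></html>";
        return html.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] page = buildPage("\u201Cdata-driven apps.\u201D");
        System.out.println(page.length + " bytes");
    }
}
```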
Fixed in r9bafa3f88c1b. Recompiled TurKit to support UTF-8 scripts.
Original comment by esperte...@gmail.com on 9 Jun 2010 at 8:01
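The commit itself is not shown here, so the following is only a sketch of what supporting UTF-8 scripts on the Java side can involve, not TurKit's actual change. It opens a script stream as UTF-8 and also skips a leading byte-order mark: C#'s `Encoding.UTF8`, when passed to a `StreamWriter` or `File.WriteAllText`, writes the bytes EF BB BF at the start of the file, and Java's UTF-8 decoder passes that mark through as U+FEFF instead of removing it. Whether a BOM caused the "illegal character" error at line 1 of utf8.js is not established in the thread. `Utf8ScriptReader`, `openUtf8`, and `readAll` are hypothetical names; the returned `Reader` is the kind of argument Rhino's `Context.evaluateReader` (visible in the stack trace) accepts.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not TurKit's actual code.
public class Utf8ScriptReader {
    private static final int BOM = 0xFEFF;

    // Decode the stream as UTF-8 and drop a leading byte-order mark, if any.
    static Reader openUtf8(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);
        if (reader.read() != BOM) {
            reader.reset(); // no BOM: rewind so the first character is kept
        }
        return reader;
    }

    // Drain a Reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The same one-line script with and without a UTF-8 BOM (EF BB BF).
        byte[] withBom = "\uFEFFprint(1);".getBytes(StandardCharsets.UTF_8);
        byte[] withoutBom = "print(1);".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(openUtf8(new ByteArrayInputStream(withBom))));    // print(1);
        System.out.println(readAll(openUtf8(new ByteArrayInputStream(withoutBom)))); // print(1);
    }
}
```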
Original issue reported on code.google.com by esperte...@gmail.com on 3 Jun 2010 at 7:49