dmberry / soylent

Automatically exported from code.google.com/p/soylent

Some text is creating incorrect javascript #21

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Try this paragraph:

The web is full of "data-driven apps." Almost any e-commerce application is
a data-driven application. There's a database behind a web front end, and
middleware that talks to a number of other databases and data services
(credit card processing companies, banks, and so on). But merely using data
isn't really what we mean by "data science." A data application acquires
its value from the data itself, and creates more data as a result. It's not
just an application with data; it's a data product. Data science enables
the creation of data products.

Then try creating shortn.1.js.

Original issue reported on code.google.com by esperte...@gmail.com on 3 Jun 2010 at 7:49

GoogleCodeExporter commented 8 years ago
This appears to be a bug with Word. The first sentence in the paragraph is '"',
and the second is 'Almost any e-commerce...'.

Notably, if we switch the quotes to Word's standard angled (curly) quotes, it
works fine. This works:

The web is full of “data-driven apps.” Almost any e-commerce application is a
data-driven application. There's a database behind a web front end, and
middleware that talks to a number of other databases and data services (credit
card processing companies, banks, and so on). But merely using data isn't
really what we mean by "data science." A data application acquires its value
from the data itself, and creates more data as a result. It's not just an
application with data; it's a data product. Data science enables the creation
of data products.

Original comment by esperte...@gmail.com on 3 Jun 2010 at 9:55

GoogleCodeExporter commented 8 years ago
This is worse than we thought: using the curly quotes leads to bad JavaScript
with random #*$)*$ characters in it.
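
One plausible source of the stray characters, sketched in Java (the language TurKit runs on): if the script's bytes are UTF-8 but are decoded with a single-byte charset, each curly quote turns into three unrelated characters. MojibakeDemo and misdecodeAsLatin1 are hypothetical names, and ISO-8859-1 is only a stand-in for whatever default charset the machine actually used.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode text as UTF-8, then decode the same bytes
    // as ISO-8859-1 (an assumed stand-in for a legacy default charset).
    static String misdecodeAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u201Cdata-driven apps.\u201D"; // curly quotes
        String garbled = misdecodeAsLatin1(original);
        // Each curly quote is three bytes in UTF-8, so it becomes three chars.
        System.out.println(original.length() + " chars became " + garbled.length());
    }
}
```

The ASCII letters survive unchanged, which matches the symptom of mostly readable JavaScript with a few garbage characters around the quotes.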

Original comment by esperte...@gmail.com on 4 Jun 2010 at 12:16

GoogleCodeExporter commented 8 years ago
This is a Unicode problem. The paragraph needs to be written out to
shortn.1.js as Unicode, specifically using Encoding.UTF8 in TurKit.cs.
However, TurKit doesn't seem to like Unicode-encoded files.

For example, the one-line script (attached):
print("The web – is full of “data-driven apps.” 汎用=最大公約数幻想に訣別を。");

Gives me:
Retrying Script Evaluation: illegal character (C:\Users\msbernst\Documents\Soylent\code\turkit\utf8.js#1)
org.mozilla.javascript.EvaluatorException: illegal character (C:\Users\msbernst\Documents\Soylent\code\turkit\utf8.js#1)
    at org.mozilla.javascript.DefaultErrorReporter.runtimeError(DefaultErrorReporter.java:109)
    at org.mozilla.javascript.DefaultErrorReporter.error(DefaultErrorReporter.java:96)
    at org.mozilla.javascript.Parser.addError(Parser.java:146)
    at org.mozilla.javascript.TokenStream.getToken(TokenStream.java:825)
    at org.mozilla.javascript.Parser.peekToken(Parser.java:172)
    at org.mozilla.javascript.Parser.primaryExpr(Parser.java:2408)
    at org.mozilla.javascript.Parser.memberExpr(Parser.java:1955)
    at org.mozilla.javascript.Parser.unaryExpr(Parser.java:1813)
    at org.mozilla.javascript.Parser.mulExpr(Parser.java:1742)
    at org.mozilla.javascript.Parser.addExpr(Parser.java:1723)
    at org.mozilla.javascript.Parser.shiftExpr(Parser.java:1703)
    at org.mozilla.javascript.Parser.relExpr(Parser.java:1677)
    at org.mozilla.javascript.Parser.eqExpr(Parser.java:1633)
    at org.mozilla.javascript.Parser.bitAndExpr(Parser.java:1622)
    at org.mozilla.javascript.Parser.bitXorExpr(Parser.java:1611)
    at org.mozilla.javascript.Parser.bitOrExpr(Parser.java:1600)
    at org.mozilla.javascript.Parser.andExpr(Parser.java:1588)
    at org.mozilla.javascript.Parser.orExpr(Parser.java:1576)
    at org.mozilla.javascript.Parser.condExpr(Parser.java:1559)
    at org.mozilla.javascript.Parser.assignExpr(Parser.java:1544)
    at org.mozilla.javascript.Parser.expr(Parser.java:1523)
    at org.mozilla.javascript.Parser.statementHelper(Parser.java:1202)
    at org.mozilla.javascript.Parser.statement(Parser.java:707)
    at org.mozilla.javascript.Parser.parse(Parser.java:401)
    at org.mozilla.javascript.Parser.parse(Parser.java:359)
    at org.mozilla.javascript.Context.compileImpl(Context.java:2370)
    at org.mozilla.javascript.Context.compileReader(Context.java:1321)
    at org.mozilla.javascript.Context.compileReader(Context.java:1293)
    at org.mozilla.javascript.Context.evaluateReader(Context.java:1132)
    at edu.mit.csail.uid.turkit.RhinoUtil$2.func(RhinoUtil.java:108)
    at edu.mit.csail.uid.turkit.RhinoUtil.evaluate(RhinoUtil.java:72)
    at edu.mit.csail.uid.turkit.RhinoUtil.evaluateFile(RhinoUtil.java:106)
    at edu.mit.csail.uid.turkit.TurKit.runOnce(TurKit.java:252)
    at edu.mit.csail.uid.turkit.TurKit.runOnce(TurKit.java:287)
    at edu.mit.csail.uid.turkit.gui.Main.onRun(Main.java:584)
    at edu.mit.csail.uid.turkit.gui.Main.onEvent(Main.java:537)
    at edu.mit.csail.uid.turkit.gui.SimpleEventManager.fireEvent(SimpleEventManager.java:30)
    at edu.mit.csail.uid.turkit.gui.SimpleEventManager.fireEvent(SimpleEventManager.java:24)
    at edu.mit.csail.uid.turkit.gui.Main$6.actionPerformed(Main.java:131)
    at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
    at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
    at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
    at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
    at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
    at java.awt.AWTEventMulticaster.mouseReleased(Unknown Source)
    at java.awt.Component.processMouseEvent(Unknown Source)
    at javax.swing.JComponent.processMouseEvent(Unknown Source)
    at java.awt.Component.processEvent(Unknown Source)
    at java.awt.Container.processEvent(Unknown Source)
    at java.awt.Component.dispatchEventImpl(Unknown Source)
    at java.awt.Container.dispatchEventImpl(Unknown Source)
    at java.awt.Component.dispatchEvent(Unknown Source)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
    at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
    at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
    at java.awt.Container.dispatchEventImpl(Unknown Source)
    at java.awt.Window.dispatchEventImpl(Unknown Source)
    at java.awt.Component.dispatchEvent(Unknown Source)
    at java.awt.EventQueue.dispatchEvent(Unknown Source)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
    at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
    at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
    at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
    at java.awt.EventDispatchThread.run(Unknown Source)
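
The trace shows the script reaching Rhino through Context.evaluateReader, which takes a java.io.Reader. A minimal sketch of one way to make the decoding explicit, assuming the file really is UTF-8: build the Reader with StandardCharsets.UTF_8 instead of relying on the platform default charset. Utf8ScriptReader, openUtf8 and roundTrip are hypothetical names, not TurKit code, and this is not necessarily how the issue was eventually fixed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8ScriptReader {
    // Hypothetical helper: open a script with an explicit UTF-8 decoder.
    static Reader openUtf8(Path script) {
        try {
            return new BufferedReader(new InputStreamReader(
                    Files.newInputStream(script), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the source to a temp file as UTF-8 and reads it back via openUtf8.
    static String roundTrip(String source) {
        try {
            Path tmp = Files.createTempFile("utf8", ".js");
            try {
                Files.write(tmp, source.getBytes(StandardCharsets.UTF_8));
                StringBuilder text = new StringBuilder();
                try (Reader reader = openUtf8(tmp)) {
                    char[] buffer = new char[4096];
                    int n;
                    while ((n = reader.read(buffer)) != -1) {
                        text.append(buffer, 0, n);
                    }
                }
                return text.toString();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String script =
            "print(\"The web \u2013 is full of \u201Cdata-driven apps.\u201D\");";
        // True when the non-ASCII characters survive the write/read cycle.
        System.out.println(script.equals(roundTrip(script)));
    }
}
```

The Reader returned by openUtf8 could then be handed to whatever evaluates the script. Java's UTF-8 decoder does not strip a leading byte-order mark, so a file written with one would still need separate handling.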

Original comment by esperte...@gmail.com on 4 Jun 2010 at 6:55

GoogleCodeExporter commented 8 years ago
Attaching utf8.js

Original comment by esperte...@gmail.com on 4 Jun 2010 at 6:55

GoogleCodeExporter commented 8 years ago
We may also need to change the S3 webpages so that the <meta> tag reads:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Original comment by esperte...@gmail.com on 4 Jun 2010 at 6:58

GoogleCodeExporter commented 8 years ago
Fixed in r9bafa3f88c1b. Recompiled TurKit to support UTF-8 scripts.

Original comment by esperte...@gmail.com on 9 Jun 2010 at 8:01