Closed 0x26res closed 2 years ago
const websocket = perspective.websocket(
"ws://localhost:8081/websocket"
);
const table = websocket.open_table("table_name");
viewer.load(table);
This code tells the viewer frontend component to use the Table from Python directly, i.e. to not even instantiate the engine on the client side. Since there is no engine on the client, there is no capability to read Arrow data. The UI uses JSON/JavaScript data serialization, e.g. when you scroll up and down in the viewport and data needs to be fetched to render to the screen. When the engine is in Python, that data must be further serialized as stringified JSON across the WebSocket, and Python cannot serialize NaN without a special (IIRC global?) handler.
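To illustrate why NaN is awkward for a JSON transport, here is a minimal sketch using only Python's standard `json` module. This is an assumption for illustration: it may not be the serializer Perspective uses internally, and the `row` dict is a made-up stand-in for one row of the table.

```python
import json

# Hypothetical row, mirroring the "key"/"value" columns used in this thread.
row = {"key": "c", "value": float("nan")}

# By default the standard library emits the bare token NaN, which is not
# valid JSON; strict parsers such as JavaScript's JSON.parse reject it.
default_encoding = json.dumps(row)
print(default_encoding)  # {"key": "c", "value": NaN}

# With allow_nan=False the encoder refuses to serialize the value at all.
strict_failed = False
try:
    json.dumps(row, allow_nan=False)
except ValueError:
    strict_failed = True
print("strict encoding failed:", strict_failed)

# A missing value, by contrast, has a well-defined JSON representation.
null_encoding = json.dumps({"key": "c", "value": None})
print(null_encoding)  # {"key": "c", "value": null}
```

Either way, a NaN has no standard JSON form, while `None` maps cleanly to `null`.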
You can avoid this and get Arrow encoding on the wire by passing a view of the virtual server-side table to a client-side table constructor, which will decode using the client-side engine, as in the "Data Binding" section of the docs:
const websocket = perspective.websocket(
"ws://localhost:8081/websocket"
);
const worker = perspective.worker();
const server_table = await websocket.open_table("table_name");
// Get a view with no params
const server_view = await server_table.view();
// Construct a table on the client side that replicates this view - it will
// read from the server with `to_arrow()`
const table = worker.table(server_view);
// Load client
viewer.load(table);
This could certainly be made more developer-friendly, and in the future there may be a way to use Arrow via a separate WASM decoder without the engine itself (WASM doesn't make dynamic module loading painless atm).
However, while this should "work", Perspective in general does not handle NaN "correctly". We made a decision early on to try to replace these with None/null in the host language, so while the above will not crash Python, it may not return what you expect if you're explicitly calculating NaN results.
@texodus first of all thanks for this very detailed answer, it's very helpful.
I gave it a try, and it solved the issue.
But with this setup I've noticed a small change in behaviour: the table in the UI doesn't take into account the index set on the server, so whenever a record updates it gets appended. Does that mean that with this setup I need to specify the index column in the frontend as well?
One small difference is that I had to specify the index in the frontend.
Here's an updated reproducible example (with an index):
import logging
import threading
import tornado.ioloop
import tornado.web
from perspective import PerspectiveManager, PerspectiveTornadoHandler
import perspective
import pyarrow as pa
import numpy as np
INDEX = """
<!DOCTYPE html>
<html>
<head>
<meta
name="viewport"
content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1, user-scalable=no"
/>
<script src="https://cdn.jsdelivr.net/npm/@finos/perspective-viewer@1.6.5/dist/umd/perspective-viewer.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@finos/perspective-viewer-datagrid@1.6.5/dist/umd/perspective-viewer-datagrid.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@finos/perspective-viewer-d3fc@1.6.5/dist/umd/perspective-viewer-d3fc.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@finos/perspective@1.6.5/dist/umd/perspective.js"></script>
<link
rel="stylesheet"
crossorigin="anonymous"
href="https://cdn.jsdelivr.net/npm/@finos/perspective-viewer@1.6.5/dist/umd/themes.css"
/>
<style>
perspective-viewer {
position: absolute;
top: 0;
left: 0;
right: 0;
bottom: 0;
}
</style>
</head>
<body>
<perspective-viewer id="viewer"></perspective-viewer>
<script>
window.addEventListener("DOMContentLoaded", async function () {
const websocket = perspective.websocket(
"ws://localhost:8081/websocket"
);
const worker = perspective.worker();
const server_table = await websocket.open_table("table_name");
// Get a view with no params
const server_view = await server_table.view();
// Construct a table on the client side that replicates this view - it will
// read from the server with `to_arrow()`
const table = worker.table(server_view, { index: "key" });
// Load client
viewer.load(table);
});
</script>
</body>
</html>
"""
class MainHandler(tornado.web.RequestHandler):
    _tables = None
    _default_table = None
    async def get(self, path: str) -> None:
        await self.finish(INDEX)
def table_to_bytes(table: pa.Table) -> bytes:
    # Serialize the Arrow table to the IPC stream format as raw bytes.
    with pa.BufferOutputStream() as sink:
        with pa.ipc.new_stream(sink, table.schema) as writer:
            for batch in table.to_batches():
                writer.write_batch(batch)
        return sink.getvalue().to_pybytes()
def perspective_thread(manager, table: perspective.Table, updater):
    # Run Perspective on its own event loop and push an update every second.
    psp_loop = tornado.ioloop.IOLoop()
    manager.set_loop_callback(psp_loop.add_callback)
    manager.host_table("table_name", table)
    callback = tornado.ioloop.PeriodicCallback(callback=updater, callback_time=1000)
    callback.start()
    psp_loop.start()
def bug_here():
    arrow_table = pa.table(
        [
            pa.array(["a", "b"], pa.string()),
            pa.array([1, 3], pa.float64())
        ],
        names=["key", "value"]
    )
    perspective_table = perspective.Table(table_to_bytes(arrow_table), index="key")
    manager = PerspectiveManager()
    def updater():
        # Each cycle upserts row "c" with a NaN value.
        update = pa.table(
            [
                pa.array(["c"], pa.string()),
                # np.nan rather than np.NAN: the upper-case alias was removed in NumPy 2.0
                pa.array([np.nan], pa.float64())
            ],
            names=["key", "value"]
        )
        perspective_table.update(table_to_bytes(update))
    thread = threading.Thread(target=perspective_thread, args=(manager, perspective_table, updater), daemon=True)
    thread.start()
    app = tornado.web.Application(
        [
            (
                r"/websocket",
                PerspectiveTornadoHandler,
                {"manager": manager, "check_origin": True},
            ),
            (
                r"/(.*)",
                MainHandler
            )
        ]
    )
    app.listen(8081)
    loop = tornado.ioloop.IOLoop.current()
    loop.start()
if __name__ == "__main__":
    logging.info("Hosting in http://localhost:8081")
    bug_here()
PS1: there's a small typo in your answer: const table = worker.table(view); should be const table = worker.table(server_view);
PS2: it's probably worth mentioning in this doc https://perspective.finos.org/docs/server#javascript-client-1 that in this mode the transport layer is JSON instead of Arrow.
PS3: I'll probably stick to your suggestion of replacing NaN with missing values in arrow.
You need to supply the index to the client-side table as well, something like this:
const table = worker.table(view, {index: "My Column Index"});
This should probably be inherited automatically, but there are scenarios where you want to set these differently on client and server.
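The difference between an indexed and an unindexed table, as observed above, can be sketched with a toy model. This is not Perspective's implementation: `ToyTable` and its methods are hypothetical names that only mimic the behaviour described in this thread, where an index makes updates replace rows with the same key and its absence makes them append.

```python
from typing import Optional


class ToyTable:
    """Toy stand-in for a table; NOT Perspective's implementation."""

    def __init__(self, index: Optional[str] = None):
        self.index = index
        self.rows = []     # used when no index column is set
        self.by_key = {}   # used when an index column is set

    def update(self, rows):
        for row in rows:
            if self.index is None:
                # No index: every update is appended as a new row.
                self.rows.append(row)
            else:
                # Index set: a row with an existing key replaces the old one.
                self.by_key[row[self.index]] = row

    def size(self) -> int:
        return len(self.rows) if self.index is None else len(self.by_key)


unindexed = ToyTable()
indexed = ToyTable(index="key")

# Simulate three update cycles that all send the same key "c".
for _ in range(3):
    unindexed.update([{"key": "c", "value": None}])
    indexed.update([{"key": "c", "value": None}])

print(unindexed.size(), indexed.size())  # 3 1
```

In this model the client-side table behaves like `unindexed` unless it is given its own index, which matches the appended rows seen in the UI.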
Bug Report
Steps to Reproduce:
Run this self-contained Python script and go to http://localhost:8081
For context, this is mainly borrowed from https://github.com/finos/perspective/blob/master/examples/python-tornado-streaming/index.html
Expected Result:
The table should update and append a row with nan value on every cycle.
Actual Result:
No update happens and I see this error in python:
And this error in the web browser console:
Note: the error only happens when trying to display the table in the browser.
Environment:
Additional Context:
My understanding is that the data is sent to the browser using Arrow IPC, which should be able to pass nan values. I don't understand why JSON gets into the picture, or at which level. The workaround I have found so far is to replace nan doubles with missing values in Arrow, but it's hard to systematise.
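One way to make that workaround more systematic is to sanitize every column in one place before building the Arrow table. The sketch below uses only the standard library; `nan_to_none` and `clean_columns` are hypothetical helper names, not part of Perspective or pyarrow. The cleaned lists could then be passed to `pa.array(..., pa.float64())`, which stores `None` as a null.

```python
import math


def nan_to_none(values):
    """Return a copy of `values` with every float NaN replaced by None."""
    return [
        None if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


def clean_columns(columns):
    """Apply nan_to_none to each column of a {name: values} mapping."""
    return {name: nan_to_none(vals) for name, vals in columns.items()}


# Same shape as the update in the reproducible example.
update = {"key": ["c"], "value": [float("nan")]}
cleaned = clean_columns(update)
print(cleaned)  # {'key': ['c'], 'value': [None]}
```

Non-float values such as the string keys pass through unchanged, so the helper can be applied to every column without checking types first.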