bluesky / databroker

Unified API pulling data from multiple sources
https://blueskyproject.io/databroker
BSD 3-Clause "New" or "Revised" License

Improve JSON sequence deserialization using `newline` delimiter instead of `try except` #791

Closed hyperrealist closed 7 months ago

hyperrealist commented 7 months ago

This PR improves on #790 by using the newline delimiter, rather than catching `JSONDecodeError`, to check whether a streamed JSON object is complete.

Description

The `splitlines` method, called on decoded chunks of streamed JSON sequences, is now passed `keepends=True`, so each split keeps its trailing newline character, if any. The presence of that newline is a more robust signal that a JSON object is complete.
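For illustration, the idea can be sketched roughly as follows. This is not the code in this PR: the function name `iter_json_lines` and the `pending` buffer are hypothetical, and the sketch assumes UTF-8 chunks that never split a multi-byte character.

```python
import json
from typing import Any, Iterable, Iterator


def iter_json_lines(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield one parsed object per newline-terminated JSON document.

    Hypothetical helper for illustration only; not databroker's API.
    """
    pending = ""  # text of an object whose terminating newline has not arrived
    for chunk in chunks:
        # Assumption: a chunk boundary never falls inside a multi-byte character.
        for piece in chunk.decode("utf-8").splitlines(keepends=True):
            pending += piece
            if pending.endswith("\n"):
                # The newline marks a complete object, so a decode error
                # raised here is a real error and is allowed to propagate.
                if pending.strip():  # ignore blank lines
                    yield json.loads(pending)
                pending = ""
    if pending:
        # Leftover text without a newline is treated as an incomplete object.
        raise ValueError("stream ended in the middle of a JSON object")


# An object split across two chunks is reassembled before parsing.
sample_chunks = [b'{"a": 1}\n{"b"', b': 2}\n', b'{"c": 3}\n']
parsed = list(iter_json_lines(sample_chunks))
```

Because a piece is only parsed once it ends in `"\n"`, a fragment such as `{"b"` is buffered without ever being handed to `json.loads`.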

Motivation and Context

If `JSONDecodeError` is raised for a legitimate reason, such as genuinely malformed JSON, the previous implementation masks the exception and goes on to incorrectly mash the malformed text together with the JSON objects that follow. Checking for the newline delimiter is a more robust way to detect the end of a streamed, serialized JSON object.
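The failure mode can be reproduced with a small, hypothetical comparison. Neither function below is taken from #790 or from this PR. They are simplified stand-ins that assume the input has already been split into lines with their newlines kept.

```python
import json


def parse_with_try_except(lines):
    """Hypothetical stand-in: treat every JSONDecodeError as 'keep buffering'."""
    pending = ""
    results = []
    for line in lines:
        pending += line
        try:
            results.append(json.loads(pending))
        except json.JSONDecodeError:
            continue  # a genuine syntax error is silently swallowed here
        pending = ""
    return results


def parse_with_newline(lines):
    """Hypothetical stand-in: parse only once a newline ends the object."""
    pending = ""
    results = []
    for line in lines:
        pending += line
        if pending.endswith("\n"):
            results.append(json.loads(pending))  # real errors propagate
            pending = ""
    return results


# The second line is complete (newline-terminated) but malformed.
lines = ['{"a": 1}\n', '{"b": oops}\n', '{"c": 3}\n']

masked = parse_with_try_except(lines)

try:
    parse_with_newline(lines)
    error_surfaced = False
except json.JSONDecodeError:
    error_surfaced = True
```

In this sketch the `try`/`except` variant appends the valid third line to the malformed second one, so both are lost without any error, while the newline-based variant raises on the malformed line.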

How Has This Been Tested?

This change was tested using `pytest databroker/tests/test_broker.py::test_large_document`.