Consider a way to iterate over the contents of an `INDEX_MAP` without requiring the entire JSON document be loaded into memory

distributed-system-analysis / pbench

A benchmarking and performance analysis framework

GNU General Public License v3.0

From the Jira planning note I wrote earlier:

STORY[S]: Re-think the document map metadata so that the map can be consumed piecemeal (e.g., by index) rather than pulling in and managing the entire map as one JSON document. For example, one key might hold the list of index names in the dataset, with a separate key for the list of documents in each index. The hierarchical structure of metadata would easily accommodate this: e.g., instead of `"map": {"index1": ["id1", "id2"], "index2": ["id3", "id4"], …}`, something like `"indices": ["index1", "index2"]`, `"index1": ["id1", "id2"]`, and `"index2": ["id3", "id4"]` as separate metadata keys. We have to be a little careful about how we nest data because of the way the JSON metadata document is stored.
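To make the proposed layout concrete, here is a minimal Python sketch, not the pbench implementation. The names `split_index_map`, `iter_documents`, and `get_value` are hypothetical; `get_value` stands in for whatever call fetches a single metadata key, and a plain dict plays the role of the metadata store. It follows the note in using each index name directly as a key, which assumes no index is itself named `indices`.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def split_index_map(index_map: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Flatten a nested index map into separate top-level metadata keys.

    Produces an "indices" key listing the index names, plus one key per
    index holding that index's document IDs.
    """
    flat: Dict[str, List[str]] = {"indices": list(index_map)}
    for name, doc_ids in index_map.items():
        flat[name] = list(doc_ids)
    return flat


def iter_documents(
    get_value: Callable[[str], List[str]]
) -> Iterator[Tuple[str, str]]:
    """Yield (index name, document ID) pairs one index at a time.

    `get_value` is a hypothetical accessor that fetches one metadata key,
    so only the list of names and a single index's IDs are held at once.
    """
    for name in get_value("indices"):
        for doc_id in get_value(name):
            yield name, doc_id


# Usage: a dict stands in for the per-key metadata store.
store = split_index_map({"index1": ["id1", "id2"], "index2": ["id3", "id4"]})
for index_name, doc_id in iter_documents(store.__getitem__):
    print(index_name, doc_id)
```

The memory benefit depends on the store actually fetching keys independently; if every key lookup still deserializes one large JSON document, the flattened layout alone changes nothing.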

Note that if we can get rid of the sqlite3 DB used by the unit/legacy tests, PostgreSQL's native JSON column queries would let us query nested fields of a SQL JSON column directly. That would better manage server memory without either reading the entire column value and pulling it apart or changing the way it's stored. (This is one of those "test strategy" things where we really need to get rid of the "mocked functional test" environment in favor of real isolated unit testing and full toolchain functional testing.)
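As a rough illustration of the PostgreSQL route, the sketch below builds two parameterized queries that let the database pull apart the nested map. Everything about the schema is an assumption made for the example: the table name `dataset_metadata`, the columns `dataset_id` and `value`, and `value` being of type `jsonb` with the map under a `map` key. The `->` operator and the `jsonb_object_keys` and `jsonb_array_elements_text` functions are standard PostgreSQL; the helper names are hypothetical, and the `%(name)s` placeholders follow the pyformat style used by psycopg-style drivers.

```python
from typing import Any, Dict, Tuple

# A query is the SQL text plus its bound parameters.
Query = Tuple[str, Dict[str, Any]]


def index_names_query(dataset_id: int) -> Query:
    """Build a query returning one row per index name in the map.

    Assumes a hypothetical table `dataset_metadata` with a jsonb column
    `value` of the form {"map": {"index1": [...], ...}}.
    """
    sql = (
        "SELECT jsonb_object_keys(value -> 'map') AS index_name "
        "FROM dataset_metadata "
        "WHERE dataset_id = %(dataset_id)s"
    )
    return sql, {"dataset_id": dataset_id}


def document_ids_query(dataset_id: int, index_name: str) -> Query:
    """Build a query returning one row per document ID in a single index.

    The server expands only the selected array, so the client never
    receives the whole map.
    """
    sql = (
        "SELECT jsonb_array_elements_text("
        "value -> 'map' -> %(index_name)s::text) AS doc_id "
        "FROM dataset_metadata "
        "WHERE dataset_id = %(dataset_id)s"
    )
    return sql, {"dataset_id": dataset_id, "index_name": index_name}


# Usage: pass the pair to a DB-API cursor, e.g. cursor.execute(*query).
sql, params = document_ids_query(42, "index1")
print(sql)
print(params)
```

If the column were `json` rather than `jsonb`, the corresponding `json_object_keys` and `json_array_elements_text` functions would apply instead.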

Consider a way to iterate over the contents of an `INDEX_MAP` without requiring the entire JSON document be loaded into memory #2505