Yomguithereal / mnemonist

Curated collection of data structures for the JavaScript/TypeScript language.
https://yomguithereal.github.io/mnemonist
MIT License
2.27k stars 92 forks source link

MultiBiMap #110

Open pbadenski opened 5 years ago

pbadenski commented 5 years ago

More of explorative question.. Is it possible to easily compose existing MultiMap and BiMap data structures to get a MultiBiMap?

Yomguithereal commented 5 years ago

It depends on what you actually need. My early intuition is that it would be a bit confusing to know how to maintain the bijection that the BiMap enforces. What's more, does it mean that left keys map to n right keys? Or does it mean that n left keys map to n right keys?

Yomguithereal commented 5 years ago

Also, do you have a use case in mind?

pbadenski commented 5 years ago

Yes, use case is a bit involved... In our system (finance domain) data is presented in a spreadsheet like fashion where each cell is calculated using multiple data points. Data points change in real time, for each change we need to know which cells are modified. Once we're recalculating the cell, we need to figure out all data points that it relies on in order to feed into downstream algorithm.

Change in a single data point can cause recalculation of multiple cells. Each cell can depend on many data points.

Hence MultiBiMap :)

Yomguithereal commented 5 years ago

Wouldn't a full-fledged graph (or an acyclic one such as a dependency tree) data structure be more adapted to this use case?

pbadenski commented 5 years ago

Not really.. but I see how my mentioning of spreadsheet might be a bit confusing here. Those data points come from an external API, rather than existing in a spreadsheet. Cells do not depend on each other, feeds do not depend on other feeds. So it looks more like this:

Cell_1 => (depends on) => [FeedA, FeedB] Cell_2 => (depends on) => [FeedB, FeedC]

Queries we often ask:

  1. What does Cell_2 depend on? => [FeedB, FeedC]
  2. FeedB changed, which cells should I update? => [Cell1, Cell2]
Yomguithereal commented 5 years ago

It seems to me that a graph can still elegantly solve your problem: you have a bipartite graph where it is guaranteed that you cannot have intra-component edges.

Here is an example on how to modelize your case using the graphology library:

var UndirectedGraph = require("graphology").UndirectedGraph;

var g = new UndirectedGraph();

g.addNode("Cell1");
g.addNode("Cell2");
g.addNode("FeedA");
g.addNode("FeedB");
g.addNode("FeedC");

g.addEdge("Cell1", "FeedA");
g.addEdge("Cell1", "FeedB");
g.addEdge("Cell2", "FeedB");
g.addEdge("Cell2", "FeedC");

// query 1:
g.neighbors("Cell2");

// query 2:
g.neighbors("FeedB");

But I see how we could formulate MultiBiMap now. However I think it would only be useful if you need a most performant implementation of it, memory & computation-wise because the graph presented here is enough for a lot of use cases.

pbadenski commented 5 years ago

This looks interesting, I'll have a look. We need relatively efficient structure computation-wise. I will have a look what performance characteristics graph solution would have. Thank you!

Yomguithereal commented 5 years ago

Computation-wise, using graphology, both your write & queries will be O(1). But we can cut everything that makes graphology generic to better suit your use case if required. Do you insert keys in increasing order?

pbadenski commented 5 years ago

Sorry, been on holidays. I think graphology might be a good candidate, I should give it a shot. What are you referring to in terms of order? There's not really any order requirements.

Yomguithereal commented 5 years ago

Like, if your keys are integers from 0 to n, for instance, and that you insert them into the graph in increasing order, then we can rely on sorted lists sometimes and decrease memory consumption while still being able to perform logarithmic lookups.