The problem:
When feeding in a large dataset, the memory consumption of nodes is technically unbounded unless we cap the amount of data consumed, e.g. via a blocking queue on the import (this also affects other nodes).
To fix this we'll need a common mechanism for reading data such that, once we reach a certain threshold (a sensible default, configurable via an environment variable), we throttle the consumption of new input.
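A minimal sketch of what such a mechanism could look like, assuming a bounded blocking queue between the import and downstream nodes. The names here (IMPORT_QUEUE_CAPACITY, BoundedImportQueue, publish, consume) are hypothetical and only illustrate the throttling idea, not an agreed API:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a bounded buffer between the input reader and
// downstream nodes. Once the capacity (default + env var override) is
// reached, the reader blocks, so memory use stays bounded.
public final class BoundedImportQueue {

    // Default threshold, overridable via an environment variable.
    private static final int DEFAULT_CAPACITY = 10_000;
    private static final int CAPACITY = Integer.parseInt(
            System.getenv().getOrDefault("IMPORT_QUEUE_CAPACITY",
                                         String.valueOf(DEFAULT_CAPACITY)));

    // Bounded queue: put() blocks when CAPACITY items are buffered,
    // throttling the consumption of new input until nodes catch up.
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(CAPACITY);

    // Called by the input-reading side; blocks once the threshold is hit.
    public void publish(String record) throws InterruptedException {
        queue.put(record);
    }

    // Called by downstream nodes; blocks while the queue is empty.
    public String consume() throws InterruptedException {
        return queue.take();
    }
}
```

The key property is that backpressure is applied at the point of ingestion rather than relying on downstream nodes to drop or spill data, which is what keeps the overall memory footprint bounded regardless of dataset size.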