gchq / stroom

Stroom is a highly scalable data storage, processing and analysis platform.
https://gchq.github.io/stroom-docs/
Apache License 2.0

Range values in reference data #3977

Open at055612 opened 8 months ago

at055612 commented 8 months ago

In ReferenceDataFilter, consider the following code:

                if (key != null) {
                    LOGGER.trace("Putting key {} into map {}", key, mapDefinition);
                    refDataLoaderHolder.getRefDataLoader().put(mapDefinition, key, stagingValueOutputStream);
                } else if (rangeFrom != null && rangeTo != null) {
                    if (rangeFrom > rangeTo) {
                        errorReceiverProxy.log(Severity.ERROR, null, getElementId(),
                                "Range from '" + rangeFrom
                                        + "' must be less than or equal to range to '" + rangeTo + "'",
                                null);
                    } else if (rangeFrom < 0) {
                        // negative values cause problems for the ordering of data in LMDB so prevent their use
                        // when using byteBuffer.putLong, -10, 0 & 10 will be stored in LMDB as 0, 10, -10
                        errorReceiverProxy.log(Severity.ERROR, null, getElementId(),
                                LogUtil.message(
                                        "Only non-negative numbers are supported (from: {}, to: {})",
                                        rangeFrom, rangeTo), null);

                    } else {
                        // convert from inclusive rangeTo to exclusive rangeTo
                        // if from==to we still record it as a range
                        final Range<Long> range = new Range<>(rangeFrom, rangeTo + 1);
                        LOGGER.trace("Putting range {} into map {}", range, mapDefinition);
                        refDataLoaderHolder.getRefDataLoader().put(mapDefinition, range, stagingValueOutputStream);
                    }
                }
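The ordering claim in the negative-number comment above can be checked with a small standalone sketch. It assumes that LMDB compares keys with its default ordering, which treats them as unsigned bytes (memcmp order), and that the long is written big-endian, which is ByteBuffer's default byte order. The class and method names are hypothetical and not part of Stroom.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class (not part of Stroom).
class LongKeyOrdering {

    // Encode a long as 8 big-endian bytes, which is what ByteBuffer.putLong does by default.
    static byte[] encode(final long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Sort values by their encoded bytes compared as unsigned bytes (assumed to match
    // LMDB's default memcmp-style key order), then decode them back to longs.
    static List<Long> sortByEncodedBytes(final long... values) {
        final List<byte[]> encoded = new ArrayList<>();
        for (final long value : values) {
            encoded.add(encode(value));
        }
        encoded.sort(Arrays::compareUnsigned);
        final List<Long> result = new ArrayList<>();
        for (final byte[] bytes : encoded) {
            result.add(ByteBuffer.wrap(bytes).getLong());
        }
        return result;
    }

    public static void main(final String[] args) {
        // Prints [0, 10, -10]: -10 is two's complement, so its first byte is 0xFF
        // and it sorts after every non-negative value.
        System.out.println(sortByEncodedBytes(-10L, 0L, 10L));
    }
}
```

This is why a range lookup that relies on byte order only behaves as expected when every stored value is non-negative.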

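We should be logging an error if there is neither a key nor a complete range. As it stands, an entry with no key and only one of rangeFrom/rangeTo (or neither) falls through the final else-if and is silently ignored.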
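A minimal sketch of the missing else branch, assuming the checks are pulled out into a hypothetical helper (RefEntryValidator.validationError, not part of Stroom) that returns the error message instead of calling errorReceiverProxy.log, and omits the puts to the ref data loader:

```java
import java.util.Optional;

// Hypothetical helper (not part of Stroom) mirroring the key/range branching in
// ReferenceDataFilter. It only validates; loading into the ref data store is omitted.
class RefEntryValidator {

    // Returns an error message if the entry cannot be loaded, or empty if it is valid.
    static Optional<String> validationError(final String key, final Long rangeFrom, final Long rangeTo) {
        if (key != null) {
            // A key takes precedence, as in the original code
            return Optional.empty();
        } else if (rangeFrom != null && rangeTo != null) {
            if (rangeFrom > rangeTo) {
                return Optional.of("Range from '" + rangeFrom
                        + "' must be less than or equal to range to '" + rangeTo + "'");
            } else if (rangeFrom < 0) {
                return Optional.of("Only non-negative numbers are supported (from: "
                        + rangeFrom + ", to: " + rangeTo + ")");
            }
            return Optional.empty();
        } else {
            // The missing branch: no key and an incomplete (or absent) range,
            // which the current code drops without any message
            return Optional.of("Entry has neither a key nor a complete range (from: "
                    + rangeFrom + ", to: " + rangeTo + ")");
        }
    }
}
```

With this shape, validationError(null, 5L, null) yields an error rather than being ignored; the caller would then log it at Severity.ERROR, as the existing error branches do. The exact message wording is illustrative only.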

Also, I'm not sure the behaviour described by the comment "if from==to we still record it as a range" is right. If from==to, we may as well store the entry in the keyDb, as we always do a lookup in the keyDb first. This needs checking, along with the implications of changing the behaviour for existing data.
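One way to explore the from==to question is a toy in-memory model of the two stores. This is a sketch only: KeyThenRangeStore and its methods are hypothetical, and it assumes, for illustration, that a lookup key which misses the key store is parsed as a long before the range store is searched. Whether RefDataOffHeapStore#getValueStoreKey behaves exactly this way is not confirmed here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model (not Stroom's LMDB store) of a key store plus a
// range store, looked up key-first as described in the issue.
class KeyThenRangeStore {

    private static final class RangeEntry {
        final long fromInclusive;
        final long toExclusive;
        final String value;

        RangeEntry(final long fromInclusive, final long toExclusive, final String value) {
            this.fromInclusive = fromInclusive;
            this.toExclusive = toExclusive;
            this.value = value;
        }
    }

    private final Map<String, String> keyStore = new HashMap<>();
    private final List<RangeEntry> rangeStore = new ArrayList<>();
    // true = the alternative raised in the issue (single-value ranges go to the key store)
    private final boolean singleValueRangesAsKeys;

    KeyThenRangeStore(final boolean singleValueRangesAsKeys) {
        this.singleValueRangesAsKeys = singleValueRangesAsKeys;
    }

    // rangeTo is inclusive, as in ReferenceDataFilter
    void putRange(final long rangeFrom, final long rangeTo, final String value) {
        if (singleValueRangesAsKeys && rangeFrom == rangeTo) {
            keyStore.put(Long.toString(rangeFrom), value);
        } else {
            // Same inclusive-to-exclusive conversion as the original code
            rangeStore.add(new RangeEntry(rangeFrom, rangeTo + 1, value));
        }
    }

    Optional<String> lookup(final String lookupKey) {
        // Key store first: an exact string match
        final String keyed = keyStore.get(lookupKey);
        if (keyed != null) {
            return Optional.of(keyed);
        }
        // Then the range store, if the lookup key parses as a long
        final long asLong;
        try {
            asLong = Long.parseLong(lookupKey);
        } catch (final NumberFormatException e) {
            return Optional.empty();
        }
        for (final RangeEntry entry : rangeStore) {
            if (asLong >= entry.fromInclusive && asLong < entry.toExclusive) {
                return Optional.of(entry.value);
            }
        }
        return Optional.empty();
    }
}
```

In this model both behaviours return the same value for a lookup of "5" after putRange(5, 5, ...), but a lookup of "05" only matches when the entry is kept as a range, because the key store compares strings exactly while the range path parses the number. Differences of this kind are the sort of implication for existing data that would need checking.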

The code that does the lookup on the key/range db is in stroom.pipeline.refdata.store.offheapstore.RefDataOffHeapStore#getValueStoreKey

at055612 commented 8 months ago

Relates to https://github.com/gchq/stroom/issues/2488