SigNoz / signoz-otel-collector

SigNoz distro for OpenTelemetry Collector

feat: change body index to ngram #309

Closed nityanandagohain closed 2 months ago

nityanandagohain commented 5 months ago

Migration to add an ngram index to body.

fixes https://github.com/SigNoz/signoz/issues/4259

srikanthccv commented 5 months ago

A short ngram length also means there is scope for false positives in many contexts. Imagine IP addresses being part of the body, all in the range 10.8.x.y: for such a dataset, all granules may match because the prefix matches. This can manifest in several forms and becomes more of a problem as we onboard more diverse users. Our benchmarks are only as good as our test data.

This fix will help some users and we should roll it out, but I'd also suggest continuing to look into other options if possible.

nityanandagohain commented 5 months ago

I think it depends more on cardinality than on prefixes. If your data has high cardinality, the index will be able to benefit the query.

In the example above, if the IP addresses have high cardinality, the index will be able to skip granules easily, because even the last ngram will result in a unique hash.
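
To make the two positions above concrete, here is a minimal sketch of how an ngram skip index decides whether a granule can be skipped. It is an illustration only, not the collector's migration or ClickHouse's implementation: the ngram length of 4, the sample log lines, and the helper names `ngrams` and `granule_may_match` are all assumptions, and an exact set stands in for the bloom filter (a real bloom filter can only add false positives on top of this).

```python
def ngrams(text: str, n: int = 4) -> set[str]:
    """Return the set of all character n-grams of `text` (n=4 is an assumed size)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def granule_may_match(granule_rows: list[str], needle: str, n: int = 4) -> bool:
    """Return True if the granule cannot be skipped for a substring search.

    The granule must be read only if every n-gram of `needle` occurs
    somewhere in it; one missing n-gram is enough to skip it.
    """
    indexed: set[str] = set()
    for row in granule_rows:
        indexed |= ngrams(row, n)
    return ngrams(needle, n) <= indexed


# Hypothetical granules; every address shares the 10.8. prefix.
granule_a = ["conn from 10.8.1.17", "conn from 10.8.2.40"]
granule_b = ["conn from 10.8.3.99", "conn from 10.8.7.12"]
# No row contains "10.8.1.2", but its n-grams are spread across two rows.
granule_c = ["conn from 10.8.1.40", "peer 172.8.1.25 up"]

print(granule_may_match(granule_a, "10.8.1.17"))  # True: the row is really there
print(granule_may_match(granule_b, "10.8.1.17"))  # False: ".8.1" is missing, so skip
print(granule_may_match(granule_c, "10.8.1.2"))   # True: a false positive
```

With these inputs, the shared prefix alone does not stop `granule_b` from being skipped, because the later ngrams of the search string differ. `granule_c` shows the other side: when the distinguishing ngrams also happen to occur elsewhere in the granule, the index reports a possible match even though no row contains the string.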

nityanandagohain commented 2 months ago

@srikanthccv I have updated only the value of the index, from 50KB to 60KB. Please take a look.