Closed: nityanandagohain closed this 2 weeks ago
Can we export to v2 in parallel? Also, we need to update the instrumentation to record the time taken for each table.
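As a rough sketch of the per-table timing idea (the `exportTable` function and table names here are hypothetical stand-ins, not the actual exporter API):

```go
package main

import (
	"fmt"
	"time"
)

// exportTable is a placeholder for the real per-table export logic.
func exportTable(table string) {
	time.Sleep(10 * time.Millisecond) // simulate work
}

// timedExport runs the export for each table and records how long
// each one took, so the durations can be emitted as metrics.
func timedExport(tables []string) map[string]time.Duration {
	durations := make(map[string]time.Duration, len(tables))
	for _, t := range tables {
		start := time.Now()
		exportTable(t)
		durations[t] = time.Since(start)
	}
	return durations
}

func main() {
	for table, d := range timedExport([]string{"logs_v2", "logs_v2_resource"}) {
		fmt.Printf("%s took %s\n", table, d)
	}
}
```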
Two more things I wanted to point out.
I noticed that on customers' instances the table engine is not replicated for just the v2 table (AFAIK at least two customers have it deployed).
I was playing around and looked at the queries. All of them use `GLOBAL IN` for the resource fingerprints and other limit queries. I want to throw out the idea that if we can devise a fingerprinting and sharding mechanism that distributes data evenly and always routes the same fingerprint to the same shard, we can get rid of `GLOBAL IN`, which is not optimal compared to a local `IN`. In its current shape I don't think the data distribution will be even, since it's based entirely on resource attributes, and one set of resources can send disproportionately more data than others, so we probably don't have an alternative. But I want to bring it up anyway in case you have ideas.
https://clickhouse.com/docs/en/sql-reference/operators/in

> When using GLOBAL IN / GLOBAL JOIN, first all the subqueries are run for GLOBAL IN / GLOBAL JOIN, and the results are collected in temporary tables. Then the temporary tables are sent to each remote server, where the queries are run using this temporary data.
>
> This will work correctly and optimally if you are prepared for this case and have spread data across the cluster servers such that the data for a single UserID resides entirely on a single server. In this case, all the necessary data will be available locally on each server. Otherwise, the result will be inaccurate. We refer to this variation of the query as "local IN".
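The routing part of the idea above can be sketched as a deterministic hash of the fingerprint: the same fingerprint always lands on the same shard, which is the precondition for replacing `GLOBAL IN` with a local `IN`. This is just an illustration (FNV-1a modulo shard count, with made-up fingerprint strings); it does not address the load-skew caveat, since a hot fingerprint would still overload its shard.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a resource fingerprint to a shard deterministically:
// identical fingerprints always hash to the same shard.
func shardFor(fingerprint string, numShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(fingerprint))
	return h.Sum32() % numShards
}

func main() {
	const shards = 4
	fps := []string{"service=api,env=prod", "service=worker,env=prod"}
	for _, fp := range fps {
		fmt.Printf("fingerprint %q -> shard %d\n", fp, shardFor(fp, shards))
	}
	// Determinism: repeated calls for the same fingerprint agree.
	fmt.Println(shardFor(fps[0], shards) == shardFor(fps[0], shards)) // true
}
```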
Major changes made
- `logs_v2_resource_bucket` renamed to `logs_v2_resource`
- `span_attributes` changed from an array to a string
New logs schema
Fixes https://github.com/SigNoz/signoz/issues/5555
---- Testing / running

1) Make sure the macro is enabled for running migrations.
2) Run the migrations using:

```shell
go run cmd/signozschemamigrator/migrate.go --dsn http://localhost:9000 --replication true
```