Closed dvgorbunkov closed 4 months ago
Hi, thanks for the report.
Having the sample cell towers dataset, I tried to do something similar using Metabase 0.45.2 and plugin version 1.0.3:
SELECT count(distinct(area))
FROM `default`.`cell_towers`
WHERE radio = 'CDMA'
AND updated >= '2011-07-01'
AND updated <= '2011-07-31';
It produces the same result in the Metabase interface and the standard clickhouse-client.
Can you please try upgrading the driver (to at least 0.9.0 as the underlying JDBC driver has been changed there) and then check if the issue persists?
Cheers.
@slvrtrn In the previous changelog in the driver, I didn't find a mention of my case, that's why I wrote here about the problem. Thanks for the advice. I'll definitely update and come back here with the result.
Unfortunately the update didn't help. I installed the latest metabase and the latest clickhouse driver (1.0.3). The problem is reproduced.
{
"browser-info": {
"language": "ru-RU",
"platform": "Linux x86_64",
"userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
"vendor": "Google Inc."
},
"system-info": {
"file.encoding": "UTF-8",
"java.runtime.name": "OpenJDK Runtime Environment",
"java.runtime.version": "11.0.18+10",
"java.vendor": "Eclipse Adoptium",
"java.vendor.url": "https://adoptium.net/",
"java.version": "11.0.18",
"java.vm.name": "OpenJDK 64-Bit Server VM",
"java.vm.version": "11.0.18+10",
"os.name": "Linux",
"os.version": "5.10.0-13-amd64",
"user.language": "en",
"user.timezone": "GMT"
},
"metabase-info": {
"databases": [
"h2",
"clickhouse"
],
"hosting-env": "unknown",
"application-database": "h2",
"application-database-details": {
"database": {
"name": "H2",
"version": "1.4.197 (2018-03-18)"
},
"jdbc-driver": {
"name": "H2 JDBC Driver",
"version": "1.4.197 (2018-03-18)"
}
},
"run-mode": "prod",
"version": {
"date": "2023-01-27",
"tag": "v0.45.2.1",
"branch": "release-x.45.2.x",
"hash": "1a59de7"
},
"settings": {
"report-timezone": null
}
}
}
P.S. This is very strange, but if you make a request like
select
count(*)
from
(
select
distinct(orderId)
from
AttributedPartnerOrders
where
partnerId = '...'
and createdDate >= '2023-01-01'
and createdDate <= '2023-01-31'
and clickedMinutesAgo < 1440
)
then the result is correct and does not change if I repeat it again.
Thanks for trying it out.
That is odd indeed!
Could you provide the table structure, maybe with a limited number of fields (the ones used in the query at least), but with exact ClickHouse types?
I will generate some dummy data and try to reproduce it on my end.
Thank you for taking such good care of my question.
CREATE VIEW AttributedPartnerOrders
(
`partnerId` FixedString(24),
`visitorId` Nullable(FixedString(24)),
`createdAt` DateTime,
`createdDate` Date,
`itemId` Nullable(Int64),
`orderId` Nullable(String),
`qnt` Nullable(Float64),
`amount` Nullable(Float64),
`clickedAt` DateTime,
`clickedMinutesAgo` Float64,
`blockId` FixedString(32),
`method` Nullable(String),
`email` Nullable(String),
`updatedAt` DateTime
) AS
SELECT *
FROM AttributedPartnerOrdersMaterialized
Thanks. Will have a look.
I still cannot reproduce this.
Here's what I did:
CREATE TABLE AttributedPartnerOrdersMaterialized
(
`partnerId` FixedString(24),
`createdDate` Date,
`orderId` Nullable(String),
`clickedMinutesAgo` Float64
) ENGINE MergeTree ORDER BY partnerId;
CREATE VIEW AttributedPartnerOrders
(
`partnerId` FixedString(24),
`createdDate` Date,
`orderId` Nullable(String),
`clickedMinutesAgo` Float64
)
AS
SELECT *
FROM AttributedPartnerOrdersMaterialized;
so I'd have some minimal schema setup for this test.
I generated 1 million random rows, such as
10 unique partner ids, 100k unique orderIds there
running several queries such as
select count(distinct(orderId)) from AttributedPartnerOrders
where partnerId = 'c359c87522234e459890fd88'
and createdDate >= '2023-01-01'
and createdDate <= '2023-01-31'
and clickedMinutesAgo < 1440
yielded the same result in both Metabase and ClickHouse.
I think we are missing something else here...
To help with the investigation, could you provide more details:
to speed up the process, maybe you could ping me in the ClickHouse Community Slack
I have 4 servers with clickhouse version 22.8.4.7. Two sharded servers, each with one additional server with a replica. I have the following table schema
CREATE TABLE AttributedPartnerOrdersMaterialized_local ON CLUSTER my_cluster (
`partnerId` FixedString(24),
`createdDate` Date,
`itemId` Nullable(Int64),
`clickedMinutesAgo` Float64,
...
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/AttributedPartnerOrdersMaterialized_local', '{replica}')
PARTITION BY toYYYYMMDD(createdDate) ORDER BY (partnerId, blockId) SETTINGS index_granularity = 8192;
CREATE TABLE AttributedPartnerOrdersMaterialized ON CLUSTER my_cluster (
`partnerId` FixedString(24),
`createdDate` Date,
`itemId` Nullable(Int64),
`clickedMinutesAgo` Float64,
...
) ENGINE = Distributed('my_cluster', 'mydb', 'AttributedPartnerOrdersMaterialized_local', rand());
CREATE VIEW AttributedPartnerOrders ON CLUSTER my_cluster (
`partnerId` FixedString(24),
`createdDate` Date,
`itemId` Nullable(Int64),
`clickedMinutesAgo` Float64,
...
) AS SELECT * FROM AttributedPartnerOrdersMaterialized;
As a result, if we make a query on the Distributed table (AttributedPartnerOrdersMaterialized), we get the correct data that does not change if we repeat the query. However, if we make query view table (AttributedPartnerOrders), then we get a random number. This is still strange, but we will be satisfied with the option of transferring queries to another table.
Thanks for pointing out the data model. We have an obvious error.
However, if we make query view table (AttributedPartnerOrders), then we get a random number.
is it the case for clickhouse-client as well or only Metabase?
is it the case for clickhouse-client as well or only Metabase?
only Metabase
Glad to hear that it works for you now. I will leave this issue open as we still need to investigate why it behaves awkwardly when using a View over a Distributed table.
I'm closing this for now, as it is still unclear what the root cause was and if it's still an issue in the recent driver releases. If there are more details and/or minimal repro available now, feel free to re-open or create a new one.
HI! I'm using metabase version 0.44.1 with version clickhouse driver 0.8.3. Found a problem that a simple query
produces a result 6046, but if you repeat it, it gives 5971.
The numbers are constantly changing.
Through the standard clickhouse-client, the data is given correctly and the same. In the metabase logs we see only regular lines for this request