ClickHouse / ClickBench

ClickBench: a Benchmark For Analytical Databases
https://benchmark.clickhouse.com/
Other
647 stars 148 forks source link

Add BoilingData to ClickBench #125

Open alexey-milovidov opened 1 year ago

alexey-milovidov commented 1 year ago

https://www.boilingdata.com/

Suggested by @danthegoodman1 in a discussion at https://clickhouse.com/slack

alexey-milovidov commented 1 year ago

This is a cloud service on top of duckdb - an embedded database engine for Python, which has some origins and ideas borrowed from ClickHouse.

The registration page works, but login fails in some cases:

Screenshot_20230724_040346

alexey-milovidov commented 1 year ago

The UI looks cute:

Screenshot_20230724_040749

One issue with the UI - when I type my query and press Enter, the entered query disappears, and I cannot easily run it again.

SELECT 1 does not work:

Screenshot_20230724_040922

alexey-milovidov commented 1 year ago

Every example from the list shows "Access denied":

Screenshot_20230724_041022

alexey-milovidov commented 1 year ago

I tried to use it with our public bucket, s3://clickhouse-public-datasets/, but it shows "access denied". It looks like I need to create an IAM role and switch to it: https://www.boilingdata.com/apidoc.html

alexey-milovidov commented 1 year ago

My test query is:

SELECT * FROM parquet_scan('s3://clickhouse-public-datasets/hits_compatible/hits.parquet') LIMIT 10

@santrancisco created an IAM role for me: arn:aws:iam::609927696493:role/boilingdata_test

and I've set it with

PRAGMA s3AccessRoleArn='arn:aws:iam::123456789012:role/bdS3';

as in the documentation.

But I'm still getting an access denied error.

danthegoodman1 commented 1 year ago

Sometimes it took over 2 minutes for a “select 1” to return for me

alexey-milovidov commented 1 year ago

After waiting for a while, the "Access denied" error disappeared, but now I'm getting a cryptic error message:

{
  "status": "Forward count > 1, loop?! (2)"
}

Screenshot_20230724_041404

which I don't know what it means.

alexey-milovidov commented 1 year ago

Sometimes it took over 2 minutes for a “select 1” to return for me

This contradicts the statement on their website:

Instant Computing Power For Realtime Dashboarding

Realtime dashboarding is a natural and well-explored use case for ClickHouse, and I don't like when other inferior engines, like duckdb, make unfounded claims, pretending to have similar capabilities.

alexey-milovidov commented 1 year ago

Trying other queries:

SELECT count(*) FROM parquet_scan('s3://clickhouse-public-datasets/hits_compatible/hits.parquet');
SELECT count(DISTINCT URL) FROM parquet_scan('s3://clickhouse-public-datasets/hits_compatible/hits.parquet');
SELECT count(DISTINCT URL) FROM parquet_scan('s3://clickhouse-public-datasets/hits_compatible/hits.parquet') GROUP BY UserID;

but none works giving the same error Forward count > 1, loop?! (2)

alexey-milovidov commented 1 year ago

@danthegoodman1

Sometimes it took over 2 minutes for a “select 1” to return for me

I'm waiting for several minutes, but then it logs me out. When I log in again, there is no query history.

I didn't have a single time the SELECT 1 query succeeded before it logged out my session.

dforsber commented 1 year ago

Hello, sorry for replying this late! Since Jul we have new GUI and new backend release so things should work better. You could use the bdcli to setup the IAM Role as it crafts the IAM Policy for you.

Does this help?

dforsber commented 11 months ago

This is the set of queries I'm running on Boiling. I've fixed some sort orders and column aliases to make the results comparable with local laptop running DuckDB instance. So, I have copied the hits.parquet file into Boiling demo S3 bucket on eu-west-1.

SELECT COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE AdvEngineID <> 0;
SELECT UserID FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE UserID = 435090932899640449;
SELECT UserID FROM parquet_scan('s3://boilingdata-demo/hits.parquet') ORDER BY UserID LIMIT 10;
SELECT COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE URL LIKE '%google%';
SELECT SearchPhrase, EventTime FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' ORDER BY EventTime, SearchPhrase LIMIT 10;
SELECT SearchPhrase FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' ORDER BY SearchPhrase LIMIT 10;
SELECT SearchPhrase FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' ORDER BY EventTime, SearchPhrase LIMIT 10;
SELECT AdvEngineID, COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE AdvEngineID > 0 GROUP BY AdvEngineID ORDER BY COUNT(*) DESC;
SELECT SearchPhrase, MIN(URL) AS url, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC, SearchPhrase LIMIT 10;
SELECT COUNT(*) AS total, SearchPhrase FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase != '' GROUP BY SearchPhrase ORDER BY SearchPhrase, total DESC LIMIT 10;
SELECT COUNT(DISTINCT SearchPhrase) AS total FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase != '';
SELECT COUNT(DISTINCT UserID) AS uniqueUserIds FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT RegionID, COUNT(DISTINCT UserID) AS u FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY RegionID ORDER BY u DESC LIMIT 10;
SELECT UserID, COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY UserID ORDER BY COUNT(*) DESC, UserID LIMIT 10;
SELECT UserID, SearchPhrase, COUNT(*) FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY UserID, SearchPhrase ORDER BY COUNT(*) DESC, UserID LIMIT 10;
SELECT MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE MobilePhoneModel <> '' AND UserID <> '' GROUP BY MobilePhoneModel ORDER BY u DESC LIMIT 10;
SELECT MobilePhone, MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE MobilePhoneModel <> ''AND UserID <> '' GROUP BY MobilePhone, MobilePhoneModel ORDER BY u DESC LIMIT 10;
SELECT SearchPhrase, COUNT(DISTINCT UserID) AS u FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY u DESC, SearchPhrase LIMIT 13;
SELECT SearchEngineID, SearchPhrase, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY SearchEngineID, SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT UserID, COUNT(*) FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE UserID <> NULL GROUP BY UserID ORDER BY COUNT(*) DESC LIMIT 10;
SELECT SUM(ResolutionWidth) AS resWth, SUM(ResolutionWidth + 1) AS resW1, SUM(ResolutionWidth + 2) AS resW2, SUM(ResolutionWidth + 3) AS resW3, SUM(ResolutionWidth + 4) AS resW4, SUM(ResolutionWidth + 5) AS resW5, SUM(ResolutionWidth + 6) AS resW6, SUM(ResolutionWidth + 7) AS resW7, SUM(ResolutionWidth + 8) AS resW8, SUM(ResolutionWidth + 9) AS resW9, SUM(ResolutionWidth + 10) AS resW10, SUM(ResolutionWidth + 11) AS resW11, SUM(ResolutionWidth + 12) AS resW12, SUM(ResolutionWidth + 13) AS resW13, SUM(ResolutionWidth + 14) AS resW14, SUM(ResolutionWidth + 15) AS resW15, SUM(ResolutionWidth + 16) AS resW16, SUM(ResolutionWidth + 17) AS resW17, SUM(ResolutionWidth + 18) AS resW18, SUM(ResolutionWidth + 19) AS resW19, SUM(ResolutionWidth + 20) AS resW20, SUM(ResolutionWidth + 21) AS resW21, SUM(ResolutionWidth + 22) AS resW22, SUM(ResolutionWidth + 23) AS resW23, SUM(ResolutionWidth + 24) AS resW24, SUM(ResolutionWidth + 25) AS resW25, SUM(ResolutionWidth + 26) AS resW26, SUM(ResolutionWidth + 27) AS resW27, SUM(ResolutionWidth + 28) AS resW28, SUM(ResolutionWidth + 29) AS resW29, SUM(ResolutionWidth + 30) AS resW30, SUM(ResolutionWidth + 31) AS resW31, SUM(ResolutionWidth + 32) AS resW32, SUM(ResolutionWidth + 33) AS resW33, SUM(ResolutionWidth + 34) AS resW34, SUM(ResolutionWidth + 35) AS resW35, SUM(ResolutionWidth + 36) AS resW36, SUM(ResolutionWidth + 37) AS resW37, SUM(ResolutionWidth + 38) AS resW38, SUM(ResolutionWidth + 39) AS resW39, SUM(ResolutionWidth + 40) AS resW40, SUM(ResolutionWidth + 41) AS resW41, SUM(ResolutionWidth + 42) AS resW42, SUM(ResolutionWidth + 43) AS resW43, SUM(ResolutionWidth + 44) AS resW44, SUM(ResolutionWidth + 45) AS resW45, SUM(ResolutionWidth + 46) AS resW46, SUM(ResolutionWidth + 47) AS resW47, SUM(ResolutionWidth + 48) AS resW48, SUM(ResolutionWidth + 49) AS resW49, SUM(ResolutionWidth + 50) AS resW50, SUM(ResolutionWidth + 51) AS resW51, SUM(ResolutionWidth + 52) AS resW52, SUM(ResolutionWidth + 53) AS resW53, SUM(ResolutionWidth + 54) AS resW54, SUM(ResolutionWidth + 55) AS resW55, SUM(ResolutionWidth + 56) AS resW56, SUM(ResolutionWidth + 57) AS resW57, SUM(ResolutionWidth + 58) AS resW58, SUM(ResolutionWidth + 59) AS resW59, SUM(ResolutionWidth + 60) AS resW60, SUM(ResolutionWidth + 61) AS resW61, SUM(ResolutionWidth + 62) AS resW62, SUM(ResolutionWidth + 63) AS resW63, SUM(ResolutionWidth + 64) AS resW64, SUM(ResolutionWidth + 65) AS resW65, SUM(ResolutionWidth + 66) AS resW66, SUM(ResolutionWidth + 67) AS resW67, SUM(ResolutionWidth + 68) AS resW68, SUM(ResolutionWidth + 69) AS resW69, SUM(ResolutionWidth + 70) AS resW70, SUM(ResolutionWidth + 71) AS resW71, SUM(ResolutionWidth + 72) AS resW72, SUM(ResolutionWidth + 73) AS resW73, SUM(ResolutionWidth + 74) AS resW74, SUM(ResolutionWidth + 75) AS resW75, SUM(ResolutionWidth + 76) AS resW76, SUM(ResolutionWidth + 77) AS resW77, SUM(ResolutionWidth + 78) AS resW78, SUM(ResolutionWidth + 79) AS resW79, SUM(ResolutionWidth + 80) AS resW80, SUM(ResolutionWidth + 81) AS resW81, SUM(ResolutionWidth + 82) AS resW82, SUM(ResolutionWidth + 83) AS resW83, SUM(ResolutionWidth + 84) AS resW84, SUM(ResolutionWidth + 85) AS resW85, SUM(ResolutionWidth + 86) AS resW86, SUM(ResolutionWidth + 87) AS resW87, SUM(ResolutionWidth + 88) AS resW88, SUM(ResolutionWidth + 89) AS resW89 FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT SearchPhrase, MIN(URL) AS url, MIN(Title) AS title, COUNT(*) AS c, COUNT(DISTINCT UserID) AS usrIdCount FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c, searchphrase DESC LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY WatchID, ClientIP ORDER BY c DESC, WatchID, ClientIP LIMIT 10;
SELECT URL, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY URL ORDER BY c DESC LIMIT 12;
SELECT URL, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE url ='http://kinopoisk.ru/perm.irr.ru' GROUP BY URL;
SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT AVG(UserID) FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT MIN(EventDate) AS minEventDate, MAX(EventDate) AS maxEventDate FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT RegionID, SUM(AdvEngineID) AS sumAdvEngineId, COUNT(*) AS c, AVG(ResolutionWidth) AS avgResolutionWidth, COUNT(DISTINCT UserID) AS distinctUsers FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY RegionID ORDER BY c DESC LIMIT 10;
SELECT 1, URL, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY 1, URL ORDER BY c DESC LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND URL <> '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10;
SELECT Title, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND Title <> '' GROUP BY Title ORDER BY PageViews DESC LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND IsLink <> 0 AND IsDownload = 0 GROUP BY URL ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
SELECT TraficSourceID, SearchEngineID, AdvEngineID, CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src, URL AS Dst, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
SELECT URLHash, EventDate, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND TraficSourceID IN (-1, 6) AND RefererHash = 3594120000172545465 GROUP BY URLHash, EventDate ORDER BY PageViews DESC LIMIT 10 OFFSET 100;
SELECT WindowClientWidth, WindowClientHeight, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND DontCountHits = 0 AND URLHash = 2868770270353813622 GROUP BY WindowClientWidth, WindowClientHeight ORDER BY PageViews DESC LIMIT 10 OFFSET 10000;
SELECT REGEXP_REPLACE(Referer, '^https?://(?:www.)?([^/]+)/.*$', '\x01') AS k, AVG(STRLEN(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC, c LIMIT 25;
SELECT SearchEngineID, ClientIP, COUNT(*) AS c, SUM(IsRefresh) AS sumIsRefresh, AVG(ResolutionWidth) AS avgResWidth FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY SearchEngineID, ClientIP ORDER BY c DESC, SearchEngineID, ClientIP LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh) AS sumIsRefresh, AVG(ResolutionWidth) AS AvgRsesWidth FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY WatchID, ClientIP ORDER BY c, WatchID, ClientIP DESC LIMIT 10;
SELECT ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3 ORDER BY c DESC, ClientIP LIMIT 11;
SELECT COUNT(*) AS count FROM ( SELECT SearchPhrase FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY SearchPhrase ) AS a ORDER BY count DESC;
SELECT DATE_TRUNC('minute', EventTime::timestamp) AS M, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-14' AND EventDate <= '2013-07-15' AND IsRefresh = 0 AND DontCountHits = 0 GROUP BY DATE_TRUNC('minute', EventTime::timestamp) ORDER BY DATE_TRUNC('minute', EventTime::timestamp) LIMIT 10 OFFSET 1000;
SELECT CounterID, AVG(STRLEN(URL)) AS l, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE URL <> '' GROUP BY CounterID HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
SELECT * FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE URL LIKE '%google%' ORDER BY EventTime, WatchID LIMIT 1;
alexey-milovidov commented 11 months ago

Thank you! Yes, let's try these queries. I can also try if I will not forget, but you can also submit the results.

dforsber commented 10 months ago

I've created "engineering white paper" with the test results. Link is on front page: https://www.boilingdata.com/