Open alexey-milovidov opened 1 year ago
This is a cloud service on top of duckdb - an embedded database engine for Python, which has some origins and ideas borrowed from ClickHouse.
The registration page works, but login fails in some cases:
The UI looks cute:
One issue with the UI - when I type my query and press Enter, the entered query disappears, and I cannot easily run it again.
SELECT 1
does not work:
Every example from the list shows "Access denied":
I tried to use it with our public bucket, s3://clickhouse-public-datasets/
, but it shows "access denied".
It looks like I need to create an IAM role and switch to it: https://www.boilingdata.com/apidoc.html
My test query is:
SELECT * FROM parquet_scan('s3://clickhouse-public-datasets/hits_compatible/hits.parquet') LIMIT 10
@santrancisco created an IAM role for me: arn:aws:iam::609927696493:role/boilingdata_test
and I've set it with
PRAGMA s3AccessRoleArn='arn:aws:iam::123456789012:role/bdS3';
as in the documentation.
But I'm still getting an access denied error.
Sometimes it took over 2 minutes for a “select 1” to return for me
After waiting for a while, the "Access denied" error disappeared, but now I'm getting a cryptic error message:
{
"status": "Forward count > 1, loop?! (2)"
}
which I don't know what it means.
Sometimes it took over 2 minutes for a “select 1” to return for me
This contradicts the statement on their website:
Instant Computing Power For Realtime Dashboarding
Realtime dashboarding is a natural and well-explored use case for ClickHouse, and I don't like when other inferior engines, like duckdb, make unfounded claims, pretending to have similar capabilities.
Trying other queries:
SELECT count(*) FROM parquet_scan('s3://clickhouse-public-datasets/hits_compatible/hits.parquet');
SELECT count(DISTINCT URL) FROM parquet_scan('s3://clickhouse-public-datasets/hits_compatible/hits.parquet');
SELECT count(DISTINCT URL) FROM parquet_scan('s3://clickhouse-public-datasets/hits_compatible/hits.parquet') GROUP BY UserID;
but none works giving the same error Forward count > 1, loop?! (2)
@danthegoodman1
Sometimes it took over 2 minutes for a “select 1” to return for me
I'm waiting for several minutes, but then it logs me out. When I log in again, there is no query history.
I didn't have a single time the SELECT 1
query succeeded before it logged out my session.
Hello, sorry for replying this late! Since Jul we have new GUI and new backend release so things should work better. You could use the bdcli to setup the IAM Role as it crafts the IAM Policy for you.
clickhouse-public-datasets
into the IAM policy. Is it possible that the bucket policy does not allow get-bucket-location API call?).SELECT 1
now also works 🎉 😅 and it should respond fast and is probably a good measure of the overhead latency between client and Boiling cloudeu-west-1
. So, you would need to have data there in this region, i.e. the S3 Bucket.Does this help?
This is the set of queries I'm running on Boiling. I've fixed some sort orders and column aliases to make the results comparable with local laptop running DuckDB instance. So, I have copied the hits.parquet
file into Boiling demo S3 bucket on eu-west-1
.
SELECT COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE AdvEngineID <> 0;
SELECT UserID FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE UserID = 435090932899640449;
SELECT UserID FROM parquet_scan('s3://boilingdata-demo/hits.parquet') ORDER BY UserID LIMIT 10;
SELECT COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE URL LIKE '%google%';
SELECT SearchPhrase, EventTime FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' ORDER BY EventTime, SearchPhrase LIMIT 10;
SELECT SearchPhrase FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' ORDER BY SearchPhrase LIMIT 10;
SELECT SearchPhrase FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' ORDER BY EventTime, SearchPhrase LIMIT 10;
SELECT AdvEngineID, COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE AdvEngineID > 0 GROUP BY AdvEngineID ORDER BY COUNT(*) DESC;
SELECT SearchPhrase, MIN(URL) AS url, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC, SearchPhrase LIMIT 10;
SELECT COUNT(*) AS total, SearchPhrase FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase != '' GROUP BY SearchPhrase ORDER BY SearchPhrase, total DESC LIMIT 10;
SELECT COUNT(DISTINCT SearchPhrase) AS total FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase != '';
SELECT COUNT(DISTINCT UserID) AS uniqueUserIds FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT RegionID, COUNT(DISTINCT UserID) AS u FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY RegionID ORDER BY u DESC LIMIT 10;
SELECT UserID, COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY UserID ORDER BY COUNT(*) DESC, UserID LIMIT 10;
SELECT UserID, SearchPhrase, COUNT(*) FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY UserID, SearchPhrase ORDER BY COUNT(*) DESC, UserID LIMIT 10;
SELECT MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE MobilePhoneModel <> '' AND UserID <> '' GROUP BY MobilePhoneModel ORDER BY u DESC LIMIT 10;
SELECT MobilePhone, MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE MobilePhoneModel <> ''AND UserID <> '' GROUP BY MobilePhone, MobilePhoneModel ORDER BY u DESC LIMIT 10;
SELECT SearchPhrase, COUNT(DISTINCT UserID) AS u FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY u DESC, SearchPhrase LIMIT 13;
SELECT SearchEngineID, SearchPhrase, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY SearchEngineID, SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT UserID, COUNT(*) FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE UserID <> NULL GROUP BY UserID ORDER BY COUNT(*) DESC LIMIT 10;
SELECT SUM(ResolutionWidth) AS resWth, SUM(ResolutionWidth + 1) AS resW1, SUM(ResolutionWidth + 2) AS resW2, SUM(ResolutionWidth + 3) AS resW3, SUM(ResolutionWidth + 4) AS resW4, SUM(ResolutionWidth + 5) AS resW5, SUM(ResolutionWidth + 6) AS resW6, SUM(ResolutionWidth + 7) AS resW7, SUM(ResolutionWidth + 8) AS resW8, SUM(ResolutionWidth + 9) AS resW9, SUM(ResolutionWidth + 10) AS resW10, SUM(ResolutionWidth + 11) AS resW11, SUM(ResolutionWidth + 12) AS resW12, SUM(ResolutionWidth + 13) AS resW13, SUM(ResolutionWidth + 14) AS resW14, SUM(ResolutionWidth + 15) AS resW15, SUM(ResolutionWidth + 16) AS resW16, SUM(ResolutionWidth + 17) AS resW17, SUM(ResolutionWidth + 18) AS resW18, SUM(ResolutionWidth + 19) AS resW19, SUM(ResolutionWidth + 20) AS resW20, SUM(ResolutionWidth + 21) AS resW21, SUM(ResolutionWidth + 22) AS resW22, SUM(ResolutionWidth + 23) AS resW23, SUM(ResolutionWidth + 24) AS resW24, SUM(ResolutionWidth + 25) AS resW25, SUM(ResolutionWidth + 26) AS resW26, SUM(ResolutionWidth + 27) AS resW27, SUM(ResolutionWidth + 28) AS resW28, SUM(ResolutionWidth + 29) AS resW29, SUM(ResolutionWidth + 30) AS resW30, SUM(ResolutionWidth + 31) AS resW31, SUM(ResolutionWidth + 32) AS resW32, SUM(ResolutionWidth + 33) AS resW33, SUM(ResolutionWidth + 34) AS resW34, SUM(ResolutionWidth + 35) AS resW35, SUM(ResolutionWidth + 36) AS resW36, SUM(ResolutionWidth + 37) AS resW37, SUM(ResolutionWidth + 38) AS resW38, SUM(ResolutionWidth + 39) AS resW39, SUM(ResolutionWidth + 40) AS resW40, SUM(ResolutionWidth + 41) AS resW41, SUM(ResolutionWidth + 42) AS resW42, SUM(ResolutionWidth + 43) AS resW43, SUM(ResolutionWidth + 44) AS resW44, SUM(ResolutionWidth + 45) AS resW45, SUM(ResolutionWidth + 46) AS resW46, SUM(ResolutionWidth + 47) AS resW47, SUM(ResolutionWidth + 48) AS resW48, SUM(ResolutionWidth + 49) AS resW49, SUM(ResolutionWidth + 50) AS resW50, SUM(ResolutionWidth + 51) AS resW51, SUM(ResolutionWidth + 52) AS resW52, SUM(ResolutionWidth + 53) AS resW53, SUM(ResolutionWidth + 54) AS resW54, SUM(ResolutionWidth + 55) AS resW55, SUM(ResolutionWidth + 56) AS resW56, SUM(ResolutionWidth + 57) AS resW57, SUM(ResolutionWidth + 58) AS resW58, SUM(ResolutionWidth + 59) AS resW59, SUM(ResolutionWidth + 60) AS resW60, SUM(ResolutionWidth + 61) AS resW61, SUM(ResolutionWidth + 62) AS resW62, SUM(ResolutionWidth + 63) AS resW63, SUM(ResolutionWidth + 64) AS resW64, SUM(ResolutionWidth + 65) AS resW65, SUM(ResolutionWidth + 66) AS resW66, SUM(ResolutionWidth + 67) AS resW67, SUM(ResolutionWidth + 68) AS resW68, SUM(ResolutionWidth + 69) AS resW69, SUM(ResolutionWidth + 70) AS resW70, SUM(ResolutionWidth + 71) AS resW71, SUM(ResolutionWidth + 72) AS resW72, SUM(ResolutionWidth + 73) AS resW73, SUM(ResolutionWidth + 74) AS resW74, SUM(ResolutionWidth + 75) AS resW75, SUM(ResolutionWidth + 76) AS resW76, SUM(ResolutionWidth + 77) AS resW77, SUM(ResolutionWidth + 78) AS resW78, SUM(ResolutionWidth + 79) AS resW79, SUM(ResolutionWidth + 80) AS resW80, SUM(ResolutionWidth + 81) AS resW81, SUM(ResolutionWidth + 82) AS resW82, SUM(ResolutionWidth + 83) AS resW83, SUM(ResolutionWidth + 84) AS resW84, SUM(ResolutionWidth + 85) AS resW85, SUM(ResolutionWidth + 86) AS resW86, SUM(ResolutionWidth + 87) AS resW87, SUM(ResolutionWidth + 88) AS resW88, SUM(ResolutionWidth + 89) AS resW89 FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT SearchPhrase, MIN(URL) AS url, MIN(Title) AS title, COUNT(*) AS c, COUNT(DISTINCT UserID) AS usrIdCount FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c, searchphrase DESC LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY WatchID, ClientIP ORDER BY c DESC, WatchID, ClientIP LIMIT 10;
SELECT URL, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY URL ORDER BY c DESC LIMIT 12;
SELECT URL, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE url ='http://kinopoisk.ru/perm.irr.ru' GROUP BY URL;
SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT AVG(UserID) FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT MIN(EventDate) AS minEventDate, MAX(EventDate) AS maxEventDate FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT RegionID, SUM(AdvEngineID) AS sumAdvEngineId, COUNT(*) AS c, AVG(ResolutionWidth) AS avgResolutionWidth, COUNT(DISTINCT UserID) AS distinctUsers FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY RegionID ORDER BY c DESC LIMIT 10;
SELECT 1, URL, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY 1, URL ORDER BY c DESC LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND URL <> '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10;
SELECT Title, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND Title <> '' GROUP BY Title ORDER BY PageViews DESC LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND IsLink <> 0 AND IsDownload = 0 GROUP BY URL ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
SELECT TraficSourceID, SearchEngineID, AdvEngineID, CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src, URL AS Dst, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
SELECT URLHash, EventDate, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND TraficSourceID IN (-1, 6) AND RefererHash = 3594120000172545465 GROUP BY URLHash, EventDate ORDER BY PageViews DESC LIMIT 10 OFFSET 100;
SELECT WindowClientWidth, WindowClientHeight, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND DontCountHits = 0 AND URLHash = 2868770270353813622 GROUP BY WindowClientWidth, WindowClientHeight ORDER BY PageViews DESC LIMIT 10 OFFSET 10000;
SELECT REGEXP_REPLACE(Referer, '^https?://(?:www.)?([^/]+)/.*$', '\x01') AS k, AVG(STRLEN(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC, c LIMIT 25;
SELECT SearchEngineID, ClientIP, COUNT(*) AS c, SUM(IsRefresh) AS sumIsRefresh, AVG(ResolutionWidth) AS avgResWidth FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY SearchEngineID, ClientIP ORDER BY c DESC, SearchEngineID, ClientIP LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh) AS sumIsRefresh, AVG(ResolutionWidth) AS AvgRsesWidth FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY WatchID, ClientIP ORDER BY c, WatchID, ClientIP DESC LIMIT 10;
SELECT ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3 ORDER BY c DESC, ClientIP LIMIT 11;
SELECT COUNT(*) AS count FROM ( SELECT SearchPhrase FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY SearchPhrase ) AS a ORDER BY count DESC;
SELECT DATE_TRUNC('minute', EventTime::timestamp) AS M, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-14' AND EventDate <= '2013-07-15' AND IsRefresh = 0 AND DontCountHits = 0 GROUP BY DATE_TRUNC('minute', EventTime::timestamp) ORDER BY DATE_TRUNC('minute', EventTime::timestamp) LIMIT 10 OFFSET 1000;
SELECT CounterID, AVG(STRLEN(URL)) AS l, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE URL <> '' GROUP BY CounterID HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
SELECT * FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE URL LIKE '%google%' ORDER BY EventTime, WatchID LIMIT 1;
Thank you! Yes, let's try these queries. I can also try if I will not forget, but you can also submit the results.
I've created "engineering white paper" with the test results. Link is on front page: https://www.boilingdata.com/
https://www.boilingdata.com/
Suggested by @danthegoodman1 in a discussion at https://clickhouse.com/slack