ClickHouse / clickhouse-academy

ClickHouse Academy training and certification
Other
18 stars 3 forks source link

Lab 10.1: ReplacingMergeTree - incorrect use of `toDate` in bullet point 2 #7

Open kyrylo opened 2 months ago

kyrylo commented 2 months ago

The dataset from https://learnclickhouse.s3.us-east-2.amazonaws.com/datasets/mortgage_rates.csv has the following format:

date,variable,fixed,bank
29/02/2004,5.02,4.9,4

Bullet point 2 suggests using toDate to convert 29/02/2004:

INSERT INTO rates_monthly
    SELECT 
        toDate(date) AS month, # <~~~~~~~~~~~~~~~~ WRONG ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        variable,
        fixed,
        bank
    FROM s3(
        'https://learnclickhouse.s3.us-east-2.amazonaws.com/datasets/mortgage_rates.csv',
        'CSVWithNames');

When I run it, my table gets seeded with only 1 record (instead of the expected 220):

SELECT *
FROM rates_monthly

Query id: d8f6743c-49a7-41fa-acc0-2d782f89abf1

   ┌──────month─┬─variable─┬─fixed─┬─bank─┐
1. │ 1970-01-01 │     2.98 │  1.91 │    1 │
   └────────────┴──────────┴───────┴──────┘

The problem is that toDate doesn't know how to parse 29/02/2004.

The solution is also wrong: https://github.com/ClickHouse/clickhouse-academy/blob/a73fa2bfdc266b4894b40f59fe64fd5dcb27ceec/developer/10_deleting_and_updating_data/lab_10.1.sql#L12-L20

Instead, I used parseDateTime(date, '%d/%m/%Y') to parse the date correctly.

ClickHouse v24.3.2

dalei2019 commented 3 weeks ago

Thanks, Man! Also u can use:parseDateTime64BestEffort