PNixx / clickhouse-activerecord

A Ruby database ActiveRecord driver for ClickHouse
MIT License
197 stars 99 forks source link

Using schema_format = ruby fails with a single ORDER BY column #163

Closed dbackeus closed 4 weeks ago

dbackeus commented 1 month ago

Given the following migration:

options = "MergeTree PARTITION BY toYYYYMM(timestamp) ORDER BY timestamp"
create_table :events, id: false, force: :cascade, options: do |t|
  t.datetime :timestamp, null: false
end

The generated schema.rb file looks like:

create_table "events", primary_key: "timestamp", id: :datetime, precision: nil, options: "MergeTree PARTITION BY toYYYYMM(timestamp) ORDER BY timestamp SETTINGS index_granularity = 8192", force: :cascade do |t|
end

When attempting to load this schema it fails with:

ActiveRecord::ActiveRecordError: Response code: 404: (ActiveRecord::ActiveRecordError)
Code: 47. DB::Exception: Missing columns: 'timestamp' while processing query: 'toYYYYMM(timestamp)', required columns: 'timestamp' 'timestamp'. (UNKNOWN_IDENTIFIER) (version 24.8.4.13 (official build))

It appears that the timestamp column needs to be explicitly defined inside of the block. It's not enough that it's declared as primary_key: "...".

The issue doesn't occur when ordering by multiple columns (ie. composite primary keys). Also the issue doesn't occur when using schema_format = :sql.

PNixx commented 4 weeks ago

Fixed in v1.1.3

dbackeus commented 4 weeks ago

@PNixx thanks for looking into this, but unfortunately the new version did not fix this issue for me.

I'm no longer getting any primary_key: in the dump but I'm still not getting the timestamp columns added inside the create_table statements. Here's the diff I'm getting with the new version:

Screenshot 2024-09-28 at 12 23 23

Both the email_sg_events and web_clickstream_events tables are missing timestamp in the dump.

I tried both just running db:schema:dump on my existing database and dropping and re-migrating the database from scratch, both result in the same diff.

In the # SQL: CREATE TABLE ... comment the column does exist as expected, but not in the DSL output. Full email_sg_events section:

  # TABLE: email_sg_events
  # SQL: CREATE TABLE email_sg_events ( `uuid` Nullable(String), `email` Nullable(String), `event` Nullable(String), `sg_message_id` Nullable(String), `url` Nullable(String), `url_offset` Nullable(String), `ip` Nullable(String), `useragent` Nullable(String), `timestamp` DateTime, `reason` Nullable(String) ) ENGINE = MergeTree PARTITION BY toYYYYMM(timestamp) ORDER BY timestamp SETTINGS index_granularity = 8192
  create_table "email_sg_events", id: false, options: "MergeTree PARTITION BY toYYYYMM(timestamp) ORDER BY timestamp SETTINGS index_granularity = 8192", force: :cascade do |t|
    t.string "uuid"
    t.string "email"
    t.string "event"
    t.string "sg_message_id"
    t.string "url"
    t.string "url_offset"
    t.string "ip"
    t.string "useragent"
    t.string "reason"
  end