riccardogabellone commented 4 months ago

Hi! I'm having this kind of issue, already encountered by someone else:

ERROR bad response: Code: 33. DB::Exception: Cannot read all data. Bytes read: 38. Bytes expected: 116.: (at row 1) : While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 24.2.2.16288 (official build))

I cannot figure out what is going wrong...

Here is my Rust code:

use std::{
    net::{Ipv4Addr, Ipv6Addr},
    u64,
};
use serde_repr::{Deserialize_repr, Serialize_repr};
use time::OffsetDateTime;

#[derive(Debug, serde::Serialize, serde::Deserialize, clickhouse::Row)]
pub struct MyLogMessage {
    #[serde(with = "clickhouse::serde::time::datetime64::millis")]
    pub timestamp: OffsetDateTime,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub api_key: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    #[serde(with = "clickhouse::serde::ipv4::option")]
    pub ip4: Option<Ipv4Addr>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub ip6: Option<Ipv6Addr>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub auth: Option<bool>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub user_agent: Option<String>,
    pub req_size: u32,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub req_content_type: Option<String>,
    pub content_length: u64,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub content_type: Option<String>,
    pub method: HttpMethod,
    pub host: String,
    pub service: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub path: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub query: Option<String>,
    pub status_code: u16,
    pub execution_time_ms: u32,
    pub version: String,
}

#[derive(Debug, Serialize_repr, Deserialize_repr)]
#[repr(u8)]
pub enum HttpMethod {
    _UN = 0,
    GET = 1,
    HEAD = 2,
    POST = 3,
    PUT = 4,
    DELETE = 5,
    CONNECT = 6,
    OPTIONS = 7,
    TRACE = 8,
    PATCH = 9,
}

fn main() {
    let mut inserter = clickhouse
        .inserter::<MyLogMessage>("my_log")
        .unwrap();

    let mut messages = vec![MyLogMessage { ... }, ...];
    while let Some(msg) = &messages.pop() {
        inserter.write(&msg).await.unwrap();
    }

    match inserter.end().await {
        Ok(_) => {
            consumer.commit_consumer_state(CommitMode::Sync).unwrap();
            tracing::info!("ALL MSGs SAVED.");
        }
        Err(e) => {
            tracing::error!("{e}");  // Here is the CANNOT_READ_ALL_DATA DB::Exception
        }
    };
}

And the ClickHouse table:

CREATE TABLE my_log
(
    `timestamp` DateTime64(3),
    `api_key` String,
    `ip4` IPv4,
    `ip6` IPv6,
    `version` String,
    `auth` Bool,
    `user_agent` String,
    `req_size` UInt32,
    `req_content_type` String,
    `content_length` UInt64,
    `content_type` String,
    `method` Enum8('_' = 0, 'GET' = 1, 'HEAD' = 2, 'POST' = 3, 'PUT' = 4, 'DELETE' = 5, 'CONNECT' = 6, 'OPTIONS' = 7, 'TRACE' = 8, 'PATCH' = 9),
    `host` String,
    `service` String,
    `path` String,
    `query` String,
    `status_code` UInt16,
    `execution_time_ms` UInt32
)
ENGINE = MergeTree;

Maybe I missed something from docs?

Could it be one of those Option<_> fields that are not annotated in any way (although I guess skip_serializing_if should be enough)?

Or do I need to make each column Nullable to match the Rust struct? In my tests, ClickHouse fills in default values anyway, so as long as the fields are skipped there should not be any exception, right?

Or, is it better to put serde defaults without Option<_> wrappers?

EDIT: I also tried the wa-37420 feature flag, with the same result.
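
One plausible reading of "Bytes read: 38. Bytes expected: 116", assuming the standard RowBinary layout in which a String is an unsigned LEB128 length prefix followed by that many bytes: once the server's read position drifts, an ordinary data byte is taken as a length. The sketch below is a hypothetical decoder written for illustration (`read_varint` and `read_string` are not names from the crate or the server).

```rust
/// Decode an unsigned LEB128 varint; returns (value, bytes consumed).
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in buf.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Read a RowBinary-style String; on a short buffer report (read, expected).
fn read_string(buf: &[u8]) -> Result<&[u8], (usize, usize)> {
    let (len, used) = read_varint(buf).ok_or((0, 1))?;
    let rest = &buf[used..];
    let len = len as usize;
    if rest.len() < len {
        return Err((rest.len(), len));
    }
    Ok(&rest[..len])
}

fn main() {
    // Aligned read: length prefix 9, then "localhost".
    let mut good = vec![9u8];
    good.extend_from_slice(b"localhost");
    assert_eq!(read_string(&good), Ok(&b"localhost"[..]));

    // Misaligned read: the cursor sits on the letter 't' (0x74 = 116), which
    // is taken as a length prefix although only 38 bytes follow.
    let mut bad = vec![b't'];
    bad.extend(std::iter::repeat(b'x').take(38));
    assert_eq!(read_string(&bad), Err((38, 116)));
}
```

So the numbers in the message do not necessarily describe a real 116-byte value; they may only show that the reader and the writer disagree about where a field starts.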

riccardogabellone commented 4 months ago

I tested some MyLogMessage values built from raw JSON data.

With this example, it raises CANNOT_READ_ALL_DATA:

{
"timestamp":1718746919678,
"ip4":2130706433,
"user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
"req_size":0,
"content_length":0,
"content_type":"image/x-icon",
"method":1,
"host":"localhost",
"service":"favicon.ico",
"path":"/",
"status_code":200,
"execution_time_ms":0,
"version":"HTTP/1.1"
}

with this example, it raises ATTEMPT_TO_READ_AFTER_EOF

{
"timestamp":1718788113783,
"api_key":"<my-key>",
"ip4":2130706433,
"auth":true,
"user_agent":"insomnia/9.2.0",
"req_size":0,
"req_content_type":"application/json",
"content_length":4242,
"content_type":"application/json",
"method":1,
"host":"localhost",
"service":"my-service",
"path":"/",
"query":"f1=true&f2=true",
"status_code":200,
"execution_time_ms":77,
"version":"HTTP/1.1"
}
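
In the two samples above, the missing keys are the fields dropped by skip_serializing_if. JSON can tolerate that because every value carries its key, but RowBinary has no field names, so (assuming fields are written purely by position) a skipped field simply vanishes from the byte stream while the server still expects it. The following is a simplified, hand-rolled encoder for a hypothetical two-column table (`api_key String, req_size UInt32`); it is a sketch of the effect, not the crate's implementation, and it deliberately leaves out how the crate encodes `Some` values.

```rust
/// Append an unsigned LEB128 varint.
fn write_varint(out: &mut Vec<u8>, mut v: u64) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// RowBinary String: varint length, then the raw bytes.
fn write_string(out: &mut Vec<u8>, s: &str) {
    write_varint(out, s.len() as u64);
    out.extend_from_slice(s.as_bytes());
}

/// Encode a row for the hypothetical columns (api_key String, req_size UInt32).
/// `skip_none` mimics `skip_serializing_if = "Option::is_none"`.
fn encode_row(api_key: Option<&str>, req_size: u32, skip_none: bool) -> Vec<u8> {
    let mut out = Vec::new();
    match api_key {
        Some(key) => write_string(&mut out, key),
        None if skip_none => {} // nothing written: the column disappears
        None => write_string(&mut out, ""), // explicit default value
    }
    out.extend_from_slice(&req_size.to_le_bytes());
    out
}

fn main() {
    let skipped = encode_row(None, 7, true);
    let defaulted = encode_row(None, 7, false);
    // With the field skipped, the first byte of req_size (7) sits where the
    // String length is expected, and only 3 bytes follow it.
    assert_eq!(skipped, vec![7, 0, 0, 0]);
    assert_eq!(defaulted, vec![0, 7, 0, 0, 0]);
}
```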

riccardogabellone commented 4 months ago

Or, is it better to put serde defaults without Option<_> wrappers?

OK, I managed to fix it, but only this way. The other approaches don't seem to work.
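
A minimal sketch of that workaround, assuming the table keeps its non-Nullable columns: keep `Option<_>` only on the incoming side and convert to a struct whose fields always hold a value before inserting. `RawMessage` and `RowMessage` are hypothetical names and cover only a subset of the fields; in the real struct the serde and `clickhouse::Row` derives would go on the second type.

```rust
use std::net::Ipv4Addr;

/// Incoming message where fields may be absent (hypothetical subset).
#[derive(Debug, Default)]
struct RawMessage {
    api_key: Option<String>,
    ip4: Option<Ipv4Addr>,
    auth: Option<bool>,
}

/// Shape matching non-Nullable columns: every field always has a value.
#[derive(Debug, PartialEq)]
struct RowMessage {
    api_key: String,
    ip4: Ipv4Addr,
    auth: bool,
}

impl From<RawMessage> for RowMessage {
    fn from(raw: RawMessage) -> Self {
        RowMessage {
            // Empty string, 0.0.0.0 and false stand in for "absent".
            api_key: raw.api_key.unwrap_or_default(),
            ip4: raw.ip4.unwrap_or(Ipv4Addr::UNSPECIFIED),
            auth: raw.auth.unwrap_or_default(),
        }
    }
}

fn main() {
    let raw = RawMessage { auth: Some(true), ..Default::default() };
    let row = RowMessage::from(raw);
    assert_eq!(
        row,
        RowMessage { api_key: String::new(), ip4: Ipv4Addr::UNSPECIFIED, auth: true }
    );
}
```

The trade-off is that an absent value and a genuine empty string become indistinguishable in the table, which may or may not matter for a log.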

riccardogabellone commented 4 months ago

Maybe it is better to close the issue only after you have run some checks.

apsaltis commented 3 months ago

Hi, I'm getting a very similar issue. I get this error message: BadResponse("Code: 33. DB::Exception: Cannot read all data. Bytes read: 0. Bytes expected: 1.: (at row 13)\n: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 24.2.2.16370 (official build))")

FWIW -- trying this against Clickhouse Cloud.

I have changed my struct to make sure there are no Options and that all values are populated, even though the table allows most of them to be null.

Ideas on where to look would be greatly appreciated.

apsaltis commented 3 months ago

As an update: I was using chrono for DateTime. After seeing this issue and re-reading about the time support, I switched from chrono to time and still have the same issue.

riccardogabellone commented 3 months ago

Hi @apsaltis! If you have an example snippet that reproduces the actual data structures, please post it here, just to compare with mine and with all the attempts I made 😁

apsaltis commented 3 months ago

Hi @riccardogabellone -- here is the data structure:

`#[derive(Row, Debug, Deserialize, Serialize)]
struct AiLiteracyRow {
#[serde(with = "clickhouse::serde::time::datetime")]
created_on_timestamp: OffsetDateTime,
#[serde(with = "clickhouse::serde::time::datetime")]
last_update_timestamp: OffsetDateTime,
organization_id: String,
user_id: String,
raw_results: String,
ethics_assessed_score: i32,
ethics_self_score_avg: i32,
ethics_self_score_strongly_disagree_count: i32,
ethics_self_score_disagree_count: i32,
ethics_self_score_somewhat_disagree_count: i32,
ethics_self_score_neither_count: i32,
ethics_self_score_somewhat_agree_count: i32,
ethics_self_score_agree_count: i32,
ethics_self_score_strongly_agree_count: i32,
eval_create_assessed_score: i32,
eval_create_self_score_avg: i32,
eval_create_self_score_strongly_disagree_count: i32,
eval_create_self_score_disagree_count: i32,
eval_create_self_score_somewhat_disagree_count: i32,
eval_create_self_score_neither_count: i32,
eval_create_self_score_somewhat_agree_count: i32,
eval_create_self_score_agree_count: i32,
eval_create_self_score_strongly_agree_count: i32,
know_understand_assessed_score: i32,
know_understand_self_score_avg: i32,
know_understand_self_score_strongly_disagree_count: i32,
know_understand_self_score_disagree_count: i32,
know_understand_self_score_somewhat_disagree_count: i32,
know_understand_self_score_neither_count: i32,
know_understand_self_score_somewhat_agree_count: i32,
know_understand_self_score_agree_count: i32,
know_understand_self_score_strongly_agree_count: i32,
use_and_apply_assessed_score: i32,
use_and_apply_self_score_avg: i32,
use_and_apply_self_score_strongly_disagree_count: i32,
use_and_apply_self_score_disagree_count: i32,
use_and_apply_self_score_somewhat_disagree_count: i32,
use_and_apply_self_score_neither_count: i32,
use_and_apply_self_score_somewhat_agree_count: i32,
use_and_apply_self_score_agree_count: i32,
use_and_apply_self_score_strongly_agree_count: i32,
ai_001: i16,
ai_002: i16,
ai_003: i16,
ai_004: i16,
ai_005: i16,
ai_006: i16,
ai_007: i16,
ai_008: i16,
ai_009: i16,
ai_010: i16,
ai_011: i16,
ai_012: i16,
ai_013: i16,
ai_014: i16,
ai_015: i16,
ai_016: i16,
ai_017: i16,
ai_018: i16,
ai_019: i16,
ai_020: i16,
ai_021: i16,
ai_023: i16,
ai_024: i16,
ai_025: i16,
ai_026: i16,
ai_027: i16,
ai_028: u8,
ai_029: u8,
ai_030: i16,
ai_031: i16,
ai_032: i16,
ai_033: i16,
ai_034: i16,
ai_035: i16,
ai_036: i16,
ai_037: i16,
ai_038: i16,
ai_039: i16,
ai_040: i16,
ai_041: i16,
ai_042: i16,
ai_043: i16,
ai_044: i16,
ai_045: i16,
ai_046: i16,
ai_047: i16,
ai_048: i16,
ai_049: i16,
ai_050: i16,
ai_051: i16,
ai_052: i16,
ai_053: i16,
ai_054: i16,
ai_055: i16,
ai_056: i16,
ai_057: i16,
ai_058: i16,
ai_059: i16,
ai_060: i16,
ai_061: i16,
ai_062: i16,
ai_063: i16,
ai_064: i16,
ai_065: i16,
ai_066: i16,
ai_067: i16,
ai_068: i16,
ai_069: i16,
ai_070: i16,
ai_071: i16,
ai_072: i16,
ai_073: i16,
ai_074: i16,
ai_075: i16,
ai_076: i16,
ai_077: i16,
ai_078: i16,
ai_079: i16,
ai_080: i16,
ai_081: i16,
ai_082: i16,
ai_083: i16,
ai_084: i16,
ai_085: i16,
ai_086: i16,
ai_087: i16,
ai_088: i16,
ai_089: i16,
ai_090: i16,
ai_091: i16,
ai_092: i16,
ai_093: i16,
ai_094: i16,
ai_095: i16,
ai_096: i16,
ai_097: i16,
ai_098: i16,
ai_099: i16,
ai_100: i16,
ai_101: i16,
ai_102: i16,
ai_103: i16,
ai_104: i16,
ai_105: i16,
ai_106: i16,
ai_108: i16,
ai_109: i16,
ai_110: i16,
ai_111: i16,
ai_112: i16,
ai_113: i16,
ai_114: i16,
ai_115: i16,
ai_116: i16,
ai_117: i16,
ai_118: i16,
ai_119: i16,
ai_120: i16,
ai_121: i16,
ai_122: i16,
ai_123: i16,
ai_124: i16,
ai_125: i16,
ai_126: i16,
ai_127: String,
ai_128: i16,
ai_129: i16,
ai_130: i16,
ai_131: i16,
ai_132: i16,
ai_133: i16,
ai_134: i16,
ai_135: i16,
ai_136: i16,
ai_137: String,
ai_138: String,
ai_140: String,
ai_141: String,
ai_142: String,
ai_143: u8,
ai_144: String,
ai_145: String,

}`

and here is the row populated and serialized as JSON:

{ "created_on_timestamp": 1720024253, "last_update_timestamp": 1720024253, "organization_id": "123", "user_id": "456", "raw_results": "this ends up being a JSON string", "ethics_assessed_score": 0, "ethics_self_score_avg": 2, "ethics_self_score_strongly_disagree_count": 14, "ethics_self_score_disagree_count": 0, "ethics_self_score_somewhat_disagree_count": 0, "ethics_self_score_neither_count": 0, "ethics_self_score_somewhat_agree_count": 0, "ethics_self_score_agree_count": 0, "ethics_self_score_strongly_agree_count": 0, "eval_create_assessed_score": 0, "eval_create_self_score_avg": 2, "eval_create_self_score_strongly_disagree_count": 14, "eval_create_self_score_disagree_count": 0, "eval_create_self_score_somewhat_disagree_count": 0, "eval_create_self_score_neither_count": 0, "eval_create_self_score_somewhat_agree_count": 0, "eval_create_self_score_agree_count": 0, "eval_create_self_score_strongly_agree_count": 0, "know_understand_assessed_score": 2, "know_understand_self_score_avg": 7, "know_understand_self_score_strongly_disagree_count": 50, "know_understand_self_score_disagree_count": 0, "know_understand_self_score_somewhat_disagree_count": 0, "know_understand_self_score_neither_count": 0, "know_understand_self_score_somewhat_agree_count": 0, "know_understand_self_score_agree_count": 0, "know_understand_self_score_strongly_agree_count": 0, "use_and_apply_assessed_score": 0, "use_and_apply_self_score_avg": 2, "use_and_apply_self_score_strongly_disagree_count": 17, "use_and_apply_self_score_disagree_count": 0, "use_and_apply_self_score_somewhat_disagree_count": 0, "use_and_apply_self_score_neither_count": 0, "use_and_apply_self_score_somewhat_agree_count": 0, "use_and_apply_self_score_agree_count": 0, "use_and_apply_self_score_strongly_agree_count": 0, "ai_001": 1, "ai_002": 1, "ai_003": 1, "ai_004": 0, "ai_005": 4, "ai_006": 1, "ai_007": 1, "ai_008": 1, "ai_009": 1, "ai_010": 1, "ai_011": 1, "ai_012": 3,
"ai_013": 3, "ai_014": 3, "ai_015": 2, "ai_016": 1, "ai_017": 1, "ai_018": 1, "ai_019": 1, "ai_020": 1, "ai_021": 1, "ai_022": 1, "ai_023": 1, "ai_024": 1, "ai_025": 1, "ai_026": 1, "ai_027": 1, "ai_028": true, "ai_029": false, "ai_030": 1, "ai_031": 1, "ai_032": 1, "ai_033": 1, "ai_034": 1, "ai_035": 1, "ai_036": 1, "ai_037": 0, "ai_038": 0, "ai_039": 2, "ai_040": 1, "ai_041": 1, "ai_042": 1, "ai_043": 1, "ai_044": 1, "ai_045": 1, "ai_046": 1, "ai_047": 1, "ai_048": 1, "ai_049": 1, "ai_050": 1, "ai_051": 3, "ai_052": 4, "ai_053": 0, "ai_054": 1, "ai_055": 1, "ai_056": 1, "ai_057": 1, "ai_058": 1, "ai_059": 1, "ai_060": 1, "ai_061": 1, "ai_062": 1, "ai_063": 1, "ai_064": 1, "ai_065": 1, "ai_066": 1, "ai_067": 1, "ai_068": 1, "ai_069": 1, "ai_070": 1, "ai_071": 4, "ai_072": 4, "ai_073": 0, "ai_074": 0, "ai_075": 0, "ai_076": 0, "ai_077": 4, "ai_078": 4, "ai_079": 1, "ai_080": 1, "ai_081": 1, "ai_082": 1, "ai_083": 1, "ai_084": 1, "ai_085": 1, "ai_086": 1, "ai_087": 1, "ai_088": 1, "ai_089": 1, "ai_090": 1, "ai_091": 1, "ai_092": 1, "ai_093": 1, "ai_094": 1, "ai_095": 1, "ai_096": 1, "ai_097": 1, "ai_098": 1, "ai_099": 1, "ai_100": 1, "ai_101": 1, "ai_102": 1, "ai_103": 1, "ai_104": 1, "ai_105": 1, "ai_106": 1, "ai_108": 1, "ai_109": 1, "ai_110": 0, "ai_111": 0, "ai_112": 4, "ai_113": 1, "ai_114": 1, "ai_115": 1, "ai_116": 1, "ai_117": 1, "ai_118": 1, "ai_119": 1, "ai_120": 1, "ai_121": 1, "ai_122": 1, "ai_123": 1, "ai_124": 0, "ai_125": 0, "ai_126": 0, "ai_127": [4, 2, 5, 3, 1], "ai_128": 0, "ai_129": 0, "ai_130": 4, "ai_131": 3, "ai_132": 0, "ai_133": 4, "ai_134": 4, "ai_135": 1, "ai_136": 0, "ai_137": ["ml", "nlp", "robotics"], "ai_138": "no_knoledge", "ai_140": ["dev_models", "manage_projects"], "ai_141": "adsfasdasdf", "ai_142": "less_than_1", "ai_143": true, "ai_144": "asdfadfadf", "ai_145": "asdfasdf" }

I also just tried with crate feature "wa-37420"

apsaltis commented 3 months ago

I just changed "raw_results", which was a JSON string, to be just a string literal "some_data" and now got back a different error: BadResponse("Code: 33. DB::Exception: Cannot read all data. Bytes read: 0. Bytes expected: 1.: (at row 2)\n: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 24.2.2.16370 (official build))")

so now row 2 and not 13, not sure where that leads.

riccardogabellone commented 3 months ago

At first glance, I note that ai_137 and ai_140 are String in the struct while the JSON shows arrays. Anyway, I haven't tried ClickHouse arrays yet.

riccardogabellone commented 3 months ago

how is the corresponding clickhouse table declared?

apsaltis commented 3 months ago

Sorry, should have included that:

CREATE TABLE default.ai_literacy
(
    created_on_timestamp DateTime64(3),
    last_update_timestamp DateTime64(3),
    organization_id Nullable(String),
    user_id Nullable(String),
    raw_results Nullable(String),
    ethics_assessed_score Nullable(Int32),
    ethics_self_score_avg Nullable(Int32),
    ethics_self_score_strongly_disagree_count Nullable(Int32),
    ethics_self_score_disagree_count Nullable(Int32),
    ethics_self_score_somewhat_disagree_count Nullable(Int32),
    ethics_self_score_neither_count Nullable(Int32),
    ethics_self_score_somewhat_agree_count Nullable(Int32),
    ethics_self_score_agree_count Nullable(Int32),
    ethics_self_score_strongly_agree_count Nullable(Int32),
    eval_create_assessed_score Nullable(Int32),
    eval_create_self_score_avg Nullable(Int32),
    eval_create_self_score_strongly_disagree_count Nullable(Int32),
    eval_create_self_score_disagree_count Nullable(Int32),
    eval_create_self_score_somewhat_disagree_count Nullable(Int32),
    eval_create_self_score_neither_count Nullable(Int32),
    eval_create_self_score_somewhat_agree_count Nullable(Int32),
    eval_create_self_score_agree_count Nullable(Int32),
    eval_create_self_score_strongly_agree_count Nullable(Int32),
    know_understand_assessed_score Nullable(Int32),
    know_understand_self_score_avg Nullable(Int32),
    know_understand_self_score_strongly_disagree_count Nullable(Int32),
    know_understand_self_score_disagree_count Nullable(Int32),
    know_understand_self_score_somewhat_disagree_count Nullable(Int32),
    know_understand_self_score_neither_count Nullable(Int32),
    know_understand_self_score_somewhat_agree_count Nullable(Int32),
    know_understand_self_score_agree_count Nullable(Int32),
    know_understand_self_score_strongly_agree_count Nullable(Int32),
    use_and_apply_assessed_score Nullable(Int32),
    use_and_apply_self_score_avg Nullable(Int32),
    use_and_apply_self_score_strongly_disagree_count Nullable(Int32),
    use_and_apply_self_score_disagree_count Nullable(Int32),
    use_and_apply_self_score_somewhat_disagree_count Nullable(Int32),
    use_and_apply_self_score_neither_count Nullable(Int32),
    use_and_apply_self_score_somewhat_agree_count Nullable(Int32),
    use_and_apply_self_score_agree_count Nullable(Int32),
    use_and_apply_self_score_strongly_agree_count Nullable(Int32),
    ai_001 Nullable(Int16), ai_002 Nullable(Int16), ai_003 Nullable(Int16), ai_004 Nullable(Int16), ai_005 Nullable(Int16),
    ai_006 Nullable(Int16), ai_007 Nullable(Int16), ai_008 Nullable(Int16), ai_009 Nullable(Int16), ai_010 Nullable(Int16),
    ai_011 Nullable(Int16), ai_012 Nullable(Int16), ai_013 Nullable(Int16), ai_014 Nullable(Int16), ai_015 Nullable(Int16),
    ai_016 Nullable(Int16), ai_017 Nullable(Int16), ai_018 Nullable(Int16), ai_019 Nullable(Int16), ai_020 Nullable(Int16),
    ai_021 Nullable(Int16), ai_023 Nullable(Int16), ai_024 Nullable(Int16), ai_025 Nullable(Int16), ai_026 Nullable(Int16),
    ai_027 Nullable(Int16), ai_028 Nullable(UInt8), ai_029 Nullable(UInt8), ai_030 Nullable(Int16), ai_031 Nullable(Int16),
    ai_032 Nullable(Int16), ai_033 Nullable(Int16), ai_034 Nullable(Int16), ai_035 Nullable(Int16), ai_036 Nullable(Int16),
    ai_037 Nullable(Int16), ai_038 Nullable(Int16), ai_039 Nullable(Int16), ai_040 Nullable(Int16), ai_041 Nullable(Int16),
    ai_042 Nullable(Int16), ai_043 Nullable(Int16), ai_044 Nullable(Int16), ai_045 Nullable(Int16), ai_046 Nullable(Int16),
    ai_047 Nullable(Int16), ai_048 Nullable(Int16), ai_049 Nullable(Int16), ai_050 Nullable(Int16), ai_051 Nullable(Int16),
    ai_052 Nullable(Int16), ai_053 Nullable(Int16), ai_054 Nullable(Int16), ai_055 Nullable(Int16), ai_056 Nullable(Int16),
    ai_057 Nullable(Int16), ai_058 Nullable(Int16), ai_059 Nullable(Int16), ai_060 Nullable(Int16), ai_061 Nullable(Int16),
    ai_062 Nullable(Int16), ai_063 Nullable(Int16), ai_064 Nullable(Int16), ai_065 Nullable(Int16), ai_066 Nullable(Int16),
    ai_067 Nullable(Int16), ai_068 Nullable(Int16), ai_069 Nullable(Int16), ai_070 Nullable(Int16), ai_071 Nullable(Int16),
    ai_072 Nullable(Int16), ai_073 Nullable(Int16), ai_074 Nullable(Int16), ai_075 Nullable(Int16), ai_076 Nullable(Int16),
    ai_077 Nullable(Int16), ai_078 Nullable(Int16), ai_079 Nullable(Int16), ai_080 Nullable(Int16), ai_081 Nullable(Int16),
    ai_082 Nullable(Int16), ai_083 Nullable(Int16), ai_084 Nullable(Int16), ai_085 Nullable(Int16), ai_086 Nullable(Int16),
    ai_087 Nullable(Int16), ai_088 Nullable(Int16), ai_089 Nullable(Int16), ai_090 Nullable(Int16), ai_091 Nullable(Int16),
    ai_092 Nullable(Int16), ai_093 Nullable(Int16), ai_094 Nullable(Int16), ai_095 Nullable(Int16), ai_096 Nullable(Int16),
    ai_097 Nullable(Int16), ai_098 Nullable(Int16), ai_099 Nullable(Int16), ai_100 Nullable(Int16), ai_101 Nullable(Int16),
    ai_102 Nullable(Int16), ai_103 Nullable(Int16), ai_104 Nullable(Int16), ai_105 Nullable(Int16), ai_106 Nullable(Int16),
    ai_108 Nullable(Int16), ai_109 Nullable(Int16), ai_110 Nullable(Int16), ai_111 Nullable(Int16), ai_112 Nullable(Int16),
    ai_113 Nullable(Int16), ai_114 Nullable(Int16), ai_115 Nullable(Int16), ai_116 Nullable(Int16), ai_117 Nullable(Int16),
    ai_118 Nullable(Int16), ai_119 Nullable(Int16), ai_120 Nullable(Int16), ai_121 Nullable(Int16), ai_122 Nullable(Int16),
    ai_123 Nullable(Int16), ai_124 Nullable(Int16), ai_125 Nullable(Int16), ai_126 Nullable(Int16), ai_127 Nullable(String),
    ai_128 Nullable(Int16), ai_129 Nullable(Int16), ai_130 Nullable(Int16), ai_131 Nullable(Int16), ai_132 Nullable(Int16),
    ai_133 Nullable(Int16), ai_134 Nullable(Int16), ai_135 Nullable(Int16), ai_136 Nullable(Int16), ai_137 Nullable(String),
    ai_138 Nullable(String), ai_140 Nullable(String), ai_141 Nullable(String), ai_142 Nullable(String), ai_143 Nullable(UInt8),
    ai_144 Nullable(String), ai_145 Nullable(String)
)
ENGINE = SharedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
ORDER BY (created_on_timestamp, last_update_timestamp)
SETTINGS index_granularity = 8192

riccardogabellone commented 3 months ago

Some checks I'd do:

are those ai_* JSON array values actually strings?
try removing all Nullable wrappers from the ClickHouse columns (in a temp table) and filling them with some default value suitable for emulating null
your timestamps are DateTime64(3) so I would try with this annotation instead: #[serde(with = "clickhouse::serde::time::datetime64::millis")]
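
The Nullable check can be seen at the byte level. In RowBinary a Nullable(T) value is preceded by one flag byte (0 means a value follows, 1 means NULL). Assuming the client writes that flag only for `Option<_>` fields, a plain `String` sent to a Nullable(String) column is one byte short for every such column. The helpers below are a hand-written sketch, not the crate's code, and handle only short strings.

```rust
/// RowBinary String for short strings: one length byte, then the bytes.
/// (Lengths above 127 need a multi-byte varint, omitted in this sketch.)
fn write_string(out: &mut Vec<u8>, s: &str) {
    assert!(s.len() < 128, "sketch handles single-byte lengths only");
    out.push(s.len() as u8);
    out.extend_from_slice(s.as_bytes());
}

/// RowBinary Nullable(String): flag byte 1 for NULL, or 0 followed by the value.
fn write_nullable_string(out: &mut Vec<u8>, s: Option<&str>) {
    match s {
        None => out.push(1),
        Some(v) => {
            out.push(0);
            write_string(out, v);
        }
    }
}

fn main() {
    // What a plain `String` field would produce for organization_id = "123".
    let mut plain = Vec::new();
    write_string(&mut plain, "123");

    // What a Nullable(String) column expects for the same value.
    let mut nullable = Vec::new();
    write_nullable_string(&mut nullable, Some("123"));

    assert_eq!(plain, [3, b'1', b'2', b'3']);
    assert_eq!(nullable, [0, 3, b'1', b'2', b'3']);
    // One byte of difference per Nullable column, so the reader drifts.
    assert_eq!(nullable.len() - plain.len(), 1);
}
```

Either side can be changed to make the layouts agree: `Option<String>` fields in the struct, or non-Nullable columns in the table.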

apsaltis commented 3 months ago

Hi @riccardogabellone

RE: the ai_* JSON array values -- they are actually Vec in the incoming data and are then converted like this: ai_137: response.answers.ai_137.join(", ")
RE: Nullable -- changed the table to have everything as NOT NULL, as that is the reality of the system anyway.
RE: Timestamps -- changed to #[serde(with = "clickhouse::serde::time::datetime64::millis")], and that certainly has an impact.

Everything is working like a charm right now. For certain, if I have the wrong serde for the datetime it blows up as expected.
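
The timestamp part comes down to column width: DateTime is stored as a 32-bit count of seconds, while DateTime64(3) is a 64-bit count of milliseconds. Assuming the `datetime` helper writes the 4-byte form and `datetime64::millis` the 8-byte form (which matches their names and the fix reported above), the mismatch can be sketched as follows; the two functions are illustrative stand-ins, not the crate's API.

```rust
/// DateTime: seconds since the Unix epoch as a little-endian u32.
fn encode_datetime(unix_seconds: u32) -> [u8; 4] {
    unix_seconds.to_le_bytes()
}

/// DateTime64(3): milliseconds since the Unix epoch as a little-endian i64.
fn encode_datetime64_millis(unix_millis: i64) -> [u8; 8] {
    unix_millis.to_le_bytes()
}

fn main() {
    let secs = 1_720_024_253u32; // created_on_timestamp from the JSON dump
    let as_datetime = encode_datetime(secs);
    let as_datetime64 = encode_datetime64_millis(i64::from(secs) * 1000);

    assert_eq!(as_datetime.len(), 4);
    assert_eq!(as_datetime64.len(), 8);

    // Two timestamp columns written in the narrow form leave every row
    // 8 bytes shorter than a DateTime64(3) table expects.
    let shortfall = 2 * (as_datetime64.len() - as_datetime.len());
    assert_eq!(shortfall, 8);
}
```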

Thanks for all the help @riccardogabellone

apsaltis commented 3 months ago

Hi @riccardogabellone, I've run into this again. It's not a show stopper, more a matter of convenience. I have a table with an Array(UUID) column, and when I try to serialize a Vec<Uuid> it fails. I cannot use #[serde(with = "clickhouse::serde::uuid")] as that does not expect a Vec but just a Uuid.

For now, I changed my table to be an Array(LC(String)) and it works just fine with a Vec

Thoughts on the Vec?

Thanks, Andrew

loyd commented 3 months ago

@apsaltis

Hm, I think it's easy to implement clickhouse::serde::uuid::vec in the same way as serde::uuid::option.

It looks clumsy; I'm sure there is a better approach here by changing

    pub fn serialize<S>(uuid: &Uuid, serializer: S) -> Result<S::Ok, S::Error>

into something like

    pub fn serialize<U: Wrapper<Uuid>, S>(uuid: &U, serializer: S) -> Result<S::Ok, S::Error>
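
For context on why a Vec needs its own helper: assuming the standard RowBinary layout, Array(T) is a varint element count followed by each element, so a `uuid::vec` helper would have to apply the per-UUID encoding to every element. The sketch below uses `u128` as a stand-in for `Uuid` and writes each value as two little-endian u64 halves, high half first; that layout is my understanding of what the existing single-value helper does and should be treated as an assumption. All function names are hypothetical.

```rust
/// Append an unsigned LEB128 varint.
fn write_varint(out: &mut Vec<u8>, mut v: u64) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Encode one UUID (given as u128) as two little-endian u64 halves, high first.
/// ASSUMPTION: this mirrors the crate's single-UUID encoding.
fn write_uuid(out: &mut Vec<u8>, uuid: u128) {
    let high = (uuid >> 64) as u64;
    let low = uuid as u64;
    out.extend_from_slice(&high.to_le_bytes());
    out.extend_from_slice(&low.to_le_bytes());
}

/// Array(T): element count as a varint, then each element in order.
fn write_array<T>(out: &mut Vec<u8>, items: &[T], write_item: impl Fn(&mut Vec<u8>, &T)) {
    write_varint(out, items.len() as u64);
    for item in items {
        write_item(out, item);
    }
}

fn main() {
    let ids: Vec<u128> = vec![1, 2];
    let mut out = Vec::new();
    write_array(&mut out, &ids, |o, id| write_uuid(o, *id));

    assert_eq!(out.len(), 1 + 2 * 16); // count byte + two 16-byte UUIDs
    assert_eq!(out[0], 2); // element count
    assert_eq!(out[9], 1); // low half of the first UUID starts at offset 9
}
```

Passing the element encoder as a parameter is the same idea as the `Wrapper<Uuid>` generalization suggested above: the container logic is written once and the per-element encoding is plugged in.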

apsaltis commented 3 months ago

Sounds good @loyd -- let me take a stab at implementing this and submit a pr.

keltia commented 3 months ago

Hello, I have a similar issue as well. Here is what I sent to ClickHouse; they said it was more likely an issue in the Rust client. Do you want a separate issue, @loyd?

Company or project name

ACUTE Project, plenty of data inside the database (5.6B records so far).

Describe the unexpected behaviour

I'm trying to insert a few records into a table with the Rust client (https://docs.rs/clickhouse) and it generates the following error every time, usually on the 3rd or 4th record (the records are quite small).

How to reproduce

(version 24.5.3.5 (official build))

Using the HTTP interface through the Rust client.

Non-default settings, if any

        let r = r##"
        CREATE TABLE ids (
    drone_id VARCHAR,
    callsign VARCHAR,
    journey INT,
    en_id VARCHAR,
) ENGINE = Memory
"##;

The code later uses this to insert records:

        #[derive(Clone, Debug, Serialize, Deserialize, Row)]
        struct Tc {
            #[serde(skip_deserializing)]
            en_id: String,
            journey: u32,
            drone_id: String,
            callsign: String,
        }

        let total = dbh.query("SELECT count() FROM today_close").fetch_one::<usize>().await?;

        let r = format!(r##"
    SELECT
      journey,
      drone_id,
      callsign,
    FROM today_close
    WHERE
      dist_drone_plane < 1852
    GROUP BY ALL
        "##);

        trace!("Fetch close encounters from today_close.");
        let all = dbh.query(&r).fetch_all::<Tc>().await?;

        if all.len() == 0 {
            return Err(eyre!("No encounters found out of {total}").into());
        }

        trace!("Insert updated records.");
        // Insert the records
        //
        let mut batch = dbh.insert("acute.ids")?;

        trace!("Add en_id.");
        all.iter()
            .enumerate()
            .for_each(|(id, elem): (usize, &Tc)| {
                let journey = elem.journey;
                let elem = Tc {
                    en_id: format!("{}-{}-{}-{}", site, day_name, journey, id),
                    journey: elem.journey,
                    drone_id: elem.drone_id.clone(),
                    callsign: elem.callsign.clone(),
                };
                debug!("{elem:?}");                   // DISPLAYS THE DEBUG LINES BELOW
                block_on(async { batch.write(&elem).await.unwrap(); });
            });
        let _ = batch.end().await?;            // CRASH IS HERE -- LINE 358

         retrace(v): process_data::cmds::distances::planes::compute::insert_ids{day_name="20231128", site="BRU"}
          TRACE process_data::cmds::distances::planes::compute Fetch close encounters from today_close.
          TRACE process_data::cmds::distances::planes::compute Insert updated records.
          TRACE process_data::cmds::distances::planes::compute Add en_id.
          DEBUG process_data::cmds::distances::planes::compute Tc { en_id: "BRU-20231128-37913-0", journey: 37913, drone_id: "687CKAU0011JYP", callsign: "BEL8DK" }
          DEBUG process_data::cmds::distances::planes::compute Tc { en_id: "BRU-20231128-37907-1", journey: 37907, drone_id: "F4XF82376006N14Q", callsign: "BIRD380" }
          DEBUG process_data::cmds::distances::planes::compute Tc { en_id: "BRU-20231128-37907-2", journey: 37907, drone_id: "F4XF82376006N14Q", callsign: "RYR8XL" }
          DEBUG process_data::cmds::distances::planes::compute Tc { en_id: "BRU-20231128-37907-3", journey: 37907, drone_id: "F4XF82376006N14Q", callsign: "SKEY420" }
          DEBUG process_data::cmds::distances::planes::compute Tc { en_id: "BRU-20231128-37907-4", journey: 37907, drone_id: "F4XF82376006N14Q", callsign: "BEL5WG" }
          DEBUG process_data::cmds::distances::planes::compute Tc { en_id: "BRU-20231128-37907-5", journey: 37907, drone_id: "F4XF82376006N14Q", callsign: "SKEY611" }
          DEBUG process_data::cmds::distances::planes::compute Tc { en_id: "BRU-20231128-37907-6", journey: 37907, drone_id: "F4XF82376006N14Q", callsign: "AEE2BR" }
         close(v): process_data::cmds::distances::planes::compute::insert_ids{day_name="20231128", site="BRU"}
       post_close: process_data::cmds::distances::planes::compute::select_encounters{self=PlaneDistance { name: "BRU", loc: Location { code: "9F26RC22+22", hash: Some("u150upggr"), lat: 50.8, lon: 4.4 }, date: 2023-11-28T00:00:00Z, distance: 70.0, separation: 5500.0, template: None }}
       close(v): process_data::cmds::distances::planes::compute::select_encounters{self=PlaneDistance { name: "BRU", loc: Location { code: "9F26RC22+22", hash: Some("u150upggr"), lat: 50.8, lon: 4.4 }, date: 2023-11-28T00:00:00Z, distance: 70.0, separation: 5500.0, template: None }}
     post_close: process_data::cmds::distances::planes::compute::run{self=PlaneDistance { name: "BRU", loc: Location { code: "9F26RC22+22", hash: Some("u150upggr"), lat: 50.8, lon: 4.4 }, date: 2023-11-28T00:00:00Z, distance: 70.0, separation: 5500.0, template: None }}
     close(v): process_data::cmds::distances::planes::compute::run{self=PlaneDistance { name: "BRU", loc: Location { code: "9F26RC22+22", hash: Some("u150upggr"), lat: 50.8, lon: 4.4 }, date: 2023-11-28T00:00:00Z, distance: 70.0, separation: 5500.0, template: None }}
   post_close: process_data::cmds::distances::planes::planes_calculation{opts=PlanesOpts { date: Day { date: "2023-11-28" }, name: "BRU", distance: 70.0, separation: 5500.0 }}
    TRACE process_data::cmds::distances::planes All stats: [Err(bad response: Code: 33. DB::Exception: Cannot read all data. Bytes read: 14. Bytes expected: 54.: (at row 3)
   : While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 24.5.3.5 (official build))

   Location:
       process-data/src/cmds/distances/planes/compute.rs:358:17)]
Task failed: bad response: Code: 33. DB::Exception: Cannot read all data. Bytes read: 14. Bytes expected: 54.: (at row 3)
: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 24.5.3.5 (official build))

I tried with Client::insert and the inserter (Client::Inserter).

Queries to run that lead to unexpected result

Expected behavior

Error message and/or stacktrace

/var/log/clickhouse-server/clickhouse-server.err.log:

2024.07.22 16:09:14.002843 [ 163639 ] {06fe1319-1575-4a85-810c-ef462cad190d} <Error> DynamicQueryHandler: Code: 33. DB::Exception: Cannot read all data. Bytes read: 14. Bytes expected: 54.: (at row 3)
: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c5c527b
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x00000000077960ec
2. DB::Exception::Exception<unsigned long&, String>(int, FormatStringHelperImpl<std::type_identity<unsigned long&>::type, std::type_identity<String>::type>, unsigned long&, String&&) @ 0x0000000008f51b0b
3. DB::ReadBuffer::readStrict(char*, unsigned long) @ 0x000000000c63be89
4. DB::SerializationString::deserializeBinary(DB::IColumn&, DB::ReadBuffer&, DB::FormatSettings const&) const @ 0x000000000fd520d6
5. DB::BinaryFormatReader<false>::readField(DB::IColumn&, std::shared_ptr<DB::IDataType const> const&, std::shared_ptr<DB::ISerialization const> const&, bool, String const&) @ 0x0000000011f8f676
6. DB::RowInputFormatWithNamesAndTypes::readRow(std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn>>>&, DB::RowReadExtension&) @ 0x0000000011f941c0
7. DB::IRowInputFormat::read() @ 0x0000000011f751fc
8. DB::IInputFormat::generate() @ 0x0000000011f1b156
9. DB::ISource::tryGenerate() @ 0x0000000011ef7ff5
10. DB::ISource::work() @ 0x0000000011ef7a82
11. DB::ExecutionThreadContext::executeTask() @ 0x0000000011f11907
12. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x0000000011f061f0
13. DB::PipelineExecutor::execute(unsigned long, bool) @ 0x0000000011f05682
14. DB::CompletedPipelineExecutor::execute() @ 0x0000000011f03f92
15. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, std::shared_ptr<DB::Context>, std::function<void (DB::QueryResultDetails const&)>, DB::QueryFlags, std::optional<DB::FormatSettings> const&, std::function<void (DB::IOutputFormat&, String const&, std::shared_ptr<DB::Context const> const&, std::optional<DB::FormatSettings> const&)>) @ 0x0000000010e321af
16. DB::HTTPHandler::processQuery(DB::HTTPServerRequest&, DB::HTMLForm&, DB::HTTPServerResponse&, DB::HTTPHandler::Output&, std::optional<DB::CurrentThread::QueryScope>&, StrongTypedef<unsigned long, ProfileEvents::EventTag> const&) @ 0x0000000011e23534
17. DB::HTTPHandler::handleRequest(DB::HTTPServerRequest&, DB::HTTPServerResponse&, StrongTypedef<unsigned long, ProfileEvents::EventTag> const&) @ 0x0000000011e287c5
18. DB::HTTPServerConnection::run() @ 0x0000000011ea5f63
19. Poco::Net::TCPServerConnection::start() @ 0x0000000014767fa7
20. Poco::Net::TCPServerDispatcher::run() @ 0x0000000014768439
21. Poco::PooledThread::run() @ 0x000000001485e8a1
22. Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001485ce7d
23. ? @ 0x00007f1241bdaac3
24. ? @ 0x00007f1241c6c850
 (version 24.5.3.5 (official build))

Additional context

Having async_insert set to 1 does not change anything.

I don't think I can use a single INSERT INTO SQL statement with the data as parameter, can I?

keltia commented 3 months ago

I found what is bothering the client: the serde directive skip_deserializing. If I remove it, it works.
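If the attribute was only there so that one struct could serve both the three-column SELECT and the four-column INSERT, splitting it into two row types avoids needing it at all. A minimal sketch under that assumption, std-only so it runs as-is: `TcRead`, `TcWrite` and `with_en_id` are hypothetical names, and in the real program both structs would carry the `Serialize`, `Deserialize` and `Row` derives that `Tc` has.

```rust
// Sketch only: the serde / clickhouse::Row derives are left out so that
// this compiles with the standard library alone.

/// Row shape returned by the SELECT on today_close (no en_id column).
#[derive(Clone, Debug)]
struct TcRead {
    journey: u32,
    drone_id: String,
    callsign: String,
}

/// Row shape written to the ids table (all four columns).
#[derive(Clone, Debug, PartialEq)]
struct TcWrite {
    en_id: String,
    journey: u32,
    drone_id: String,
    callsign: String,
}

/// Hypothetical helper: build the row to insert from a fetched row,
/// using the same en_id layout as the original code.
fn with_en_id(row: &TcRead, site: &str, day_name: &str, id: usize) -> TcWrite {
    TcWrite {
        en_id: format!("{}-{}-{}-{}", site, day_name, row.journey, id),
        journey: row.journey,
        drone_id: row.drone_id.clone(),
        callsign: row.callsign.clone(),
    }
}

fn main() {
    // First record from the DEBUG output in the report.
    let fetched = TcRead {
        journey: 37913,
        drone_id: "687CKAU0011JYP".to_string(),
        callsign: "BEL8DK".to_string(),
    };
    let row = with_en_id(&fetched, "BRU", "20231128", 0);
    println!("{row:?}");
}
```

With this split, `fetch_all::<TcRead>()` and `batch.write(&TcWrite)` each see exactly the fields they need, so neither struct needs a skip attribute.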

loyd commented 3 months ago

@keltia, would you like to open a separate issue? The client should generate different sets of field names for serialization and deserialization. Currently, it drops a field from both if either skip_serializing or skip_deserializing is present, instead of handling the two attributes separately. It's a bug and can be easily fixed.
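To see how that turns into the exact numbers in the error, here is a std-only sketch. It assumes, per the explanation above, that `en_id` was left out of the INSERT column list while serde still wrote it into every row (`skip_deserializing` does not affect serialization). It is not the clickhouse crate's code: `put_string`, `encode`, `decode`, `sample_rows` and `Col` are names made up for the illustration. The encoding follows the RowBinary format, with LEB128-prefixed strings and little-endian fixed-width integers.

```rust
/// Append a RowBinary String: LEB128 length prefix, then the raw bytes.
fn put_string(buf: &mut Vec<u8>, s: &str) {
    let mut n = s.len();
    while n >= 0x80 {
        buf.push((n as u8 & 0x7f) | 0x80);
        n >>= 7;
    }
    buf.push(n as u8);
    buf.extend_from_slice(s.as_bytes());
}

/// What serde writes for each `Tc`: all four fields, `en_id` included.
fn encode(rows: &[(&str, u32, &str, &str)]) -> Vec<u8> {
    let mut buf = Vec::new();
    for &(en_id, journey, drone_id, callsign) in rows {
        put_string(&mut buf, en_id);
        buf.extend_from_slice(&journey.to_le_bytes());
        put_string(&mut buf, drone_id);
        put_string(&mut buf, callsign);
    }
    buf
}

#[derive(Clone, Copy)]
enum Col {
    Int32,
    Str,
}

/// Read rows as a RowBinary reader would for the given column list.
/// Returns the row count, or (row, bytes_read, bytes_expected) on a short read.
fn decode(data: &[u8], cols: &[Col]) -> Result<usize, (usize, usize, usize)> {
    let (mut pos, mut row) = (0usize, 0usize);
    while pos < data.len() {
        row += 1;
        for col in cols {
            let want = match col {
                Col::Int32 => 4,
                Col::Str => {
                    // Decode the LEB128 length prefix.
                    let (mut len, mut shift) = (0usize, 0u32);
                    loop {
                        let b = match data.get(pos) {
                            Some(&b) => b,
                            None => return Err((row, 0, 1)),
                        };
                        pos += 1;
                        len |= usize::from(b & 0x7f) << shift;
                        if b & 0x80 == 0 || shift >= 56 {
                            break;
                        }
                        shift += 7;
                    }
                    len
                }
            };
            let left = data.len() - pos;
            if left < want {
                return Err((row, left, want));
            }
            pos += want;
        }
    }
    Ok(row)
}

/// The seven records from the DEBUG lines in the report.
fn sample_rows() -> Vec<(&'static str, u32, &'static str, &'static str)> {
    let d = "F4XF82376006N14Q";
    vec![
        ("BRU-20231128-37913-0", 37913, "687CKAU0011JYP", "BEL8DK"),
        ("BRU-20231128-37907-1", 37907, d, "BIRD380"),
        ("BRU-20231128-37907-2", 37907, d, "RYR8XL"),
        ("BRU-20231128-37907-3", 37907, d, "SKEY420"),
        ("BRU-20231128-37907-4", 37907, d, "BEL5WG"),
        ("BRU-20231128-37907-5", 37907, d, "SKEY611"),
        ("BRU-20231128-37907-6", 37907, d, "AEE2BR"),
    ]
}

fn main() {
    let bytes = encode(&sample_rows());
    // Column list that includes en_id: the stream parses cleanly.
    println!("{:?}", decode(&bytes, &[Col::Str, Col::Int32, Col::Str, Col::Str])); // Ok(7)
    // Column list without en_id (journey, drone_id, callsign): misaligned.
    println!("{:?}", decode(&bytes, &[Col::Int32, Col::Str, Col::Str])); // Err((3, 14, 54))
}
```

The second call fails at row 3 with 14 bytes read and 54 expected, the same figures as in the server log. The 54 is simply the ASCII code of a `6` inside a `drone_id`, read as a string length once the columns are shifted.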

ClickHouse / clickhouse-rs

CANNOT_READ_ALL_DATA when calling end() #109

[serde(with = "clickhouse::serde::time::datetime")]