Closed: rtkaratekid closed this issue 12 months ago
When trying with insert instead of inserter:
let mut inserted_so_far = 0;
for i in 1..((total_rows_to_insert / batch_size) + 1) {
    let mut insert = client.insert("dragonfly")?;
    for j in 1..batch_size + 1 {
        insert.write(&DragonflyRow::rand_new()).await?;
        inserted_so_far = i * j;
    }
    insert.end().await?;
    thread::sleep(time::Duration::from_secs(2)); // sleep two seconds to not overwhelm clickhouse
}
The error:
Error: BadResponse("Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 4.00 EiB (attempt to allocate chunk of 4611686018429479655 bytes), maximum: 9.31 GiB: While executing BinaryRowInputFormat. (MEMORY_LIMIT_EXCEEDED) (version 21.11.3.6 (official build))")
Thanks for reporting. Which version of CH do you use (upd: I see 21.11.3.6, sorry)? Also, what does DragonflyRow look like, or, at least, bytes_in_billion_rows?
I haven't faced it before, even using the inserter up to 300k/s per thread.
I'm always happy to help/contribute to open source :)
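Since the inserter keeps coming up, here is a rough sketch of what the inserter-based variant looks like, assuming the DragonflyRow/rand_new() from the snippets in this thread; exact method names differ between versions of the crate, so treat it as an approximation rather than the crate's exact API:

use clickhouse::Client;

// Sketch only: the Inserter buffers rows and flushes them in batches on its own,
// instead of the manual batch loop above.
async fn insert_with_inserter(client: &Client, total_rows: u64) -> Result<(), clickhouse::error::Error> {
    let mut inserter = client
        .inserter("dragonfly")?
        .with_max_entries(10_000); // flush roughly every 10k rows (a time-based threshold can also be configured)
    for _ in 0..total_rows {
        inserter.write(&DragonflyRow::rand_new()).await?;
        inserter.commit().await?; // no-op until a flush threshold is reached
    }
    inserter.end().await?; // flush whatever is still buffered
    Ok(())
}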
Yeah so I actually just tried one simple insert, which I know has worked in the past
let client = Client::default().with_url("http://localhost:8123");
let mut insert = client.insert("dragonfly")?;
insert.write(&DragonflyRow::rand_new()).await?;
insert.end().await?;
And actually got an error
DB::Exception: Memory limit (for query) exceeded: would use 64.00 PiB (attempt to allocate chunk of 72057594038284184 bytes), maximum: 9.31 GiB: While executing BinaryRowInputFormat. (MEMORY_LIMIT_EXCEEDED) (version 21.11.3.6 (official build)
This actually makes me suspect that it's clickhouse somehow, because this single insert has definitely worked in the past. I'm not 100% sure what the issue might be, though. I just rebuilt my vm, reinstalled clickhouse, and created the table (this is not a cluster), so I'm not sure if there's any setting that should be changed from vanilla or not.
Edit to add: when I restart the clickhouse-server and then look at htop, I see an initial entry for one or two processes, but over a few seconds they seem to fork and spawn new processes and/or threads. Perhaps that's the issue, but since this is a fresh install I'm not sure why that would happen or whether it would be the cause of this issue. I also just re-verified that I can insert data using clickhouse-client.
DragonflyRow looks like this. It also has a new method that just populates the fields with random values.
#[derive(Debug, Row, Serialize, Deserialize)]
struct DragonflyRow {
    timestamp: String,
    alert_id: i64,
    alert_description: String,
    alert_msg: String,
    alert_severity: i64,
    app_protocol: String,
    client_to_server_duration: u64,
    client_to_server_packet_count: u64,
    client_to_server_bytes: u64,
    client_to_server_tcp_flags: u32,
    client_to_server_entropy: f32,
    server_to_client_duration: u64,
    server_to_client_packet_count: u64,
    server_to_client_bytes: u64,
    server_to_client_tcp_flags: u32,
    server_to_client_entropy: f32,
    community_id: String,
    customer: String,
    ip_version: u8,
    dest_hostname: String,
    dest_ip: String,
    dest_port: u16,
    dest_mac: String,
    src_hostname: String,
    src_ip: String,
    src_port: u16,
    src_mac: String,
    kerberos_domain: String,
    kerberos_hostname: String,
    kerberos_username: String,
    web_hostname: String,
    dns_query: String,
    dns_query_type: i64,
    dns_reply_code: i64,
    end_reason: String,
    flow_risk: Vec<u32>,
    frequency: i64,
    frequency_id: String,
    frequency_reason: String,
    app_protocol_guess: u8,
    http_content_type: String,
    http_response_code: i16,
    http_url: String,
    http_user_agent: String,
    model_confidence: f32,
    model_dga_prediction: f32,
    model_discard: u8,
    model_label_index: u8,
    model_dga: String,
    model_epa: String,
    username: String,
    passwd: String,
    payload_lengths: Vec<i64>,
    protocol: String,
    ssh_client_hassh: String,
    ssh_server_hassh: String,
    timeval_diffs: Vec<i64>,
    tls_alpn: String,
    tls_cipher: String,
    tls_cipher_id: i64,
    tls_cipher_level: String,
    tls_ja3c: String,
    tls_ja3s: String,
    tls_not_after: i64,
    tls_not_before: i64,
    tls_cert_hash: String,
    tls_version: String,
    tls_version_id: i64,
    tls_supported_versions: String,
    pcr_payload: f32,
    pcr_goodput: f32,
    trusted_domain: u8,
    unpopular_dest: u8,
    vlan: u16,
}
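For context on why the struct definition matters: the client sends rows as RowBinary (hence BinaryRowInputFormat in the error), which carries no column names or types, just each field's raw bytes in declaration order, so the struct has to line up with the table column-for-column. A minimal hypothetical pairing, since the real dragonfly DDL isn't shown in this thread:

// Hypothetical table, not the real dragonfly schema:
//
//     CREATE TABLE dragonfly_mini (
//         timestamp String,
//         alert_id  Int64,
//         vlan      UInt16
//     ) ENGINE = MergeTree ORDER BY timestamp;
//
// The matching struct declares the same columns, in the same order, with the
// same widths (String <-> String, i64 <-> Int64, u16 <-> UInt16, ...).
#[derive(Debug, Row, Serialize, Deserialize)]
struct DragonflyMini {
    timestamp: String,
    alert_id: i64,
    vlan: u16,
}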
Today I got probably the same error:
2021-12-04 01:52:01 ERROR project: BadResponse("Code: 49. DB::Exception: Too large size (9223372036854775808) passed to allocator. It indicates an error.: While executing BinaryRowInputFormat. (LOGICAL_ERROR) (version 21.11.4.14 (official build))")
My struct:
pub struct Dn {
    pub id: [u8; 32],
    pub cid: [u8; 32],
    pub did: [u8; 32],
    pub dti: i8,
    pub cti: i8,
    pub t: String,
    pub cn: String,
    pub cs1: String,
    pub cs2: String,
    pub ptd: String,
    pub fr: String,
    pub bi: String,
    pub lgts: String,
    pub lgts_r: String,
    pub lgts_d: String,
    pub co: String,
    pub vdt: String,
    pub jdg: String,
    pub apdx: String,
    pub to: u32,
    pub po: u32,
    pub created_at: u32,
    pub updated_at: u32,
    pub deleted_at: u32,
}
Sorry, I figured out my issue was due to a mismatched data type between the migration and the struct; changing u32 to u16 solved it.
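That mechanism would also explain the absurd sizes above: RowBinary has no field delimiters, so a struct field that is wider than its column shifts every byte that follows, and the server eventually reads a garbage length or size for some later column, which is how values like 4 EiB, 64 PiB, or 9223372036854775808 can appear. A hypothetical illustration (column and field names made up; only the widths matter):

use clickhouse::Row;
use serde::Serialize;

// Say the table has:  CREATE TABLE dn_mini (port UInt16, note String) ENGINE = MergeTree ORDER BY port;
#[derive(Row, Serialize)]
struct DnMini {
    pub port: u32,    // WRONG: writes 4 bytes where the server expects 2; the extra
                      // 2 bytes are parsed as the start of `note`, so its length
                      // prefix becomes a huge garbage number and the INSERT fails.
    pub note: String, // changing `port` to u16 (matching UInt16) fixes the insert
}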
Maybe related https://github.com/ClickHouse/ClickHouse/issues/37420
I face a similar issue with 22.5.1.2079. When I try to insert I get: Memory limit (for query) exceeded: would use 64.00 PiB (attempt to allocate chunk of 72057594039063687 bytes), maximum: 279.40 GiB. But there is no way this insert can use 64 PiB (I also receive the 4 EiB error sometimes). Did you solve this issue?
I've been testing clickhouse as a potential db for work and was excited to see a rust clickhouse client. I wrote a small program that randomly generates data matching our db schema and then inserts tons of that data into the database, with the intent of both getting to know clickhouse better and seeing if it meets our insert and query needs.
Running my test though, I'm coming up against memory errors that look like this issue on the clickhouse repo. I've been trying to troubleshoot it, but I'm just not familiar enough with clickhouse yet to nail down what's causing the issue and what exactly the issue is.
Here's my little test program
My table is very simple, no nested objects, and the engine is just a MergeTree using a timestamp value to order by.
When I run this with batch sizes of <1,000 rows, I get this error
When I run this with a batch size of 10,000, I get this
Based on the information in the clickhouse issues that are similar to this, I think there's something going on with how the BinaryRowInputFormat queries are being executed, but being newer to clickhouse I'm not very confident about that. Today I hope to follow up with a similar test that skips this clickhouse client library and just connects to the port with raw HTTP requests, to see if I get the same issues or not (a rough sketch of that is at the end of this comment). Similar clickhouse issues I've found
I'm on Ubuntu 20.04, 4 cpu cores, 10GB ram, 32GB disk. Looking at htop output while the program is running, I don't see much that helps aside from a lot of clickhouse-server threads (maybe about 50).
For reference, I have no problem inserting a 4GB json file with clickhouse-client --query "<insert statement>" < file.json
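For what it's worth, the raw-HTTP check mentioned above could look roughly like the sketch below. reqwest is just my pick of HTTP client, not something this program already uses, and the row is trimmed down to a couple of hypothetical fields; since JSONEachRow bypasses RowBinary entirely, this succeeding while the library insert fails would point at the RowBinary payload rather than at the server:

// Sketch of a raw insert against ClickHouse's HTTP interface on port 8123,
// using the reqwest crate (an added dependency, not part of the original test program).
async fn raw_http_insert() -> Result<(), reqwest::Error> {
    // One JSONEachRow row; the fields here are placeholders, not the full dragonfly schema.
    let body = r#"{"timestamp":"2021-12-01 00:00:00","alert_id":1}"#;
    let resp = reqwest::Client::new()
        .post("http://localhost:8123/")
        .query(&[("query", "INSERT INTO dragonfly FORMAT JSONEachRow")])
        .body(body.to_string())
        .send()
        .await?;
    println!("status: {}", resp.status());
    Ok(())
}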
I'm happy to help if there are more questions about this issue.