rockeet opened this issue 6 years ago
When I wrote this I missed write_buffer_size=20g, but even after trying that the results are similar. Otherwise I used write_buffer_size=64m.
What are you using for MyRocks and InnoDB?
I can't reproduce this using:
Given that this was a CPU bottleneck in ::read_field I don't think my.cnf tuning should matter.
I tried with 1M, 2M, 4M and 8M rows and MyRocks was faster in every case.
seconds to load:

MyRocks | InnoDB | rows | MyRocks datadir size
---|---|---|---
4.216 | 6.059 | 1M | < 1 GB
8.557 | 12.127 | 2M | 1.6 GB
17.282 | 24.692 | 4M | 2.1 GB
35.218 | 49.355 | 8M | 5.3 GB
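Those numbers work out to a consistent advantage for MyRocks at every size; a quick check of the ratios:

```python
# Load times in seconds, taken from the table above.
rows = ["1M", "2M", "4M", "8M"]
myrocks = [4.216, 8.557, 17.282, 35.218]
innodb = [6.059, 12.127, 24.692, 49.355]

for r, m, i in zip(rows, myrocks, innodb):
    # InnoDB time divided by MyRocks time = MyRocks speedup factor
    print(f"{r} rows: {i / m:.2f}x")
```

The speedup stays close to 1.4x at every row count, which is why tuning alone seems unlikely to explain a large regression.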
The git hashes for MyRocks are:

```
git_hash         96de6c4308d1ca9884c01391355e09457089e2b7
rocksdb_git_hash 31ee4bf2400cf1807718192d352ca8bd7837ada2
```
The difference is probably the schema: you used the sysbench schema, where the value is longer than the key, while in our case the key is longer than the value.
Maybe, but could you answer my questions about the versions of MyRocks and InnoDB used?
Even better would be a script to generate the data to load. That will save me time.
Would also help to know how many rows were in that 4GB of data.
While I appreciate bug reports, please provide sufficient detail when you first report a bug.
With 10M rows and something that matches your schema, 34 secs for MyRocks, 53 for InnoDB. With 100M rows, ~390 secs for MyRocks, ~570 secs for InnoDB. fsync on commit is disabled for both and MyRocks is tuned similar to the advice from the MyRocks github wiki.
Script to generate data was:

```shell
for i in $( seq 1 X ) ; do
  printf "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa%d\t%d\t%d\t%d\t%d\n" $i $i $i $i $i
done
```
Using perf, top-N functions for MyRocks:

```
25.90% my-oneconnectio mysqld       [.] rocksdb::WriteBatchEntryComparator::operator()
14.13% my-oneconnectio libc-2.23.so [.] __memcmp_sse4_1
 9.99% my-oneconnectio mysqld       [.] READ_INFO::read_field
 6.18% my-oneconnectio mysqld       [.] rocksdb::WriteBatchWithIndex::Rep::UpdateExistingEntryWithCfId
 6.05% my-oneconnectio mysqld       [.] myrocks::Rdb_pk_comparator::Compare
 3.43% my-oneconnectio mysqld       [.] mysql_load
 2.02% my-oneconnectio mysqld       [.] rocksdb::SkipList<rocksdb::WriteBatchIndexEntry*, rocksdb::WriteBatchEntryComparator const&>::Insert
 1.41% my-oneconnectio mysqld       [.] my_strntoull10rnd_8bit
 1.31% my-oneconnectio mysqld       [.] pack_row
 1.21% my-oneconnectio mysqld       [.] memcmp@plt
```
And for InnoDB:

```
 9.99% mysqld mysqld       [.] cmp_dtuple_rec_with_match_low
 8.44% mysqld mysqld       [.] rec_get_offsets_func
 5.47% mysqld mysqld       [.] page_cur_search_with_match
 5.38% mysqld mysqld       [.] READ_INFO::read_field
 4.76% mysqld mysqld       [.] buf_page_get_gen
 3.98% mysqld mysqld       [.] log_group_write_buf
 3.92% mysqld mysqld       [.] os_aio_linux_handle
 2.65% mysqld mysqld       [.] btr_cur_search_to_nth_level
 2.57% mysqld mysqld       [.] page_cur_insert_rec_write_log
 2.44% mysqld libc-2.23.so [.] __memset_avx2
```
Repeated the test with write_buffer_size set to 64MB and then 20GB; results were similar, about 390 secs. Then repeated the test with a default my.cnf for MyRocks, and results were still about 390 secs.
Thank you very much for your effort. Our test has several differences from yours:
There are still some other details I cannot figure out right now, since the engineer who ran this test is on vacation. I will paste the results as soon as possible.
Thank you again!
I asked twice above about the versions of MyRocks and InnoDB that you have been using. You have yet to answer, so I am going to stop helping on this bug. Next time please be less vague when filing a bug report; I ended up asking a lot of questions. Try to provide enough detail so that someone can reproduce the problem that you see -- which includes details on the versions of code used, the my.cnf options, some way to simulate the data, etc.
WAL sync on or off is fine as long as InnoDB has a similar setting, although with commit-in-the-middle you might have the opportunity to do more commits with MyRocks than InnoDB. I rarely use commit-in-the-middle, but I suggest setting the bulk load size to something larger than 1000. I didn't use commit-in-the-middle for my results above.
With rocksdb_commit_in_the_middle MyRocks is even faster than above -- about 260 seconds with rocksdb_bulk_load_size=1000 and then 250 seconds with rocksdb_bulk_load_size=10000. Again, wal sync was off.
Sorry our detailed test info is late. We have re-run the test and collected the results below.

When we tested with rocksdb_commit_in_the_middle=off, we hit an old bug caused by a misuse of vector::reserve; we hotfixed it on this version, and the results below are based on the fixed code.
my.cnf for InnoDB:

```
[mysqld]
character-set-server = utf8
collation-server = utf8_bin
performance_schema = off
default-storage-engine = INNODB
innodb_buffer_pool_size = 16G
innodb_buffer_pool_instances = 1
innodb_flush_log_at_trx_commit = 1  # this setting is varied per test
# dir configs...
```
my.cnf for MyRocks:

```
[mysqld]
character-set-server = utf8
collation-server = utf8_bin
rocksdb
default-storage-engine = rocksdb
skip-innodb
default-tmp-storage-engine = MyISAM
rocksdb_default_cf_options = write_buffer_size=20g
rocksdb_bulk_load_size = 1000
rocksdb_commit_in_the_middle = ON   # this setting is varied per test
rocksdb_flush_log_at_trx_commit = 1 # this setting is varied per test
# dir configs...
```
data-gen.py:

```python
#!/usr/bin/python3
import random
import time

ap = 10000
t = time.time()

def get_str(dic):
    global t
    global ap
    s = str(int(t)) + str(ap) + '_sh' + dic[random.randint(0, 999)]
    delimiter = ', '
    g = lambda r: '0' if r < 0.3 \
        else '1' if r < 0.6 \
        else '2' if r < 0.95 \
        else '120' if r < 0.98 \
        else '130'
    for i in range(4):
        r = random.random()
        s += delimiter + g(r)
    ap += 1
    return s

def generate(n):
    dic = {}
    for i in range(0, 1000):
        dic[i] = str(random.randint(10000, 99999))
    for i in range(1, n):
        print(get_str(dic))

if __name__ == "__main__":
    generate(70000000)
```

We wrote data-gen.py to generate simulated data: 70,000,000 items totaling 2.37 GB.
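For reference, each row data-gen.py emits has one string key followed by four small numeric columns, joined by ", ". This condensed re-creation of get_str (seeded here for determinism; the original is unseeded) confirms the row shape:

```python
import random
import time

random.seed(0)  # deterministic for the check; not in the original script

# Same construction as data-gen.py's get_str(), condensed into straight-line code.
dic = {i: str(random.randint(10000, 99999)) for i in range(1000)}
ap = 10000
s = str(int(time.time())) + str(ap) + '_sh' + dic[random.randint(0, 999)]
g = lambda r: ('0' if r < 0.3 else '1' if r < 0.6 else '2' if r < 0.95
               else '120' if r < 0.98 else '130')
for _ in range(4):
    s += ', ' + g(random.random())

fields = s.split(', ')
print(len(fields))  # 5 fields: one string key plus four numeric columns
```

So the key column (timestamp + counter + '_sh' + 5 digits) is longer than any of the value columns, matching the earlier point that in this workload the key is longer than the value.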
The InnoDB result is copied into each table for comparison with RocksDB.

With rocksdb_commit_in_the_middle = on and rocksdb_bulk_load_size = 1000:
fsync | innodb | rocksdb |
---|---|---|
on | 381.5 sec | 419.3 sec |
off | 383.7 sec | 185.8 sec |
With rocksdb_commit_in_the_middle = off:
fsync | innodb | rocksdb |
---|---|---|
on | 381.5 sec | 431.4 sec |
off | 383.7 sec | 426.6 sec |
Top perf functions for MyRocks:

```
 9.63% my-oneconnectio mysqld       [.] READ_INFO::read_field
 8.74% my-oneconnectio libc-2.17.so [.] __memcmp_sse4_1
 7.61% my-oneconnectio mysqld       [.] rocksdb::SkipList<rocksdb::WriteBatchIndexEntry*, rocksdb::WriteBatchEntryComparator<true> const&>::FindGreaterOrEqual
 6.33% my-oneconnectio mysqld       [.] myrocks::Rdb_pk_comparator::Compare
 4.20% rocksdb:bg0     mysqld       [.] snappy::internal::CompressFragment
 3.34% my-oneconnectio mysqld       [.] my_strnxfrm_unicode
 3.02% my-oneconnectio mysqld       [.] rocksdb::MemTable::KeyComparator::operator()
 2.77% my-oneconnectio mysqld       [.] rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::Insert<false>
 2.38% my-oneconnectio mysqld       [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Slow_CRC32>
 2.22% my-oneconnectio mysqld       [.] rocksdb::InternalKeyComparator::Compare
 2.09% my-oneconnectio mysqld       [.] mysql_load
 1.93% my-oneconnectio mysqld       [.] myrocks::Rdb_key_def::pack_with_varchar_space_pad
```
And for InnoDB:

```
10.19% mysqld mysqld [.] rec_get_offsets_func
 8.28% mysqld mysqld [.] my_strnncollsp_mb_bin
 6.90% mysqld mysqld [.] buf_page_get_gen
 6.73% mysqld mysqld [.] page_cur_search_with_match
 6.22% mysqld mysqld [.] cmp_dtuple_rec_with_match_low
 5.59% mysqld mysqld [.] READ_INFO::read_field
 4.71% mysqld mysqld [.] log_group_write_buf
 4.62% mysqld mysqld [.] buf_calc_page_new_checksum
 2.32% mysqld mysqld [.] innobase_mysql_cmp
 2.31% mysqld mysqld [.] cmp_whole_field
```
Setting rocksdb_commit_in_the_middle = on can enhance performance.

Conclusions would be more useful if you told us which version of InnoDB you use. This is the 4th time I have asked on this bug report. 1) Who provides InnoDB -- upstream (Oracle), Percona Server, MariaDB? 2) Which version is InnoDB from -- 5.6.?, 5.7.? or 8.0.?
You are using MyRocks from Dec, 2017 which uses RocksDB from git hash 8486ea. That is old code and I can't reproduce it on current MyRocks.
It would also be good to know whether this is a custom fork of MyRocks, with many changes to it or MySQL that nobody else has access to.
We ran the same mysqld binary as MyRocks with InnoDB's my.cnf. The code in the test is fb-prod201704 (26adcc9) with a one-line change to rocksdb:
```diff
diff --git a/util/arena.cc b/util/arena.cc
index 1382cb1..52fb411 100644
--- a/util/arena.cc
+++ b/util/arena.cc
@@ -197,7 +197,6 @@ char* Arena::AllocateNewBlock(size_t block_bytes) {
   // this way the insertion into the vector below will not throw and we
   // won't leak the allocated memory in that case. if reserve() throws,
   // we won't leak either
-  blocks_.reserve(blocks_.size() + 1);
   char* block = new char[block_bytes];
   size_t allocated_size;
```
We still have a fork of MyRocks with many changes, but that is not the one in this test.
This bug report started with "The loading speed is greatly slower than InnoDB." Then after much back and forth you provide results where MyRocks gets about 90% of the throughput compared to InnoDB. Do you mean that 10% slower is "greatly slower"?
Next time be specific up front when you file a bug report because 10% slower isn't "greatly slower" to most people.
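That "about 90%" figure follows directly from the fsync-on numbers in the commit-in-the-middle=on table above:

```python
# Load times in seconds with fsync on, from the table above.
innodb_secs = 381.5   # InnoDB
myrocks_secs = 419.3  # MyRocks, rocksdb_commit_in_the_middle = on

# Relative throughput for a fixed amount of data is the inverse ratio of
# the load times: InnoDB's time divided by MyRocks' time.
ratio = innodb_secs / myrocks_secs
print(f"{ratio:.0%}")  # prints "91%"
```

With fsync off and commit-in-the-middle on, MyRocks is instead roughly 2x faster (383.7 vs 185.8 secs), so the direction of the comparison depends entirely on the durability settings.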
Note that this is an in-memory workload.
Repeated tests with fsync off, tuned my.cnf for InnoDB, default my.cnf for MyRocks using both latin1_bin (that I used for the results above) and utf8/utf8_bin (that you mention above). MyRocks is faster for both. The binlog was enabled for the first two tests, and for the results I shared above. Disabling it makes things slightly faster. When enabled, fsync was disabled for it.
I am also using upstream (unmodified) MySQL 5.6.35 for InnoDB, while you get it from the FB branch. From past tests, InnoDB 5.6 from the FB branch is faster than upstream 5.6.
I also used performance_schema=1 for InnoDB, but perf schema is disabled for MyRocks. I don't need it with the FB branch because we have user_statistics & table_statistics. I do need perf schema with upstream MySQL to get something similar to user/table stats.
MyRocks | InnoDB | configuration
---|---|---
407 | 572 | latin1_bin
469 | 663 | utf8/utf8_bin
442 | 615 | utf8/utf8_bin, skip_log_bin
NA | 605 | utf8/utf8_bin, skip_log_bin, performance_schema=0
MySQL script:

write_buffer_size was set to 20G to disable db flush & compactions. Then we load 4 GB of data into MySQL. In perf-top, READ_INFO::read_field() is the top function; it consumed 8.79% of CPU time. The loading speed is greatly slower than InnoDB.