facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
28.52k stars 6.31k forks source link

Latest changes in AlignedBuffer produce Access Violation #4051

Closed yuslepukhin closed 6 years ago

yuslepukhin commented 6 years ago

Note: Please use Issues only for bug reports. For questions, discussions, feature requests, etc. post to dev group: https://www.facebook.com/groups/rocksdb.dev

Expected behavior

db_bench does not core dump

Actual behavior

db_bench when running seekrandom produces access violation (jemalloc build)

In the below code the current size and the capacity of the buffer is 12288, however, the memcpy attempts to copy bufstart_ + copy_offset (12288), copy_len(12288) from the old buffer resulting in out of buffer access.

image

image

image

Steps to reproduce the behavior

Then issue seekrandom scenario (adjust as approp) --benchmarks=seekrandom --mmap_read=0 --statistics=1 --histogram=1 --num=10000000 --threads=32 --prefix_size=0 --keys_per_prefix=40 --value_size=800 --block_size=4096 --cache_size=1048576 --bloom_bits=10 --memtable_bloom_size_ratio=0 --cache_numshardbits=6 --open_files=500000 --verify_checksum=1 --db=d:\mnt\RandomSeekNext_10M --sync=0 --disable_wal=1 --compression_type=snappy --seek_nexts=40 --stats_interval=1000000 --compression_ratio=0.5 --write_buffer_size=67108864 --target_file_size_base=67108864 --max_write_buffer_number=3 --max_background_compactions=20 --max_background_flushes=4 --enable_pipelined_write=false --allow_concurrent_memtable_write=false --enable_write_thread_adaptive_yield=true --level0_file_num_compaction_trigger=4 --level0_slowdown_writes_trigger=8 --level0_stop_writes_trigger=12 --num_levels=6 --delete_obsolete_files_period_micros=300000000 --min_level_to_compress=2 --max_compaction_bytes=0 --stats_per_interval=1 --max_bytes_for_level_base=10485760 --use_existing_db=1 --use_direct_reads=1 --use_direct_io_for_flush_and_compaction=1 --duration=600 --compaction_readahead_size=3145728 --writable_file_max_buffer_size=1048576 --new_table_reader_for_compaction_inputs=1

sagar0 commented 6 years ago

Taking a look ...

yuslepukhin commented 6 years ago

I think AlignedBuffer is not used properly in a couple of places. The notion of the class encapsulation is that it knows its size and would restrict operations based on what it knows. The example includes Read() method. It would never read more than the buffer has and will return the actual amount. The recent changes violate that principle.

Another example of improper use is at https://github.com/facebook/rocksdb/blob/master/util/file_reader_writer.h#L214

There FilePrefetchBuffer has a member bufferlen which is a duplicate of what AlignedBuffer has. The code updates FilePrefertchBuffer member but not the actual buffer. They never agree on each other because when the data is written directly into the AlignedBuffer the buffer is never told its length by calling Size(size_t) method. The end result is that the AlignedBuffer does not know its proper length and can not operate properly.

sagar0 commented 6 years ago

I haven't been able to reproduce the crash/access_violation on Linux, running on even a much large db and larger number of ops. I have also been running a few stress tests over the past few days without an issue. Adding more tests and tweaks to see if I can repro this, and also trying to setup a windows environment.

yuslepukhin commented 6 years ago

@sagar0 I have a 100% repro case. I suggest that you simply run under the debugger and put a conditional breakpoint (or an assert) for the condition outlined in the above scenario. I am lucky to repro because the beyond the buffer access falls beyond the mapped memory region on windows but in a different setup the memory can be mapped and thus no immediate impact.

yuslepukhin commented 6 years ago

@sagar0 The main issue is that the buffer access is driven from the outside and the internal knowledge of the buffer is neither consulted nor enforced, thus anyone can supply arbitrary offset and data size to copy.

yuslepukhin commented 6 years ago

@sagar0 I think this is exactly the problem

sagar0 commented 6 years ago

@yuslepukhin I investigated more into the issue yesterday and am currently working on a fix.

yuslepukhin commented 6 years ago

I am going to PR my proposal.

sagar0 commented 6 years ago

Created #4105 to remove external tracking of AlignedBuffer size. Do you think we'd need more than that?

yuslepukhin commented 6 years ago

@sagar0 I think, what needs to be done is the actual size to be copied to be enforced by std::min/max functions based on the current data size and buffer capacity. And then return the actual size copied. This way, the caller can assert() to see if his expectations are met.

yuslepukhin commented 6 years ago

@sagar0 Also there are many spaces in encryption code (!) where the buffer size is not set. Again, once copying is enforced, it is going to break

sagar0 commented 6 years ago

I missed making changes in the env_encryption code as I was solely concentrating on file_reader_writer APIs. I'll update env_encryption code as well.

yuslepukhin commented 6 years ago

@sagar0 So providing tests pass and encryption is addressed, the only thing is to return the number of bytes copied from Allocate() so the caller could verify it meets the expections.

yuslepukhin commented 6 years ago

@sagar0 With regards to encryption code. It a lot of times simply memmove data into the buffer. One can simply call Append() and it would take care of the size and everything else.

sagar0 commented 6 years ago

There are multiple different problems we found while investigating this issue. Let me summarize our discussion based on my understanding so far:

  1. Access Violation which @yuslepukhin saw in seekrandom with db_bench in his setup: This is a bug introduced in #3884, and the fix for this is in #4100.
  2. AlignedBuffer length is being tracked outside, possibly incorrectly and redundantly, in ReadadheadRandomAccessFile and FilePrefetchBuffer. -- Fix for this is in #4105.
  3. AlignedBuffer size is not updated in env_encryption.cc, after reading into it. -- Still needs a fix. I am looking into this.
  4. AlignedBuffer::AllocateNewBuffer() should return num bytes copied. -- Still needs a fix.
  5. Tests for AlignedBuffer and FilePrefetchBuffer -- good to have. (This week is our tech-debt week. And I plan on adding a few of these).
yuslepukhin commented 6 years ago

@sagar0 This is spot on! Thank you for attending to this.