Open apavlo opened 4 years ago
Just an update: I ran into this while testing YCSB with oltpbench for #739. The query is a SELECT *
on the table, and the server eventually ends up writing garbage while returning the data. I've attached a Wireshark trace that corresponds to the screenshot.
As of 2020-08-27, this is still an issue:
terrier: /home/pavlo/Documents/Peloton/Github/terrier/src/include/network/packet_writer.h:82: terrier::network::PacketWriter& terrier::network::PacketWriter::AppendRaw(const void*, size_t): Assertion `(!IsPacketEmpty()) && ("packet length is null")' failed.
terrier: /home/pavlo/Documents/Peloton/Github/terrier/src/include/network/packet_writer.h:63: terrier::network::PacketWriter& terrier::network::PacketWriter::BeginPacket(terrier::network::NetworkMessageType): Assertion `(IsPacketEmpty()) && ("packet length is null")' failed.
Stack Trace
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff52308b1 in __GI_abort () at abort.c:79
#2 0x00007ffff522042a in __assert_fail_base (fmt=0x7ffff53a7a38 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x555558f60900 "(!IsPacketEmpty()) && (\"packet length is null\")",
file=file@entry=0x555558f60820 "/home/pavlo/Documents/Peloton/Github/terrier/src/include/network/packet_writer.h", line=line@entry=82,
function=function@entry=0x555558f630c0 <terrier::network::PacketWriter::AppendRaw(void const*, unsigned long)::__PRETTY_FUNCTION__> "terrier::network::PacketWriter& terrier::network::PacketWriter::AppendRaw(const void*, size_t)") at assert.c:92
#3 0x00007ffff52204a2 in __GI___assert_fail (assertion=0x555558f60900 "(!IsPacketEmpty()) && (\"packet length is null\")",
file=0x555558f60820 "/home/pavlo/Documents/Peloton/Github/terrier/src/include/network/packet_writer.h", line=82,
function=0x555558f630c0 <terrier::network::PacketWriter::AppendRaw(void const*, unsigned long)::__PRETTY_FUNCTION__> "terrier::network::PacketWriter& terrier::network::PacketWriter::AppendRaw(const void*, size_t)")
at assert.c:101
#4 0x00005555561d8a95 in terrier::network::PacketWriter::AppendRaw (this=0x7fff6c4a5580, src=0x7ffce2f32964, len=4) at /home/pavlo/Documents/Peloton/Github/terrier/src/include/network/packet_writer.h:82
#5 0x00005555561dbc59 in terrier::network::PacketWriter::AppendRawValue<int> (this=0x7fff6c4a5580, val=-1425997824) at /home/pavlo/Documents/Peloton/Github/terrier/src/include/network/packet_writer.h:100
#6 0x00005555561d9ba0 in terrier::network::PacketWriter::AppendValue<int> (this=0x7fff6c4a5580, val=427) at /home/pavlo/Documents/Peloton/Github/terrier/src/include/network/packet_writer.h:128
#7 0x00005555561d0cc9 in terrier::network::PostgresPacketWriter::WriteTextAttribute (this=0x7fff6c4a5580, val=0x627000395290, type=terrier::type::TypeId::VARCHAR)
at /home/pavlo/Documents/Peloton/Github/terrier/src/network/postgres/postgres_packet_writer.cpp:375
#8 0x00005555561d0416 in terrier::network::PostgresPacketWriter::WriteDataRow (this=0x7fff6c4a5580, tuple=0x627000395100, columns=std::vector of length 21, capacity 32 = {...},
field_formats=std::vector of length 1, capacity 1 = {...}) at /home/pavlo/Documents/Peloton/Github/terrier/src/network/postgres/postgres_packet_writer.cpp:251
#9 0x0000555556a4fe05 in terrier::execution::exec::OutputWriter::operator() (this=0x603006eaf2b0, tuples=0x627000395100, num_tuples=32, tuple_size=424)
at /home/pavlo/Documents/Peloton/Github/terrier/src/execution/exec/output.cpp:98
#10 0x00005555565b520c in std::_Function_handler<void (std::byte*, unsigned int, unsigned int), terrier::execution::exec::OutputWriter>::_M_invoke(std::_Any_data const&, std::byte*&&, unsigned int&&, unsigned int&&) (
__functor=..., __args#0=@0x7ffce2f32d90: 0x627000395100, __args#1=@0x7ffce2f32d8c: 32, __args#2=@0x7ffce2f32d88: 424) at /usr/include/c++/7/bits/std_function.h:316
#11 0x0000555556a5051e in std::function<void (std::byte*, unsigned int, unsigned int)>::operator()(std::byte*, unsigned int, unsigned int) const (this=0x606000f39e18, __args#0=0x627000395100, __args#1=32, __args#2=424)
at /usr/include/c++/7/bits/std_function.h:706
#12 0x0000555556c12c3e in terrier::execution::exec::OutputBuffer::AllocOutputSlot (this=0x606000f39e00) at /home/pavlo/Documents/Peloton/Github/terrier/src/include/execution/exec/output.h:60
#13 0x0000555556c33914 in OpResultBufferAllocOutputRow (result=0x7ffce2f36458, ctx=0x610000337940) at /home/pavlo/Documents/Peloton/Github/terrier/src/include/execution/vm/bytecode_handlers.h:1275
#14 0x0000555556c0828b in terrier::execution::vm::VM::Interpret (this=0x7ffce2f36590, ip=0x61d000a64aec <incomplete sequence \333>, frame=0x7ffce2f365d0)
at /home/pavlo/Documents/Peloton/Github/terrier/src/execution/vm/vm.cpp:1611
#15 0x0000555556beb7ab in terrier::execution::vm::VM::InvokeFunction (module=0x60400523d0d0, func_id=3, args=0x7ffce2f36650 "\220\061\354\001 `") at /home/pavlo/Documents/Peloton/Github/terrier/src/execution/vm/vm.cpp:112
#16 0x00007ffff2c06032 in ?? ()
#17 0x0000602001ec3190 in ?? ()
#18 0x000060a00053af40 in ?? ()
#19 0x00007ffce2f366e0 in ?? ()
#20 0x0000555556ab326a in terrier::execution::sql::ThreadStateContainer::AccessCurrentThreadState (this=0xfff9c5e6f0c) at /home/pavlo/Documents/Peloton/Github/terrier/src/execution/sql/thread_state_container.cpp:87
#21 0x000055555792b5b5 in tbb::interface9::internal::start_for<tbb::blocked_range<unsigned int>, terrier::execution::sql::(anonymous namespace)::ScanTask, tbb::auto_partitioner const>::run_body (this=0x7ffff1037d40, r=...)
at /usr/include/tbb/parallel_for.h:102
#22 0x000055555792b164 in tbb::interface9::internal::balancing_partition_type<tbb::interface9::internal::adaptive_mode<tbb::interface9::internal::auto_partition_type> >::work_balance<tbb::interface9::internal::start_for<tbb::blocked_range<unsigned int>, terrier::execution::sql::(anonymous namespace)::ScanTask, tbb::auto_partitioner const>, tbb::blocked_range<unsigned int> > (this=0x7ffff1037d90, start=..., range=...)
at /usr/include/tbb/partitioner.h:429
#23 0x000055555792af75 in tbb::interface9::internal::partition_type_base<tbb::interface9::internal::auto_partition_type>::execute<tbb::interface9::internal::start_for<tbb::blocked_range<unsigned int>, terrier::execution::sql::(anonymous namespace)::ScanTask, tbb::auto_partitioner const>, tbb::blocked_range<unsigned int> > (this=0x7ffff1037d90, start=..., range=...) at /usr/include/tbb/partitioner.h:255
#24 0x000055555792ad2e in tbb::interface9::internal::start_for<tbb::blocked_range<unsigned int>, terrier::execution::sql::(anonymous namespace)::ScanTask, tbb::auto_partitioner const>::execute (this=0x7ffff1037d40)
at /usr/include/tbb/parallel_for.h:127
#25 0x00007ffff6591b46 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#26 0x00007ffff658aaf8 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#27 0x00007ffff65893db in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#28 0x00007ffff6585512 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#29 0x00007ffff6585769 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#30 0x00007ffff6c026db in start_thread (arg=0x7ffce2f38700) at pthread_create.c:463
#31 0x00007ffff5311a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
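For anyone reading the trace: the two assertions describe a begin/append/end invariant on the writer. BeginPacket expects no packet to be open, and AppendRaw expects one to be open. Below is a minimal sketch of that invariant. SketchPacketWriter and both helper functions are hypothetical names made up for illustration, not terrier's actual PacketWriter. The interleaved call orders are only a guess at how the invariant could be violated, for example if the parallel scan workers visible in frames #21 to #24 shared a single writer. Nothing in this thread confirms that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a packet writer; NOT terrier's PacketWriter.
class SketchPacketWriter {
 public:
  // A packet counts as "empty" when none is currently open.
  bool IsPacketEmpty() const { return !open_; }

  // Same precondition as the BeginPacket assertion: nothing may be open.
  void BeginPacket(char type) {
    if (!IsPacketEmpty()) throw std::logic_error("BeginPacket: previous packet still open");
    buf_.push_back(static_cast<std::uint8_t>(type));
    len_pos_ = buf_.size();
    buf_.resize(buf_.size() + 4, 0);  // placeholder for the 4-byte length
    open_ = true;
  }

  // Same precondition as the AppendRaw assertion: a packet must be open.
  void AppendRaw(const void *src, std::size_t len) {
    if (IsPacketEmpty()) throw std::logic_error("AppendRaw: no packet is open");
    const std::uint8_t *p = static_cast<const std::uint8_t *>(src);
    buf_.insert(buf_.end(), p, p + len);
  }

  // Back-patches the length (which counts itself but not the type byte).
  void EndPacket() {
    if (IsPacketEmpty()) throw std::logic_error("EndPacket: no packet is open");
    assert(len_pos_ + 4 <= buf_.size());
    const std::uint32_t len = static_cast<std::uint32_t>(buf_.size() - len_pos_);
    for (int i = 0; i < 4; i++) {
      buf_[len_pos_ + i] = static_cast<std::uint8_t>(len >> (24 - 8 * i));
    }
    open_ = false;
  }

  const std::vector<std::uint8_t> &Buffer() const { return buf_; }

 private:
  std::vector<std::uint8_t> buf_;
  std::size_t len_pos_ = 0;
  bool open_ = false;
};

// Caller B begins a row before caller A has ended its own row.
bool BeginWhileOpenThrows() {
  SketchPacketWriter w;
  w.BeginPacket('D');
  try {
    w.BeginPacket('D');
  } catch (const std::logic_error &) {
    return true;
  }
  return false;
}

// Caller B ends the packet underneath caller A, which then appends a value.
bool AppendAfterForeignEndThrows() {
  SketchPacketWriter w;
  std::int32_t v = 427;
  w.BeginPacket('D');
  w.EndPacket();
  try {
    w.AppendRaw(&v, sizeof(v));
  } catch (const std::logic_error &) {
    return true;
  }
  return false;
}
```

The sketch throws where the real code asserts, so the two failure orders can be checked without aborting the process.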
Not timestamp-related (I thought it might be related to the handling of timestamps).
I think I'm hitting this while trying to run chbenchmark. Going to spend a couple hours tonight trying to understand it.
We are not able to return the results for queries that exceed a certain size. The server kills the connection. Sometimes there are client-side errors that mention reading invalid packets.
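For context on the "invalid packets" errors, here is a minimal sketch of how a DataRow message is framed in the PostgreSQL wire protocol: a 'D' type byte, a big-endian Int32 length that counts itself, an Int16 column count, then an Int32 length (-1 for NULL) plus the bytes for each column. The Column struct, PutInt16, PutInt32, and EncodeDataRow are hypothetical names for illustration and are not taken from terrier. If a length field and the bytes that follow ever disagree, the client loses framing for everything after that point, which would be consistent with the errors described here, though that is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

typedef std::vector<std::uint8_t> Bytes;

// Hypothetical column value for the sketch: either NULL or text.
struct Column {
  bool is_null;
  std::string text;
};

// Appends a 16-bit integer in network (big-endian) byte order.
void PutInt16(Bytes *out, std::int16_t v) {
  const std::uint16_t u = static_cast<std::uint16_t>(v);
  out->push_back(static_cast<std::uint8_t>(u >> 8));
  out->push_back(static_cast<std::uint8_t>(u & 0xFF));
}

// Appends a 32-bit integer in network (big-endian) byte order.
void PutInt32(Bytes *out, std::int32_t v) {
  const std::uint32_t u = static_cast<std::uint32_t>(v);
  for (int shift = 24; shift >= 0; shift -= 8) {
    out->push_back(static_cast<std::uint8_t>((u >> shift) & 0xFF));
  }
}

// Builds one DataRow message with all columns in text format.
Bytes EncodeDataRow(const std::vector<Column> &cols) {
  Bytes body;
  PutInt16(&body, static_cast<std::int16_t>(cols.size()));
  for (std::size_t i = 0; i < cols.size(); i++) {
    if (cols[i].is_null) {
      PutInt32(&body, -1);  // NULL: length -1 and no value bytes
      continue;
    }
    PutInt32(&body, static_cast<std::int32_t>(cols[i].text.size()));
    body.insert(body.end(), cols[i].text.begin(), cols[i].text.end());
  }

  Bytes msg;
  msg.push_back('D');
  // The message length includes its own 4 bytes but not the type byte.
  PutInt32(&msg, static_cast<std::int32_t>(body.size() + 4));
  msg.insert(msg.end(), body.begin(), body.end());
  return msg;
}
```

Comparing the length fields in the Wireshark capture against this layout is one way to find the first message where framing goes wrong.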
To reproduce, load the TPC-C database with scalefactor=1. This will add 30k tuples to the CUSTOMER table:
Then try to read the entire table back:
The server reports:
Subsequent invocations produce different client-side errors:
Attempt 2
Attempt 3
Attempt 4
On the fourth attempt, the client just hangs forever.