abs-tudelft / fletcher

Fletcher: A framework to integrate FPGA accelerators with Apache Arrow
https://abs-tudelft.github.io/fletcher/
Apache License 2.0

Stringread example throws std::bad_alloc #283

Closed: yuqi-ali closed this issue 1 year ago

yuqi-ali commented 2 years ago

stringread]# fletchgen -r names.rb -s memory.srec -l vhdl --sim
[INFO ]: Loading RecordBatch(es) from names.rb
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

johanpel commented 2 years ago

I'm not able to reproduce this.

Could you please run this in gdb and post a backtrace?

mbrobbel commented 2 years ago

The example seems to run in CI (on #284): https://github.com/abs-tudelft/fletcher/runs/5535140660?check_suite_focus=true#step:7:12

yuqi-ali commented 2 years ago

I'm not familiar with C++. I tested it on CentOS 7 with Arrow 7.0.

yuqi-ali commented 2 years ago

Starting program: /usr/local/bin/fletchgen -r names.rb -s memory.srec -l vhdl --axi
warning: File "/usr/lib64/libstdc++.so.6.0.27-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
add-auto-load-safe-path /usr/lib64/libstdc++.so.6.0.27-gdb.py
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the "Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff19ff700 (LWP 520)]
[INFO ]: Loading RecordBatch(es) from names.rb
[New Thread 0x7ffff11fe700 (LWP 521)]
[New Thread 0x7ffff09fd700 (LWP 522)]
[New Thread 0x7fffebbff700 (LWP 524)]
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

Program received signal SIGABRT, Aborted.
0x00007ffff5102387 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install arrow-libs-7.0.0-1.el7.x86_64 brotli-1.0.7-5.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 glibc-2.17-325.el7_9.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-51.el7_9.x86_64 libcom_err-1.42.9-19.el7.x86_64 libselinux-2.5-15.el7.x86_64 libzstd-1.5.2-1.el7.x86_64 lz4-1.8.3-1.el7.x86_64 openssl-libs-1.0.2k-24.el7_9.x86_64 pcre-8.32-17.el7.x86_64 snappy-1.1.0-3.el7.x86_64 zlib-1.2.7-19.el7_9.x86_64
(gdb) bt

#0 0x00007ffff5102387 in raise () from /lib64/libc.so.6
#1 0x00007ffff5103a78 in abort () from /lib64/libc.so.6
#2 0x00007ffff5c69823 in __gnu_cxx::__verbose_terminate_handler () at ../../.././libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007ffff5c75446 in __cxxabiv1::__terminate(void (*)()) () at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x00007ffff5c75491 in std::terminate () at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:57
#5 0x00007ffff5c756c4 in __cxxabiv1::__cxa_throw (obj=, tinfo=0x7ffff5f9afa0 , dest=0x7ffff5c73bd0 <std::bad_alloc::~bad_alloc()>) at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:95
#6 0x00007ffff5c694be in operator new (sz=18446744073709551608) at ../../.././libstdc++-v3/libsupc++/new_op.cc:54
#7 0x00007ffff66f86b7 in void std::vector<std::string, std::allocator >::_M_emplace_back_aux(std::string&&) () from /lib64/libarrow.so.700
#8 0x00007ffff683d242 in arrow::KeyValueMetadata::Append(std::string, std::string) () from /lib64/libarrow.so.700
#9 0x00007ffff733f6e5 in arrow::ipc::internal::GetKeyValueMetadata(flatbuffers::Vector<flatbuffers::Offset > const, std::shared_ptr) () from /lib64/libarrow.so.700
#10 0x00007ffff7346e09 in arrow::ipc::internal::(anonymous namespace)::FieldFromFlatbuffer(org::apache::arrow::flatbuf::Field const, arrow::ipc::internal::FieldPosition, arrow::ipc::DictionaryMemo, std::shared_ptr*) () from /lib64/libarrow.so.700
#11 0x00007ffff7347c70 in arrow::ipc::internal::GetSchema(void const, arrow::ipc::DictionaryMemo, std::shared_ptr*) () from /lib64/libarrow.so.700
#12 0x00007ffff735615c in arrow::ipc::UnpackSchemaMessage(void const, arrow::ipc::IpcReadOptions const&, arrow::ipc::DictionaryMemo, std::shared_ptr, std::shared_ptr, std::vector<bool, std::allocator >, bool) () from /lib64/libarrow.so.700
#13 0x00007ffff736bf39 in arrow::ipc::RecordBatchFileReaderImpl::Open(arrow::io::RandomAccessFile*, long, arrow::ipc::IpcReadOptions const&) () from /lib64/libarrow.so.700
#14 0x00007ffff7357230 in arrow::ipc::RecordBatchFileReader::Open(std::shared_ptr const&, long, arrow::ipc::IpcReadOptions const&) () from /lib64/libarrow.so.700
#15 0x00007ffff7357454 in arrow::ipc::RecordBatchFileReader::Open(std::shared_ptr const&, arrow::ipc::IpcReadOptions const&) () from /lib64/libarrow.so.700
#16 0x000000000057ed6c in fletcher::ReadRecordBatchesFromFile(std::string const&, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >*) ()
#17 0x0000000000452b1b in fletchgen::Options::LoadRecordBatches (this=0xa059c0) at /root/fletcher_Gen/fletcher/codegen/cpp/fletchgen/src/fletchgen/options.cc:162
#18 0x00000000004c85b3 in fletchgen::fletchgen (argc=8, argv=0x7fffffffe0f8) at /root/fletcher_Gen/fletcher/codegen/cpp/fletchgen/src/fletchgen/fletchgen.cc:63
#19 0x0000000000413792 in main (argc=8, argv=0x7fffffffe0f8) at /root/fletcher_Gen/fletcher/codegen/cpp/fletchgen/src/fletchgen/main.cc:18
(gdb)

johanpel commented 2 years ago

Could you try to use the names.rb file from this branch and see if that fixes the issue? https://github.com/abs-tudelft/fletcher/tree/bad_alloc Thanks!

yuqi-ali commented 2 years ago

It does not work:

  [INFO ]: Loading RecordBatch(es) from names.rb
  terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc

yuqi-ali commented 2 years ago

My GCC version is 8.3.1.

johanpel commented 2 years ago

From the backtrace I see that somewhere deep down in the Arrow code it's trying to allocate a very large array:

0x00007ffff5c694be in operator new (sz=18446744073709551608) at ../../.././libstdc++-v3/libsupc++/new_op.cc:54

This leads me to believe that the recordbatch file is somehow corrupt.

Does this problem persist when supplying Fletchgen with other recordbatches as well?

yuqi-ali commented 2 years ago

I tested the Hobbits.rb file and got the same error:

  fletchgen -r Hobbits.rb -s memory.srec -l vhdl --axi
  [INFO ]: Loading RecordBatch(es) from Hobbits.rb
  terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc

johanpel commented 2 years ago

If you add this line to common/cpp/src/fletcher/arrow-utils.cc:244

  std::cout << file->Read(file->GetSize().ValueOrDie()).ValueOrDie()->ToHexString() << std::endl;

What is being printed?

yuqi-ali commented 2 years ago

It prints nothing

johanpel commented 2 years ago

That is weird. If the file were empty, I would expect the following error:

[ERROR]: Could not open RecordBatchFileReader. ARROW:[Invalid: File is too small: 0]

Just to be sure, did you recompile after adding the line?

yuqi-ali commented 2 years ago

It seems the error arises inside arrow::ipc::RecordBatchFileReader::Open(file), before it returns file_result.

johanpel commented 2 years ago

Sorry, I had the wrong line number there. Can you plug it in on line 236, just after:

  std::shared_ptr<arrow::io::ReadableFile> file = result.ValueOrDie();

yuqi-ali commented 2 years ago

The Print Result is

johanpel commented 2 years ago

Alright, thanks. The file looks to be loaded properly there...

Could you please describe how you've built and/or installed Arrow?

Thanks.

yuqi-ali commented 2 years ago

I tested it on CentOS 7. I installed Arrow as described at https://arrow.apache.org/install/:

  sudo yum install -y epel-release || sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-$(cut -d: -f5 /etc/system-release-cpe | cut -d. -f1).noarch.rpm
  sudo yum install -y https://apache.jfrog.io/artifactory/arrow/centos/$(cut -d: -f5 /etc/system-release-cpe | cut -d. -f1)/apache-arrow-release-latest.rpm
  sudo yum install -y --enablerepo=epel arrow-devel # For C++
  sudo yum install -y --enablerepo=epel arrow-glib-devel # For GLib (C)
  sudo yum install -y --enablerepo=epel arrow-dataset-devel # For Apache Arrow Dataset C++
  sudo yum install -y --enablerepo=epel parquet-devel # For Apache Parquet C++
  sudo yum install -y --enablerepo=epel parquet-glib-devel # For Apache Parquet GLib (C)

yaoye-ali commented 2 years ago

I'm running into the same problem.

johanpel commented 2 years ago

I'm afraid that I can't do much more than this without being able to reproduce the issue myself.

If I could reproduce this, I would walk down the backtrace in a debugger and verify that all variables involved in loading the file have the right values.

If that's not something you can do, perhaps it would be possible to set up a Docker image mimicking your environment, and see if you can reproduce it there? If that's the case, you can pass me the image and I can take a look.