Closed: dharhas closed this issue 5 years ago
Status of omnisci-core (formerly mapd-core) conda package on OSX:
lldb for debugging.
Status of omnisci-tools:
tensorflow conda UX as an example of GPU enabled conda packaging: https://towardsdatascience.com/tensorflow-gpu-installation-made-easy-use-conda-instead-of-pip-52e5249374bc
Notes for omnisci-core build in conda environment with GPU enabled:
- conda install -c conda-forge cxx-compiler c-compiler [Ubuntu gcc/g++ also work]
- #define ALWAYS_INLINE in Shared/funcannotations.h to avoid gcc failure [WRONG: the actual cause was that the conda env added -fPIC to CXXFLAGS, which resulted in an Execute.cpp compilation failure]
- libcuda.so and libnvidia-...so are located in /usr/lib/x86_64-linux-gnu. NVIDIA drivers provided by Ubuntu must be disabled. TODO: create a conda package for NVIDIA drivers [Issues: contains kernel module, legalities]
export CXXFLAGS="$CXXFLAGS -DBOOST_ERROR_CODE_HEADER_ONLY -Wfatal-errors"
# make sure CXXFLAGS does not contain `-fPIC`
export LDFLAGS="-L$PREFIX/lib -Wl,-rpath,$PREFIX/lib -L/usr/lib/x86_64-linux-gnu/ -Wl,-rpath,/usr/lib/x86_64-linux-gnu/ -lrt -pthread -lresolv -v /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.56"
cmake -DCMAKE_INSTALL_PREFIX=$PREFIX -DCMAKE_BUILD_TYPE=debug -DENABLE_AWS_S3=off -DENABLE_FOLLY=off -DENABLE_JAVA_REMOTE_DEBUG=off -DMAPD_IMMERSE_DOWNLOAD=off -DMAPD_DOCS_DOWNLOAD=off -DPREFER_STATIC_LIBS=off -DENABLE_CUDA=on -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DENABLE_PROFILER=off -DMAPD_EDITION=EE ..
CGO_ENABLED=1 CC=clang CGO_LDFLAGS= CGO_CFLAGS= CGO_CPPFLAGS= make -j 20
mkdir tmp
bin/initdb
make sanity_tests
Current status:
$ bin/mapd_server --udf=../../sample_udf.cpp
error: cannot find libdevice for sm_30. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
[SOLUTION: install cuda 9.2 to /usr/local/cuda-9.2]
mac osx conda build status update:
82% tests passed, 3 tests failed out of 17
Label Time Summary: sanity = 148.59 sec*proc (17 tests)
Total Test time (real) = 148.65 sec
The following tests FAILED:
2 - UpdelStorageTest (Failed)
6 - ExecuteTest (Failed)
15 - TopKTest (Failed)
- debugging with lldb:
sudo lldb Tests/ExecuteTest
(lldb) run
......
[----------] 115 tests from Select
[ RUN ] Select.NullGroupBy
Process 11588 stopped
...
__psynch_cvwait + 10
frame #1: 0x00007fff7220056e libsystem_pthread.dylib`_pthread_cond_wait + 722
frame #2: 0x0000000105fbfa32 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
frame #3: 0x0000000105fc2fdb libc++.1.dylib`std::__1::__assoc_sub_state::wait() + 75
frame #4: 0x000000010043e768 ExecuteTest`Executor::dispatchFragments(std::__1::function<void (ExecutorDeviceType, int, QueryCompilationDescriptor const&, QueryMemoryDescriptor const&, std::__1::vector<FragmentsPerTable, std::__1::allocator<FragmentsPerTable> > const&, unsigned long, long long)>, Executor::ExecutionDispatch const&, std::__1::vector<InputTableInfo, std::__1::allocator<InputTableInfo> > const&, ExecutionOptions const&, bool, bool, unsigned long, QueryCompilationDescriptor const&, QueryMemoryDescriptor const&, QueryFragmentDescriptor&, std::__1::unordered_set<int, std::__1::hash<int>, std::__1::equal_to<int>, std::__1::allocator<int> >&, int&) + 4472
frame #5: 0x000000010043babc ExecuteTest`Executor::executeWorkUnitImpl(int, unsigned long&, bool, bool, std::__1::vector<InputTableInfo, std::__1::allocator...
frame #6: ... Executor::executeWorkUnit(int*, unsigned long&, bool, std::__1::vector<InputTableInfo, std::__1::allocator<InputTableInfo> > const&, RelAlgExecutionUnit const&, CompilationOptions const&, ExecutionOptions const&, Catalog_Namespace::Catalog const&, std::__1::shared_ptr<RowSetMemoryOwner>, RenderInfo*, bool) + 180
frame #7: 0x0000000100586bd0 ExecuteTest`RelAlgExecutor::executeWorkUnit(RelAlgExecutor::WorkUnit const&, std::__1::vector<TargetMetaInfo, std::__1::allocator...
...
frame #10: ... RelAlgExecutor::executeRelAlgSeq(std::__1::vector<RaExecutionDesc, std::__1::allocator<RaExecutionDesc> >&, CompilationOptions const&, ExecutionOptions const&, RenderInfo*, long long) + 1106
frame #11: 0x0000000100575e7a ExecuteTest`RelAlgExecutor::executeRelAlgQueryNoRetry(std::__1::basic_string<char, std::__1::char_traits...
...
frame #13: ... QueryRunner::run_select_query(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::unique_ptr<Catalog_Namespace::SessionInfo, std::__1::default_delete<Catalog_Namespace::SessionInfo> > const&, ExecutorDeviceType, bool, bool, bool) + 1141
frame #14: 0x0000000100266a02 ExecuteTest`QueryRunner::run_multiple_agg(std::__1::basic_string<char, std::__1::char_traits...
...
frame #16: ... Select_NullGroupBy_Test::TestBody() + 1540
frame #17: 0x000000010023b208 ExecuteTest`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 72
frame #18: 0x000000010023b0ce ExecuteTest`testing::Test::Run() + 334
frame #19: 0x000000010023c480 ExecuteTest`testing::TestInfo::Run() + 304
frame #20: 0x000000010023cde7 ExecuteTest`testing::TestCase::Run() + 311
frame #21: 0x00000001002460e7 ExecuteTest`testing::internal::UnitTestImpl::RunAllTests() + 1207
frame #22: 0x0000000100245a88 ExecuteTest`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 72
frame #23: 0x00000001002459fc ExecuteTest`testing::UnitTest::Run() + 172
frame #24: 0x00000001001df249 ExecuteTest`main + 4025
frame #25: 0x00007fff7200f3d5 libdyld.dylib`start + 1
@pearu not sure if it could help, but maybe the config setting cpu-buffer-mem-bytes = 1000000000 could help here: https://github.com/ibis-project/ibis/blob/master/ci/omnisci.conf#L3
@andrewseidl do you think it could help? or do you have any other thoughts?
@xmnlab note that the error occurs on a Mac mini box.
Is this failing with CUDA enabled or disabled?
@xmnlab shouldn't need to set cpu-buffer-mem-bytes; by default it will be allowed to use up to 80% of the system mem.
@pearu could you try running with the GLOG_logtostderr=1 env var set? Just to get a bit more debug info.
@andrewseidl This is a CUDA-disabled build.
With GLOG_logtostderr=1:
[ RUN ] Select.NullGroupBy
I0528 10:16:02.426483 400070080 Calcite.cpp:300] Time to updateMetadata 0 (ms)
...
I0528 10:16:02.427465 400070080 Catalog.cpp:1234] Instantiating Fragmenter for table table_null_group_by took 0ms
I0528 10:16:02.432482 400070080 Catalog.cpp:1303] Time to load Dictionary 1_96 was 4ms
I0528 10:16:02.962232 400070080 Calcite.cpp:432] User mapd catalog mapd sql 'SELECT val FROM table_null_group_by GROUP BY val;'
I0528 10:16:02.970645 400070080 Calcite.cpp:452] Time in Thrift 1 (ms), Time in Java Calcite server 7 (ms)
...
I0528 10:16:02.982969 400070080 Calcite.cpp:300] Time to updateMetadata 1 (ms)
...
I0528 10:16:02.987901 400070080 Calcite.cpp:300] Time to updateMetadata 0 (ms)
...
I0528 10:16:02.988775 400070080 Catalog.cpp:1234] Instantiating Fragmenter for table table_null_group_by took 0ms
I0528 10:16:03.517801 400070080 Calcite.cpp:432] User mapd catalog mapd sql 'SELECT val FROM table_null_group_by GROUP BY val;'
I0528 10:16:03.524819 400070080 Calcite.cpp:452] Time in Thrift 1 (ms), Time in Java Calcite server 5 (ms)
2019-05-28 10:16:03 ERROR TThreadPoolServer:run:315 - Thrift error occurred during processing of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 9 more
E0528 10:16:03.556411 30892032 QueryRunner.cpp:113] Interrupt signal (11) received.
I0528 10:16:03.556426 30892032 Calcite.cpp:512] Shutting down Calcite server
I0528 10:16:03.560261 30892032 Calcite.cpp:521] shut down Calcite
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0528 10:16:03.567929 30892032 Calcite.cpp:527] Destroy Calcite Class
I0528 10:16:03.567950 30892032 Calcite.cpp:529] End of Calcite Destructor
<end of output>
where each ... stands for repeated messages of the form ERROR TThreadPoolServer:run:315 - Thrift error occurred during processing of message.
The crash happens when calling child.wait(); in Executor::dispatchFragments. Before calling wait, child.valid() returns true.
@andrewseidl, I have tracked the crash down to QueryExecutionContext::launchCpuCode, where literal_buff[0] is evaluated while literal_buff.size() is 0.
Interestingly, on Linux (using gcc) literal_buff[0] does not cause a crash even when literal_buff.size() == 0.
@andrewseidl, I think we have hit a compiler-dependent issue described in https://stackoverflow.com/questions/3829788/using-operator-on-empty-stdvector
Defining
#define GETVECTORPTR(A) (((A).size() > 0) ? &(A)[0] : nullptr)
and wrapping all &A[0] usages with GETVECTORPTR fixes some of the test crashes on Mac OSX.
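For illustration only, the same guard can be written as a small function template instead of a macro. This is a sketch, not the code that went into omniscidb: the name vector_ptr_or_null is hypothetical, and the int8_t element type of the stand-in buffer is an assumption. It relies on std::vector::data(), which may be called on an empty vector, and normalizes the result to nullptr so that v[0] is never evaluated.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of omniscidb): pointer to the first element,
// or nullptr for an empty vector. Unlike &v[0], it never indexes element 0.
template <typename T>
T* vector_ptr_or_null(std::vector<T>& v) {
  return v.empty() ? nullptr : v.data();
}

// Const overload for read-only buffers.
template <typename T>
const T* vector_ptr_or_null(const std::vector<T>& v) {
  return v.empty() ? nullptr : v.data();
}

// Mirrors the reported situation: an empty buffer must yield nullptr.
// The element type is assumed here; the real literal_buff may differ.
inline void check_empty_literal_buffer() {
  std::vector<std::int8_t> literal_buff;
  assert(vector_ptr_or_null(literal_buff) == nullptr);
}
```

A call site that previously passed &literal_buff[0] would pass vector_ptr_or_null(literal_buff) instead. Compared with the macro, the function evaluates its argument once and is type-checked.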
The issue of conda packaging omniscidb-cpu for Mac OSX with test failures has a solution: https://github.com/conda-forge/omniscidb-cpu-feedstock/pull/7
On the following build failure:
/home/pearu/miniconda3/envs/omnisci-gpu-dev/include/llvm/Object/SymbolicFile.h:48:31: error: expected ')' before 'PRIxPTR'
use
export CXXFLAGS="$CXXFLAGS -D__STDC_FORMAT_MACROS=1"
as a fix.
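As background on why the flag helps: with some older C library headers, the PRI* format macros from inttypes.h are only exposed to C++ code when __STDC_FORMAT_MACROS is defined before the header is first included, so PRIxPTR is otherwise an unknown token. A minimal sketch of the same mechanism in a single file follows; format_address is a hypothetical helper, not something from LLVM or omniscidb.

```cpp
// Must be defined before <cinttypes>/<inttypes.h> is first included.
// Recent toolchains define the PRI* macros regardless; older ones need this.
#ifndef __STDC_FORMAT_MACROS
#define __STDC_FORMAT_MACROS 1
#endif
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats an address as lowercase hex via PRIxPTR.
std::string format_address(std::uintptr_t addr) {
  // "0x" + two hex digits per byte + terminating NUL.
  char buf[2 + 2 * sizeof(std::uintptr_t) + 1];
  const int n = std::snprintf(buf, sizeof(buf), "0x%" PRIxPTR, addr);
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
  return std::string(buf);
}
```

Passing -D__STDC_FORMAT_MACROS=1 through CXXFLAGS has the same effect for every translation unit, which is why it works for third-party headers such as the LLVM one in the error message.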
omnisci-pytools is available on conda-forge: https://github.com/conda-forge/omnisci-pytools-feedstock
conda install omnisci-pytools
will install jupyterlab-omnisci and jupyter-widgets JupyterLab extensions and install all dependencies.
Moved the remaining task to a separate issue (#63).
Finish conda packaging workflow. The goal is to get to 2 conda packages that allow for the following: 'conda install omnisci' to build/install the open source core, and 'conda install omnisci-tools' to install the ibis, jupyterlab-omnisci, altair tools. With this, a user should be able to run omnisci locally and analyze data using jupyterlab + omnisci + ibis + altair.