Conda package for omnisci and omnisci-tools

dharhas commented 5 years ago

Finish the conda packaging workflow. The goal is two conda packages: 'conda install omnisci' to build/install the open-source core, and 'conda install omnisci-tools' to install the ibis, jupyterlab-omnisci, and altair tools. With these, a user should be able to run OmniSci locally and analyze data using JupyterLab + OmniSci + ibis + Altair.

pearu commented 5 years ago

Status of the omnisci-core (formerly mapd-core) conda package on OSX:

[x] Conda package build
[x] Find a solution for three failing omnisci sanity tests. These tests fail with a segfault. Debugging them requires gdb; see the next item.
[x] Waiting for the OmniSci team to apply https://sourceware.org/gdb/wiki/PermissionsDarwin to their Mac Mini. [Hit a bug in gdb on OSX; cannot use gdb.]
[x] Try using lldb for debugging.

Status of omnisci-tools:

[x] Add vdom to conda-forge
- conda recipes cannot depend on packages that are only available on PyPI, so we need to create a conda recipe for vdom on conda-forge
- https://github.com/conda-forge/staged-recipes/pull/8622/files
[x] Add omnisci-pytools to conda-forge: this is ready to submit but depends on vdom being on conda-forge
- https://github.com/conda-forge/staged-recipes/pull/8623/files

pearu commented 5 years ago

The TensorFlow conda UX is an example of GPU-enabled conda packaging: https://towardsdatascience.com/tensorflow-gpu-installation-made-easy-use-conda-instead-of-pip-52e5249374bc

pearu commented 5 years ago

Notes on building omnisci-core in a conda environment with GPU support enabled:

- use conda install -c conda-forge cxx-compiler c-compiler [Ubuntu's gcc/g++ also work]
- must use llvmdev>=7.0.1 and clangdev>=7, as earlier versions of llvmdev do not have NVPTX target support enabled
- must use the gcc compiler, because llvmdev is built with gcc and there is an ABI incompatibility between llvmdev built with gcc and code compiled with clang
- must use #define ALWAYS_INLINE in Shared/funcannotations.h to avoid a gcc failure [WRONG: the actual cause was that the conda env added -fPIC to CXXFLAGS, which made the compilation of Execute.cpp fail]
- must have the NVIDIA drivers installed from nvidia.com, including libcuda.so and libnvidia-...so; these are located in /usr/lib/x86_64-linux-gnu. The NVIDIA drivers provided by Ubuntu must be disabled. TODO: create a conda package for the NVIDIA drivers [Issues: it contains a kernel module; legal concerns]

Use the following build commands:

# run from a build directory inside the source tree (cmake is pointed at ..)
export CXXFLAGS="$CXXFLAGS -DBOOST_ERROR_CODE_HEADER_ONLY -Wfatal-errors"
# make sure CXXFLAGS does not contain `-fPIC`
export LDFLAGS="-L$PREFIX/lib -Wl,-rpath,$PREFIX/lib -L/usr/lib/x86_64-linux-gnu/ -Wl,-rpath,/usr/lib/x86_64-linux-gnu/ -lrt -pthread -lresolv -v /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.56"
cmake -DCMAKE_INSTALL_PREFIX=$PREFIX -DCMAKE_BUILD_TYPE=debug -DENABLE_AWS_S3=off -DENABLE_FOLLY=off -DENABLE_JAVA_REMOTE_DEBUG=off -DMAPD_IMMERSE_DOWNLOAD=off -DMAPD_DOCS_DOWNLOAD=off -DPREFER_STATIC_LIBS=off -DENABLE_CUDA=on -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DENABLE_PROFILER=off -DMAPD_EDITION=EE ..
# cgo settings: clang as the C compiler, CGO_* flags cleared
CGO_ENABLED=1 CC=clang CGO_LDFLAGS= CGO_CFLAGS= CGO_CPPFLAGS= make -j 20
# initialize the tmp data directory used by the sanity tests
mkdir tmp
bin/initdb tmp
make sanity_tests

Current status:

[x] Build
[x] Sanity tests

[x] Fix load-time UDF failure:

$ bin/mapd_server  --udf=../../sample_udf.cpp 
error: cannot find libdevice for sm_30. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.

[SOLUTION: install cuda 9.2 to /usr/local/cuda-9.2]

pearu commented 5 years ago

Mac OSX conda build status update:

The build is successful.

Three tests are failing:

82% tests passed, 3 tests failed out of 17

Label Time Summary:
sanity = 148.59 sec*proc (17 tests)

Total Test time (real) = 148.65 sec

The following tests FAILED:
   2 - UpdelStorageTest (Failed)
   6 - ExecuteTest (Failed)
  15 - TopKTest (Failed)

Debugging with lldb:

sudo lldb Tests/ExecuteTest
(lldb) run
......
[----------] 115 tests from Select
[ RUN      ] Select.NullGroupBy
Process 11588 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000000000000
error: memory read failed for 0x0
Target 0: (ExecuteTest) stopped.
(lldb) bt
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000000000000
(lldb) bt all
  thread #1, queue = 'com.apple.main-thread'
    frame #0: 0x00007fff7214786a libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff7220056e libsystem_pthread.dylib`_pthread_cond_wait + 722
    frame #2: 0x0000000105fbfa32 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
    frame #3: 0x0000000105fc2fdb libc++.1.dylib`std::__1::__assoc_sub_state::wait() + 75
    frame #4: 0x000000010043e768 ExecuteTest`Executor::dispatchFragments(std::__1::function<void (ExecutorDeviceType, int, QueryCompilationDescriptor const&, QueryMemoryDescriptor const&, std::__1::vector<FragmentsPerTable, std::__1::allocator<FragmentsPerTable> > const&, unsigned long, long long)>, Executor::ExecutionDispatch const&, std::__1::vector<InputTableInfo, std::__1::allocator<InputTableInfo> > const&, ExecutionOptions const&, bool, bool, unsigned long, QueryCompilationDescriptor const&, QueryMemoryDescriptor const&, QueryFragmentDescriptor&, std::__1::unordered_set<int, std::__1::hash<int>, std::__1::equal_to<int>, std::__1::allocator<int> >&, int&) + 4472
    frame #5: 0x000000010043babc ExecuteTest`Executor::executeWorkUnitImpl(int*, unsigned long&, bool, bool, std::__1::vector<InputTableInfo, std::__1::allocator<InputTableInfo> > const&, RelAlgExecutionUnit const&, CompilationOptions const&, ExecutionOptions const&, Catalog_Namespace::Catalog const&, std::__1::shared_ptr<RowSetMemoryOwner>, RenderInfo*, bool) + 2364
    frame #6: 0x000000010043a794 ExecuteTest`Executor::executeWorkUnit(int*, unsigned long&, bool, std::__1::vector<InputTableInfo, std::__1::allocator<InputTableInfo> > const&, RelAlgExecutionUnit const&, CompilationOptions const&, ExecutionOptions const&, Catalog_Namespace::Catalog const&, std::__1::shared_ptr<RowSetMemoryOwner>, RenderInfo*, bool) + 180
    frame #7: 0x0000000100586bd0 ExecuteTest`RelAlgExecutor::executeWorkUnit(RelAlgExecutor::WorkUnit const&, std::__1::vector<TargetMetaInfo, std::__1::allocator<TargetMetaInfo> > const&, bool, CompilationOptions const&, ExecutionOptions const&, RenderInfo*, long long, long) + 2400
    frame #8: 0x000000010057b849 ExecuteTest`RelAlgExecutor::executeCompound(RelCompound const*, CompilationOptions const&, ExecutionOptions const&, RenderInfo*, long long) + 281
    frame #9: 0x000000010057a227 ExecuteTest`RelAlgExecutor::executeRelAlgStep(unsigned long, std::__1::vector<RaExecutionDesc, std::__1::allocator<RaExecutionDesc> >&, CompilationOptions const&, ExecutionOptions const&, RenderInfo*, long long) + 823
    frame #10: 0x0000000100577ca2 ExecuteTest`RelAlgExecutor::executeRelAlgSeq(std::__1::vector<RaExecutionDesc, std::__1::allocator<RaExecutionDesc> >&, CompilationOptions const&, ExecutionOptions const&, RenderInfo*, long long) + 1106
    frame #11: 0x0000000100575e7a ExecuteTest`RelAlgExecutor::executeRelAlgQueryNoRetry(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, CompilationOptions const&, ExecutionOptions const&, RenderInfo*) + 3274
    frame #12: 0x0000000100575047 ExecuteTest`RelAlgExecutor::executeRelAlgQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, CompilationOptions const&, ExecutionOptions const&, RenderInfo*) + 215
    frame #13: 0x00000001002651d5 ExecuteTest`QueryRunner::run_select_query(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::unique_ptr<Catalog_Namespace::SessionInfo, std::__1::default_delete<Catalog_Namespace::SessionInfo> > const&, ExecutorDeviceType, bool, bool, bool) + 1141
    frame #14: 0x0000000100266a02 ExecuteTest`QueryRunner::run_multiple_agg(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::unique_ptr<Catalog_Namespace::SessionInfo, std::__1::default_delete<Catalog_Namespace::SessionInfo> > const&, ExecutorDeviceType, bool, bool, std::__1::unique_ptr<QueryRunner::IROutputFile, std::__1::default_delete<QueryRunner::IROutputFile> > const&) + 3090
    frame #15: 0x000000010000ed53 ExecuteTest`(anonymous namespace)::run_simple_agg(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, ExecutorDeviceType, bool, bool) + 67
    frame #16: 0x0000000100011304 ExecuteTest`Select_NullGroupBy_Test::TestBody() + 1540
    frame #17: 0x000000010023b208 ExecuteTest`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 72
    frame #18: 0x000000010023b0ce ExecuteTest`testing::Test::Run() + 334
    frame #19: 0x000000010023c480 ExecuteTest`testing::TestInfo::Run() + 304
    frame #20: 0x000000010023cde7 ExecuteTest`testing::TestCase::Run() + 311
    frame #21: 0x00000001002460e7 ExecuteTest`testing::internal::UnitTestImpl::RunAllTests() + 1207
    frame #22: 0x0000000100245a88 ExecuteTest`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 72
    frame #23: 0x00000001002459fc ExecuteTest`testing::UnitTest::Run() + 172
    frame #24: 0x00000001001df249 ExecuteTest`main + 4025
    frame #25: 0x00007fff7200f3d5 libdyld.dylib`start + 1
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000000000000

xmnlab commented 5 years ago

@pearu, not sure if it helps, but maybe setting cpu-buffer-mem-bytes = 1000000000 in the config could help here: https://github.com/ibis-project/ibis/blob/master/ci/omnisci.conf#L3

@andrewseidl, do you think it could help, or do you have any other thoughts?

pearu commented 5 years ago

@xmnlab, note that the error occurs on the Mac Mini box.

andrewseidl commented 5 years ago

Is this failing with CUDA enabled or disabled?

@xmnlab, it shouldn't be necessary to set cpu-buffer-mem-bytes; by default the server is allowed to use up to 80% of system memory.

@pearu, could you try running with the GLOG_logtostderr=1 environment variable set? That should give a bit more debug info.

pearu commented 5 years ago

@andrewseidl, this is a CUDA-disabled build. With GLOG_logtostderr=1:

[ RUN      ] Select.NullGroupBy
I0528 10:16:02.426483 400070080 Calcite.cpp:300] Time to updateMetadata 0 (ms)
...
I0528 10:16:02.427465 400070080 Catalog.cpp:1234] Instantiating Fragmenter for table table_null_group_by took 0ms
I0528 10:16:02.432482 400070080 Catalog.cpp:1303] Time to load Dictionary 1_96 was 4ms
I0528 10:16:02.962232 400070080 Calcite.cpp:432] User mapd catalog mapd sql 'SELECT val FROM table_null_group_by GROUP BY val;'
I0528 10:16:02.970645 400070080 Calcite.cpp:452] Time in Thrift 1 (ms), Time in Java Calcite server 7 (ms)
...
I0528 10:16:02.982969 400070080 Calcite.cpp:300] Time to updateMetadata 1 (ms)
...
I0528 10:16:02.987901 400070080 Calcite.cpp:300] Time to updateMetadata 0 (ms)
...
I0528 10:16:02.988775 400070080 Catalog.cpp:1234] Instantiating Fragmenter for table table_null_group_by took 0ms
I0528 10:16:03.517801 400070080 Calcite.cpp:432] User mapd catalog mapd sql 'SELECT val FROM table_null_group_by GROUP BY val;'
I0528 10:16:03.524819 400070080 Calcite.cpp:452] Time in Thrift 1 (ms), Time in Java Calcite server 5 (ms)
2019-05-28 10:16:03 ERROR TThreadPoolServer:run:315 - Thrift error occurred during processing of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:210)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 9 more
E0528 10:16:03.556411 30892032 QueryRunner.cpp:113] Interrupt signal (11) received.
I0528 10:16:03.556426 30892032 Calcite.cpp:512] Shutting down Calcite server
I0528 10:16:03.560261 30892032 Calcite.cpp:521] shut down Calcite
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0528 10:16:03.567929 30892032 Calcite.cpp:527] Destroy Calcite Class
I0528 10:16:03.567950 30892032 Calcite.cpp:529] End of Calcite Destructor 
<end of output>

where each ... stands for the typical "ERROR TThreadPoolServer:run:315 - Thrift error occurred during processing of message." output.

pearu commented 5 years ago

The crash happens when calling

child.wait();

in Executor::dispatchFragments. Before calling wait, child.valid() returns true.
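Note that child.wait() itself only blocks until the worker task finishes. That is why lldb shows thread #1 parked in libc++'s std::__1::__assoc_sub_state::wait() (the shared state behind a std::future) while the faulting frame (address 0x0) belongs to thread #2. A minimal sketch of this launch-then-wait pattern, assuming std::async and std::future; run_fragment and dispatch_all are hypothetical names, not OmniSci code:

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for the per-fragment work that
// Executor::dispatchFragments hands to worker threads.
int run_fragment(int fragment_id) { return fragment_id * 2; }

// Launch one asynchronous task per fragment, then wait for each of them.
std::vector<int> dispatch_all(int n_fragments) {
  std::vector<std::future<int>> children;
  for (int i = 0; i < n_fragments; ++i) {
    children.push_back(std::async(std::launch::async, run_fragment, i));
  }
  std::vector<int> results;
  for (auto& child : children) {
    assert(child.valid());           // valid until get() is called
    child.wait();                    // blocks the caller until the worker finishes
    results.push_back(child.get());  // get() releases the shared state
  }
  return results;
}
```

If a worker faults, the waiting thread is merely the one that happens to be blocked, so the backtrace of the faulting thread is the one to inspect.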

pearu commented 5 years ago

@andrewseidl, I have tracked the crash down to QueryExecutionContext::launchCpuCode, where literal_buff[0] is evaluated while literal_buff.size() is 0.

pearu commented 5 years ago

Interestingly, on Linux (using gcc), literal_buff[0] does not cause a crash even when literal_buff.size() == 0.

pearu commented 5 years ago

@andrewseidl, I think we have hit the undefined behavior described in https://stackoverflow.com/questions/3829788/using-operator-on-empty-stdvector, whose outcome depends on the compiler and standard library.

pearu commented 5 years ago

Defining

#define GETVECTORPTR(A) (((A).size() > 0) ? &(A)[0] : nullptr)

and wrapping all &A[0] usages as GETVECTORPTR(A) fixes some of the test crashes on Mac OSX.
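The same guard can be written as a function template, which avoids macro pitfalls such as evaluating A twice. Calling std::vector::data() on an empty vector is allowed, but the pointer it then returns is unspecified (it may or may not be null), so the explicit empty() check is what guarantees nullptr. A minimal sketch; vector_ptr_or_null and sum_buffer are hypothetical names, and the int8_t element type is an assumption about literal_buff:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type-safe replacement for GETVECTORPTR: returns a pointer to
// the first element, or nullptr when the vector is empty. Evaluating v[0] on
// an empty vector is undefined behavior, so it may "work" with one toolchain
// and crash with another.
template <typename T>
T* vector_ptr_or_null(std::vector<T>& v) {
  return v.empty() ? nullptr : v.data();
}

template <typename T>
const T* vector_ptr_or_null(const std::vector<T>& v) {
  return v.empty() ? nullptr : v.data();
}

// Hypothetical consumer that, like a launched CPU kernel, receives a raw
// buffer plus its length and only reads the buffer when n > 0.
std::int64_t sum_buffer(const std::int8_t* buf, std::size_t n) {
  std::int64_t total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += buf[i];
  }
  return total;
}
```

The important property is that callees receive the length alongside the pointer and never dereference a null pointer for an empty buffer.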

pearu commented 5 years ago

The Mac OSX test failures in the omniscidb-cpu conda package have a solution: https://github.com/conda-forge/omniscidb-cpu-feedstock/pull/7

pearu commented 5 years ago

On the following build failure:

/home/pearu/miniconda3/envs/omnisci-gpu-dev/include/llvm/Object/SymbolicFile.h:48:31: error: expected ')' before 'PRIxPTR'

use

export CXXFLAGS="$CXXFLAGS -D__STDC_FORMAT_MACROS=1"

as a fix.
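The error means PRIxPTR was not defined when SymbolicFile.h was parsed. Some C library headers (presumably older glibc versions, as in this environment) only expose the PRI* format macros to C++ code when __STDC_FORMAT_MACROS is defined before <inttypes.h> is first included. Passing it via CXXFLAGS guarantees that ordering; the in-source equivalent must precede the first include. A minimal sketch; format_uintptr_hex is a hypothetical helper:

```cpp
// In source, the macro must come before the first <cinttypes> include.
#ifndef __STDC_FORMAT_MACROS
#define __STDC_FORMAT_MACROS 1
#endif
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Format an address-sized integer as lowercase hex using PRIxPTR, the macro
// that the failing line of llvm/Object/SymbolicFile.h depends on.
std::string format_uintptr_hex(std::uintptr_t value) {
  char buf[2 * sizeof(std::uintptr_t) + 1];  // hex digits plus terminator
  std::snprintf(buf, sizeof buf, "%" PRIxPTR, value);
  return std::string(buf);
}
```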

gnestor commented 5 years ago

omnisci-pytools is available on conda-forge: https://github.com/conda-forge/omnisci-pytools-feedstock

'conda install omnisci-pytools' installs the jupyterlab-omnisci and jupyter-widgets JupyterLab extensions along with all dependencies.

pearu commented 5 years ago

Moved the remaining task to a separate issue (#63).

Quansight/omnisci, issue #25