apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++] Error when building Arrow-CPP w/ Azure Filesystem for Conda-Forge #41990

Open srilman opened 3 months ago

srilman commented 3 months ago

Describe the bug, including details regarding any error messages, version, and platform.

In this PR (https://github.com/conda-forge/arrow-cpp-feedstock/pull/1431), I attempted to modify the Conda recipe to add the dependencies needed to package the Azure filesystem in the arrow-cpp packages on conda-forge. The package built successfully for Linux and macOS, but compilation failed with MSVC:

FAILED: src/arrow/CMakeFiles/arrow_filesystem_shared.dir/filesystem/filesystem.cc.obj 
C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1429~1.301\bin\HostX64\x64\cl.exe  /nologo /TP -DABSL_CONSUME_DLL -DARROW_EXPORTING -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_WITH_TIMING_TESTS -DAWS_AUTH_USE_IMPORT_EXPORT -DAWS_CAL_USE_IMPORT_EXPORT -DAWS_CHECKSUMS_USE_IMPORT_EXPORT -DAWS_COMMON_USE_IMPORT_EXPORT -DAWS_COMPRESSION_USE_IMPORT_EXPORT -DAWS_CRT_CPP_USE_IMPORT_EXPORT -DAWS_EVENT_STREAM_USE_IMPORT_EXPORT -DAWS_HTTP_USE_IMPORT_EXPORT -DAWS_IO_USE_IMPORT_EXPORT -DAWS_MQTT_USE_IMPORT_EXPORT -DAWS_S3_USE_IMPORT_EXPORT -DAWS_SDKUTILS_USE_IMPORT_EXPORT -DAWS_SDK_VERSION_MAJOR=1 -DAWS_SDK_VERSION_MINOR=11 -DAWS_SDK_VERSION_PATCH=329 -DAWS_USE_IO_COMPLETION_PORTS -DAZ_CORE_DLL -DAZ_IDENTITY_DLL -DAZ_RTTI -DAZ_STORAGE_BLOBS_DLL -DAZ_STORAGE_COMMON_DLL -DAZ_STORAGE_FILES_DATALAKE_DLL -DPROTOBUF_USE_DLLS -DUSE_IMPORT_EXPORT -DUSE_IMPORT_EXPORT=1 -DUSE_WINDOWS_DLL_SEMANTICS -D_CRT_SECURE_NO_WARNINGS -D_ENABLE_EXTENDED_ALIGNED_STORAGE -I%SRC_DIR%\cpp\build\src -I%SRC_DIR%\cpp\src -I%SRC_DIR%\cpp\src\generated -external:I%PREFIX%\Library\include -external:I%SRC_DIR%\cpp\thirdparty\hadoop\include -external:W0 /DWIN32 /D_WINDOWS /GR /EHsc /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /EHsc /wd5105 /bigobj /utf-8 /W3 /wd4800 /wd4996 /wd4065  /O2 /Ob2 /DNDEBUG -std:c++17 -MD /showIncludes /Fosrc\arrow\CMakeFiles\arrow_filesystem_shared.dir\filesystem\filesystem.cc.obj /Fdsrc\arrow\CMakeFiles\arrow_filesystem_shared.dir\ /FS -c %SRC_DIR%\cpp\src\arrow\filesystem\filesystem.cc
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3089): error C2027: use of undefined type 'arrow::fs::AzureFileSystem::Impl'
%SRC_DIR%\cpp\src\arrow/filesystem/azurefs.h(232): note: see declaration of 'arrow::fs::AzureFileSystem::Impl'
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3088): note: while compiling class template member function 'void std::default_delete<arrow::fs::AzureFileSystem::Impl>::operator ()(_Ty *) noexcept const'
        with
        [
            _Ty=arrow::fs::AzureFileSystem::Impl
        ]
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3198): note: see reference to function template instantiation 'void std::default_delete<arrow::fs::AzureFileSystem::Impl>::operator ()(_Ty *) noexcept const' being compiled
        with
        [
            _Ty=arrow::fs::AzureFileSystem::Impl
        ]
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3125): note: see reference to class template instantiation 'std::default_delete<arrow::fs::AzureFileSystem::Impl>' being compiled
%SRC_DIR%\cpp\src\arrow/filesystem/azurefs.h(233): note: see reference to class template instantiation 'std::unique_ptr<arrow::fs::AzureFileSystem::Impl,std::default_delete<arrow::fs::AzureFileSystem::Impl>>' being compiled
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3089): error C2338: can't delete an incomplete type
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3090): warning C4150: deletion of pointer to incomplete type 'arrow::fs::AzureFileSystem::Impl'; no destructor called
%SRC_DIR%\cpp\src\arrow/filesystem/azurefs.h(232): note: see declaration of 'arrow::fs::AzureFileSystem::Impl'

I'm not familiar with Windows or MSVC, so I need some help debugging.

Component(s)

C++

kou commented 2 months ago

Ah, using std::unique_ptr<Impl>&& in azurefs.h may be a problem.

Could you try this?

diff --git a/cpp/src/arrow/filesystem/azurefs.cc b/cpp/src/arrow/filesystem/azurefs.cc
index bb8309a247..b4975a3098 100644
--- a/cpp/src/arrow/filesystem/azurefs.cc
+++ b/cpp/src/arrow/filesystem/azurefs.cc
@@ -1398,19 +1398,14 @@ class AzureFileSystem::Impl {
   std::unique_ptr<Blobs::BlobServiceClient> blob_service_client_;
   HNSSupport cached_hns_support_ = HNSSupport::kUnknown;

+ public:
   Impl(AzureOptions options, io::IOContext io_context)
       : io_context_(std::move(io_context)), options_(std::move(options)) {}

- public:
-  static Result<std::unique_ptr<AzureFileSystem::Impl>> Make(AzureOptions options,
-                                                             io::IOContext io_context) {
-    auto self = std::unique_ptr<AzureFileSystem::Impl>(
-        new AzureFileSystem::Impl(std::move(options), std::move(io_context)));
-    ARROW_ASSIGN_OR_RAISE(self->blob_service_client_,
-                          self->options_.MakeBlobServiceClient());
-    ARROW_ASSIGN_OR_RAISE(self->datalake_service_client_,
-                          self->options_.MakeDataLakeServiceClient());
-    return self;
+  Status Init() {
+    ARROW_ASSIGN_OR_RAISE(blob_service_client_, options_.MakeBlobServiceClient());
+    ARROW_ASSIGN_OR_RAISE(datalake_service_client_, options_.MakeDataLakeServiceClient());
+    return Status::OK();
   }

   io::IOContext& io_context() { return io_context_; }
@@ -2893,19 +2888,23 @@ class AzureFileSystem::Impl {
 std::atomic<LeaseGuard::SteadyClock::time_point> LeaseGuard::latest_known_expiry_time_ =
     SteadyClock::time_point{SteadyClock::duration::zero()};

-AzureFileSystem::AzureFileSystem(std::unique_ptr<Impl>&& impl)
-    : FileSystem(impl->io_context()), impl_(std::move(impl)) {
+AzureFileSystem::AzureFileSystem(const AzureOptions& options,
+                                 const io::IOContext& io_context)
+    : FileSystem(io_context), impl_(std::make_unique<Impl>(options, io_context)) {
   default_async_is_sync_ = false;
 }

+Status AzureFileSystem::Init() { return impl_->Init(); }
+
 void AzureFileSystem::ForceCachedHierarchicalNamespaceSupport(int hns_support) {
   impl_->ForceCachedHierarchicalNamespaceSupport(hns_support);
 }

 Result<std::shared_ptr<AzureFileSystem>> AzureFileSystem::Make(
     const AzureOptions& options, const io::IOContext& io_context) {
-  ARROW_ASSIGN_OR_RAISE(auto impl, AzureFileSystem::Impl::Make(options, io_context));
-  return std::shared_ptr<AzureFileSystem>(new AzureFileSystem(std::move(impl)));
+  std::shared_ptr<AzureFileSystem> filesystem(new AzureFileSystem(options, io_context));
+  ARROW_RETURN_NOT_OK(filesystem->Init());
+  return filesystem;
 }

 const AzureOptions& AzureFileSystem::options() const { return impl_->options(); }
diff --git a/cpp/src/arrow/filesystem/azurefs.h b/cpp/src/arrow/filesystem/azurefs.h
index 350014954f..2f6ecb53a1 100644
--- a/cpp/src/arrow/filesystem/azurefs.h
+++ b/cpp/src/arrow/filesystem/azurefs.h
@@ -232,7 +232,8 @@ class ARROW_EXPORT AzureFileSystem : public FileSystem {
   class Impl;
   std::unique_ptr<Impl> impl_;

-  explicit AzureFileSystem(std::unique_ptr<Impl>&& impl);
+  explicit AzureFileSystem(const AzureOptions& options, const io::IOContext& io_context);
+  Status Init();

   friend class TestAzureFileSystem;
   void ForceCachedHierarchicalNamespaceSupport(int hns_support);
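
For background on the error itself: C2338 "can't delete an incomplete type" is raised by std::default_delete when the code that destroys a std::unique_ptr<T> is compiled at a point where T is only forward-declared. With the pimpl idiom, the usual way to avoid this is to declare the owning class's destructor (and any constructors or move operations) in the header and define them in the source file, after the Impl class is complete. The following is a minimal generic sketch of that idiom. It is not Arrow's code and not a confirmed fix for this issue; Widget and the widget.h / widget.cc split are hypothetical names used only for illustration.

```cpp
#include <memory>
#include <utility>

// ---- would live in a header (hypothetical widget.h) ----
class Widget {
 public:
  explicit Widget(int value);
  // Declared only. Defining these inline (or "= default" here) would
  // instantiate std::default_delete<Impl> while Impl is still incomplete.
  ~Widget();
  Widget(Widget&&) noexcept;
  Widget& operator=(Widget&&) noexcept;

  int value() const;

 private:
  class Impl;                   // forward declaration: incomplete type here
  std::unique_ptr<Impl> impl_;  // allowed for an incomplete type
};

// ---- would live in the source file (hypothetical widget.cc) ----
class Widget::Impl {
 public:
  explicit Impl(int value) : value_(value) {}
  int value() const { return value_; }

 private:
  int value_;
};

// Impl is complete from here on, so the deleter can be instantiated safely.
Widget::Widget(int value) : impl_(std::make_unique<Impl>(value)) {}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;

int Widget::value() const { return impl_->value(); }
```

Whether this applies here depends on where the special member functions of AzureFileSystem end up being instantiated, which the logs above do not settle on their own. They do show that the failing translation units (filesystem.cc and the unity source) only see the forward declaration in azurefs.h.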

h-vetinari commented 2 months ago

Thanks for the patch! It leads to:

[51/182] Building CXX object src\arrow\CMakeFiles\arrow_compute_shared.dir\compute\kernels\aggregate_basic_avx512.cc.obj
[52/182] Building CXX object src\arrow\CMakeFiles\arrow_filesystem_shared.dir\Unity\unity_0_cxx.cxx.obj
FAILED: src/arrow/CMakeFiles/arrow_filesystem_shared.dir/Unity/unity_0_cxx.cxx.obj 
C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1429~1.301\bin\HostX64\x64\cl.exe  /nologo /TP -DABSL_CONSUME_DLL -DARROW_EXPORTING -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_WITH_TIMING_TESTS -DAWS_AUTH_USE_IMPORT_EXPORT -DAWS_CAL_USE_IMPORT_EXPORT -DAWS_CHECKSUMS_USE_IMPORT_EXPORT -DAWS_COMMON_USE_IMPORT_EXPORT -DAWS_COMPRESSION_USE_IMPORT_EXPORT -DAWS_CRT_CPP_USE_IMPORT_EXPORT -DAWS_EVENT_STREAM_USE_IMPORT_EXPORT -DAWS_HTTP_USE_IMPORT_EXPORT -DAWS_IO_USE_IMPORT_EXPORT -DAWS_MQTT_USE_IMPORT_EXPORT -DAWS_S3_USE_IMPORT_EXPORT -DAWS_SDKUTILS_USE_IMPORT_EXPORT -DAWS_SDK_VERSION_MAJOR=1 -DAWS_SDK_VERSION_MINOR=11 -DAWS_SDK_VERSION_PATCH=329 -DAWS_USE_IO_COMPLETION_PORTS -DAZ_CORE_DLL -DAZ_IDENTITY_DLL -DAZ_RTTI -DAZ_STORAGE_BLOBS_DLL -DAZ_STORAGE_COMMON_DLL -DAZ_STORAGE_FILES_DATALAKE_DLL -DPROTOBUF_USE_DLLS -DUSE_IMPORT_EXPORT -DUSE_IMPORT_EXPORT=1 -DUSE_WINDOWS_DLL_SEMANTICS -D_CRT_SECURE_NO_WARNINGS -D_ENABLE_EXTENDED_ALIGNED_STORAGE -I%SRC_DIR%\cpp\build\src -I%SRC_DIR%\cpp\src -I%SRC_DIR%\cpp\src\generated -external:I%PREFIX%\Library\include -external:I%SRC_DIR%\cpp\thirdparty\hadoop\include -external:W0 /DWIN32 /D_WINDOWS /GR /EHsc /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /EHsc /wd5105 /bigobj /utf-8 /W3 /wd4800 /wd4996 /wd4065  /O2 /Ob2 /DNDEBUG -std:c++17 -MD /showIncludes /Fosrc\arrow\CMakeFiles\arrow_filesystem_shared.dir\Unity\unity_0_cxx.cxx.obj /Fdsrc\arrow\CMakeFiles\arrow_filesystem_shared.dir\ /FS -c %SRC_DIR%\cpp\build\src\arrow\CMakeFiles\arrow_filesystem_shared.dir\Unity\unity_0_cxx.cxx
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3089): error C2027: use of undefined type 'arrow::fs::AzureFileSystem::Impl'
%SRC_DIR%\cpp\src\arrow/filesystem/azurefs.h(232): note: see declaration of 'arrow::fs::AzureFileSystem::Impl'
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3088): note: while compiling class template member function 'void std::default_delete<arrow::fs::AzureFileSystem::Impl>::operator ()(_Ty *) noexcept const'
        with
        [
            _Ty=arrow::fs::AzureFileSystem::Impl
        ]
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3198): note: see reference to function template instantiation 'void std::default_delete<arrow::fs::AzureFileSystem::Impl>::operator ()(_Ty *) noexcept const' being compiled
        with
        [
            _Ty=arrow::fs::AzureFileSystem::Impl
        ]
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3125): note: see reference to class template instantiation 'std::default_delete<arrow::fs::AzureFileSystem::Impl>' being compiled
%SRC_DIR%\cpp\src\arrow/filesystem/azurefs.h(233): note: see reference to class template instantiation 'std::unique_ptr<arrow::fs::AzureFileSystem::Impl,std::default_delete<arrow::fs::AzureFileSystem::Impl>>' being compiled
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3089): error C2338: can't delete an incomplete type
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3090): warning C4150: deletion of pointer to incomplete type 'arrow::fs::AzureFileSystem::Impl'; no destructor called
%SRC_DIR%\cpp\src\arrow/filesystem/azurefs.h(232): note: see declaration of 'arrow::fs::AzureFileSystem::Impl'
[53/182] Building CXX object src\arrow\CMakeFiles\arrow_compute_shared.dir\compute\kernels\aggregate_basic_avx2.cc.obj
[54/182] Building CXX object src\arrow\CMakeFiles\arrow_filesystem_shared.dir\filesystem\azurefs.cc.obj
%SRC_DIR%\cpp\src\arrow\filesystem\azurefs.cc(1093): warning C4101: 'exception': unreferenced local variable
%SRC_DIR%\cpp\src\arrow\filesystem\azurefs.cc(2188): warning C4101: 'exception': unreferenced local variable

I'll try switching off unity builds next...

h-vetinari commented 2 months ago

Without unity build:

[208/702] Building CXX object src\arrow\CMakeFiles\arrow_filesystem_shared.dir\filesystem\filesystem.cc.obj
FAILED: src/arrow/CMakeFiles/arrow_filesystem_shared.dir/filesystem/filesystem.cc.obj 
C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1429~1.301\bin\HostX64\x64\cl.exe  /nologo /TP -DABSL_CONSUME_DLL -DARROW_EXPORTING -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_WITH_TIMING_TESTS -DAWS_AUTH_USE_IMPORT_EXPORT -DAWS_CAL_USE_IMPORT_EXPORT -DAWS_CHECKSUMS_USE_IMPORT_EXPORT -DAWS_COMMON_USE_IMPORT_EXPORT -DAWS_COMPRESSION_USE_IMPORT_EXPORT -DAWS_CRT_CPP_USE_IMPORT_EXPORT -DAWS_EVENT_STREAM_USE_IMPORT_EXPORT -DAWS_HTTP_USE_IMPORT_EXPORT -DAWS_IO_USE_IMPORT_EXPORT -DAWS_MQTT_USE_IMPORT_EXPORT -DAWS_S3_USE_IMPORT_EXPORT -DAWS_SDKUTILS_USE_IMPORT_EXPORT -DAWS_SDK_VERSION_MAJOR=1 -DAWS_SDK_VERSION_MINOR=11 -DAWS_SDK_VERSION_PATCH=329 -DAWS_USE_IO_COMPLETION_PORTS -DAZ_CORE_DLL -DAZ_IDENTITY_DLL -DAZ_RTTI -DAZ_STORAGE_BLOBS_DLL -DAZ_STORAGE_COMMON_DLL -DAZ_STORAGE_FILES_DATALAKE_DLL -DPROTOBUF_USE_DLLS -DUSE_IMPORT_EXPORT -DUSE_IMPORT_EXPORT=1 -DUSE_WINDOWS_DLL_SEMANTICS -D_CRT_SECURE_NO_WARNINGS -D_ENABLE_EXTENDED_ALIGNED_STORAGE -I%SRC_DIR%\cpp\build\src -I%SRC_DIR%\cpp\src -I%SRC_DIR%\cpp\src\generated -external:I%PREFIX%\Library\include -external:I%SRC_DIR%\cpp\thirdparty\hadoop\include -external:W0 /DWIN32 /D_WINDOWS /GR /EHsc /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING  /EHsc /wd5105 /bigobj /utf-8 /W3 /wd4800 /wd4996 /wd4065  /O2 /Ob2 /DNDEBUG -std:c++17 -MD /showIncludes /Fosrc\arrow\CMakeFiles\arrow_filesystem_shared.dir\filesystem\filesystem.cc.obj /Fdsrc\arrow\CMakeFiles\arrow_filesystem_shared.dir\ /FS -c %SRC_DIR%\cpp\src\arrow\filesystem\filesystem.cc
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3089): error C2027: use of undefined type 'arrow::fs::AzureFileSystem::Impl'
%SRC_DIR%\cpp\src\arrow/filesystem/azurefs.h(232): note: see declaration of 'arrow::fs::AzureFileSystem::Impl'
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3088): note: while compiling class template member function 'void std::default_delete<arrow::fs::AzureFileSystem::Impl>::operator ()(_Ty *) noexcept const'
        with
        [
            _Ty=arrow::fs::AzureFileSystem::Impl
        ]
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3198): note: see reference to function template instantiation 'void std::default_delete<arrow::fs::AzureFileSystem::Impl>::operator ()(_Ty *) noexcept const' being compiled
        with
        [
            _Ty=arrow::fs::AzureFileSystem::Impl
        ]
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3125): note: see reference to class template instantiation 'std::default_delete<arrow::fs::AzureFileSystem::Impl>' being compiled
%SRC_DIR%\cpp\src\arrow/filesystem/azurefs.h(233): note: see reference to class template instantiation 'std::unique_ptr<arrow::fs::AzureFileSystem::Impl,std::default_delete<arrow::fs::AzureFileSystem::Impl>>' being compiled
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3089): error C2338: can't delete an incomplete type
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.29.30133\include\memory(3090): warning C4150: deletion of pointer to incomplete type 'arrow::fs::AzureFileSystem::Impl'; no destructor called
%SRC_DIR%\cpp\src\arrow/filesystem/azurefs.h(232): note: see declaration of 'arrow::fs::AzureFileSystem::Impl'

h-vetinari commented 2 months ago

I've started a new attempt at fixing this on Windows. If you have any other ideas for how to fix it, I'd be very happy to try an updated patch.

kou commented 2 months ago

Thanks. It still failed...

It seems that we need to break this problem down. For example, we could enable Azure support in our Windows CI job to determine whether this is a conda-specific problem.

h-vetinari commented 1 month ago

Tested with v17; the failure persists.