apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0
169 stars 35 forks source link

Run clang-tidy in CI or as an extended check #537

Open paleolimbot opened 3 months ago

paleolimbot commented 3 months ago

The clang-tidy job in ADBC higlighted a few warnings that should be addressed ( https://github.com/apache/arrow-adbc/pull/1928 ). We should run this in our CI to ensure that nanoarrow doesn't cause problems for downstream projects that use clang-tidy (like ADBC!).

clang-tidy --use-color -export-fixes /tmp/tmp_73j2310/tmpw7bhm7r6.yaml -extra-arg=-Wno-unknown-warning-option -p=/home/runner/work/arrow-adbc/arrow-adbc/build/tidy -quiet /home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:186:42: warning: Call to 'malloc' has an allocation size of 0 bytes [clang-analyzer-optin.portability.UnixAPI]
void* ArrowMalloc(int64_t size) { return malloc(size); }
                                         ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:3334:7: note: Calling 'PrivateArrowArrayViewInitFromSchema'
      ArrowArrayViewInitFromSchema(&array_view, &private_data->schema, error));
      ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1105:3: note: expanded from macro 'ArrowArrayViewInitFromSchema'
  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowArrayViewInitFromSchema)
  ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1025:32: note: expanded from macro 'NANOARROW_SYMBOL'
#define NANOARROW_SYMBOL(A, B) NANOARROW_CAT(A, B)
                               ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1024:29: note: expanded from macro 'NANOARROW_CAT'
#define NANOARROW_CAT(A, B) A##B
                            ^
note: expanded from here
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:301:83: note: expanded from macro 'NANOARROW_RETURN_NOT_OK'
  _NANOARROW_RETURN_NOT_OK_IMPL(_NANOARROW_MAKE_NAME(errno_status_, __COUNTER__), EXPR)
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:173:23: note: expanded from macro '_NANOARROW_RETURN_NOT_OK_IMPL'
    const int NAME = (EXPR);                      \
                      ^~~~
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:2545:7: note: 'result' is equal to NANOARROW_OK
  if (result != NANOARROW_OK) {
      ^~~~~~
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:2545:3: note: Taking false branch
  if (result != NANOARROW_OK) {
  ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:2552:12: note: Calling 'PrivateArrowArrayViewAllocateChildren'
  result = ArrowArrayViewAllocateChildren(array_view, schema->n_children);
           ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1107:3: note: expanded from macro 'ArrowArrayViewAllocateChildren'
  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowArrayViewAllocateChildren)
  ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1025:32: note: expanded from macro 'NANOARROW_SYMBOL'
#define NANOARROW_SYMBOL(A, B) NANOARROW_CAT(A, B)
                               ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1024:29: note: expanded from macro 'NANOARROW_CAT'
#define NANOARROW_CAT(A, B) A##B
                            ^
note: expanded from here
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:2497:19: note: Field 'children' is equal to NULL
  if (array_view->children != NULL) {
                  ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:2497:3: note: Taking false branch
  if (array_view->children != NULL) {
  ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:2502:44: note: Passing the value 0 via 1st parameter 'size'
      (struct ArrowArrayView**)ArrowMalloc(n_children * sizeof(struct ArrowArrayView*));
                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:2502:32: note: Calling 'PrivateArrowMalloc'
      (struct ArrowArrayView**)ArrowMalloc(n_children * sizeof(struct ArrowArrayView*));
                               ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1030:21: note: expanded from macro 'ArrowMalloc'
#define ArrowMalloc NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowMalloc)
                    ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1025:32: note: expanded from macro 'NANOARROW_SYMBOL'
#define NANOARROW_SYMBOL(A, B) NANOARROW_CAT(A, B)
                               ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1024:29: note: expanded from macro 'NANOARROW_CAT'
#define NANOARROW_CAT(A, B) A##B
                            ^
note: expanded from here
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:186:42: note: Call to 'malloc' has an allocation size of 0 bytes
void* ArrowMalloc(int64_t size) { return malloc(size); }
                                         ^      ~~~~
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:2763:13: warning: Value stored to 'min_buffer_size_bytes' during its initialization is never read [clang-analyzer-deadcode.DeadStores]
    int64_t min_buffer_size_bytes = array_view->buffer_views[i].size_bytes + 1;
            ^~~~~~~~~~~~~~~~~~~~~   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:2763:13: note: Value stored to 'min_buffer_size_bytes' during its initialization is never read
    int64_t min_buffer_size_bytes = array_view->buffer_views[i].size_bytes + 1;
            ^~~~~~~~~~~~~~~~~~~~~   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:3303:7: warning: Potential leak of memory pointed to by 'private_data' [clang-analyzer-unix.Malloc]
      ArrowBasicArrayStreamRelease(array_stream);
      ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:3287:40: note: Calling 'PrivateArrowMalloc'
      (struct BasicArrayStreamPrivate*)ArrowMalloc(
                                       ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1030:21: note: expanded from macro 'ArrowMalloc'
#define ArrowMalloc NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowMalloc)
                    ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1025:32: note: expanded from macro 'NANOARROW_SYMBOL'
#define NANOARROW_SYMBOL(A, B) NANOARROW_CAT(A, B)
                               ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1024:29: note: expanded from macro 'NANOARROW_CAT'
#define NANOARROW_CAT(A, B) A##B
                            ^
note: expanded from here
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:186:42: note: Memory is allocated
void* ArrowMalloc(int64_t size) { return malloc(size); }
                                         ^~~~~~~~~~~~
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:3287:40: note: Returned allocated memory
      (struct BasicArrayStreamPrivate*)ArrowMalloc(
                                       ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1030:21: note: expanded from macro 'ArrowMalloc'
#define ArrowMalloc NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowMalloc)
                    ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1025:32: note: expanded from macro 'NANOARROW_SYMBOL'
#define NANOARROW_SYMBOL(A, B) NANOARROW_CAT(A, B)
                               ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.h:1024:29: note: expanded from macro 'NANOARROW_CAT'
#define NANOARROW_CAT(A, B) A##B
                            ^
note: expanded from here
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:3289:7: note: Assuming 'private_data' is not equal to NULL
  if (private_data == NULL) {
      ^~~~~~~~~~~~~~~~~~~~
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:3289:3: note: Taking false branch
  if (private_data == NULL) {
  ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:3299:7: note: Assuming 'n_arrays' is > 0
  if (n_arrays > 0) {
      ^~~~~~~~~~~~
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:3299:3: note: Taking true branch
  if (n_arrays > 0) {
  ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:3302:9: note: Assuming field 'arrays' is equal to NULL
    if (private_data->arrays == NULL) {
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:3302:5: note: Taking true branch
    if (private_data->arrays == NULL) {
    ^
/home/runner/work/arrow-adbc/arrow-adbc/c/vendor/nanoarrow/nanoarrow.c:3303:7: note: Potential leak of memory pointed to by 'private_data'
      ArrowBasicArrayStreamRelease(array_stream);
      ^
Applying fixes ...

https://github.com/apache/arrow-adbc/actions/runs/9590445640/job/26445818971?pr=1928#step:7:344

vyasr commented 1 day ago

FWIW this issue just popped up in cudf as well. In my case, the tracebacks all point back to inline_buffer.h, specifically

  /home/coder/cudf/cpp/build/conda/cuda-12.5/release/_deps/nanoarrow-src/src/nanoarrow/common/inline_buffer.h:350:8: warning: Dereference of null pointer (loaded from variable 'out') [clang-analyzer-core.  NullDereference]                                                                                                                                                                                            /home/coder/cudf/cpp/build/conda/cuda-12.5/release/_deps/nanoarrow-src/src/nanoarrow/common/inline_buffer.h:474:23: warning: Array access (from variable 'bits') results in a null pointer dereference [cl  ang-analyzer-core.NullDereference]                                                                                                                                                                        
  /home/coder/cudf/cpp/build/conda/cuda-12.5/release/_deps/nanoarrow-src/src/nanoarrow/common/inline_buffer.h:480:21: warning: Array access (from variable 'bits') results in a null pointer dereference [cl  ang-analyzer-core.NullDereference]                                                                                                                                                                        
  /home/coder/cudf/cpp/build/conda/cuda-12.5/release/_deps/nanoarrow-src/src/nanoarrow/common/inline_buffer.h:640:17: warning: Dereference of null pointer (loaded from variable 'out_cursor') [clang-analyz  er-core.NullDereference

I'm also having difficulty suppressing them in our runs. The recent release of clang-tidy 19 exposed the ability to exclude headers, which helps, but I'm running into cases specifically with nanoarrow where the exclusions don't work. I suspect that the issue comes from the extensive use of macros and clang-tidy having special logic around the way macros are handled in the call stack, but I'm not sure about that.

paleolimbot commented 1 day ago

I'll take another look today!

WillAyd commented 1 day ago

Meson has built-in support for clang-tidy, so if we wanted to I think can just add a .clang-tidy configuration to the project root and run meson compile clang-tidy to check

paleolimbot commented 1 day ago

I'll plug through fixing all of these tomorrow, but I'll also share here the workaround from ADBC which may be helpful here which involves some magic with jq:

https://github.com/apache/arrow-adbc/blob/8c6243f1058529c47285eabdef6bb83a118f6040/ci/scripts/cpp_clang_tidy.sh#L67-L74

vyasr commented 1 day ago

Ah yeah I'm doing something similar to prevent clang-tidy from checking nanoarrow sources, but I still see some errors in nanoarrow headers because clang-tidy will check all headers included in source files that are checked. In theory those should be possible to filter, and I've been able to filter all other vendored/downloaded at build-time headers, but not nanoarrow's. I suspect the issue is specifically with how macros are being handled in nested contexts and that defeating the filtering capabilities of clang-tidy.