apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0
14.41k stars 3.5k forks source link

[C++] Review and sanitize const_cast usage #33628

Open pitrou opened 1 year ago

pitrou commented 1 year ago

Describe the enhancement requested

It seems some uses of const_cast have crept into the codebase. Some of them are legitimate, because of interacting with third-party APIs (usually C APIs) that don't specify const for arguments they won't mutate. Some of them however may be incorrect and dangerous.

Component(s)

C++

pitrou commented 1 year ago

@felipecrv @zeroshade @js8544 This can be a nice task for someone wanting to get more familiar with the codebase.

pitrou commented 1 year ago

Related: https://github.com/apache/arrow/issues/33629

pitrou commented 1 year ago

Here's an example that I just fixed: https://github.com/apache/arrow/pull/15182/commits/8089b4e641d96b3dbceae512d15535efd4f68a31#diff-10f9496a81a85a38fca5553486a667a49836574479f6b956839a9ae80d559167L639-R647

js8544 commented 1 year ago

This can be a nice task for someone wanting to get more familiar with the codebase.

Indeed! I am willing to take this.

js8544 commented 1 year ago

take

felipecrv commented 1 year ago

Scalar has a problematic const_cast that makes any caller of GetSharedPtr a potential violator of const-correctness.

  /// \brief EXPERIMENTAL Enable obtaining shared_ptr<Scalar> from a const
  /// Scalar& context.
  std::shared_ptr<Scalar> GetSharedPtr() const {
    return const_cast<Scalar*>(this)->shared_from_this();
  }

std::enable_shared_from_this shows how to do this properly:

      shared_ptr<_Tp>
      shared_from_this()
      { return shared_ptr<_Tp>(this->_M_weak_this); }

      shared_ptr<const _Tp>
      shared_from_this() const
      { return shared_ptr<const _Tp>(this->_M_weak_this); }
js8544 commented 1 year ago

Scalar has a problematic const_cast that makes any caller of GetSharedPtr a potential violator of const-correctness.

  /// \brief EXPERIMENTAL Enable obtaining shared_ptr<Scalar> from a const
  /// Scalar& context.
  std::shared_ptr<Scalar> GetSharedPtr() const {
    return const_cast<Scalar*>(this)->shared_from_this();
  }

std::enable_shared_from_this shows how to do this properly:

      shared_ptr<_Tp>
      shared_from_this()
      { return shared_ptr<_Tp>(this->_M_weak_this); }

      shared_ptr<const _Tp>
      shared_from_this() const
      { return shared_ptr<const _Tp>(this->_M_weak_this); }

I think this const_cast is justifiable though, since the const version of shared_from_this returns shared_ptr<const Scalar>. To fix this we would need to write something like a arrow::util::enable_shared_from_this that returns shared_ptr<_Tp> for const _Tp. Do you think it's worth the hassle? cc @felipecrv @pitrou

felipecrv commented 1 year ago

@js8544 no need to have our own enable_shared_from_this. Let me try to be more clear:

Legally, we can only get a shared_ptr<const Scalar> if we have a const Scalar & or const Scalar *. We should only be able to get a shared_ptr<Scalar> if we have a non-const Scalar & or Scalar *.

I understand that this utility was added because it's a hassle to have a mix of shared_ptr<Scalar> and shared_ptr<const Scalar> across the codebase, but if we want to be strict about const this is the way.

felipecrv commented 1 year ago

I propose we remove this utility and const_cast at the callsites if needed. That is better than silently dropping constness via an innocent-looking utility function. Instead of removing the function we could fix the const annotations and have both const and non-const versions.

js8544 commented 1 year ago

I think shared_ptr<const Scalar> would be very hard to use since all our interfaces expect shared_ptr<Scalar>

westonpace commented 1 year ago

I think @js8544 has a good point and so I'll expand on it a bit. Scalars are immutable, they don't even have any non-const methods (some subtypes have mutable-data but usage of that is rare).

The challenge is that we are very diligent about writing methods of the form Foo(const Scalar&) and not so diligent about writing things of the form Foo(const std::shared_ptr<const Scalar>&) or Foo(std::shared_ptr<const Scalar>).

I'm not sure the correct answer though.

felipecrv commented 1 year ago

@westonpace alright. If we don't use the shared_ptrs to mutate the wrapped Scalar pointer then the ergonomics of this matters more than being pedantic about constness.