ianmccul opened this issue 2 weeks ago
I have put a sketch of what a `std::variant`-based version of the `Storage`, `Tensor`, `Scalar`, etc. classes might look like at: https://godbolt.org/z/5344rccWr

The top part is just some helper functions; the actual code starts around line 87:

```cpp
using dtypes = std::variant<std::complex<double>, std::complex<float>, double, float, int>;
```

which is easily extensible to whatever types you want. It isn't possible to use exactly the same set of types, since `void` isn't allowed in a `std::variant`, but if some other dummy type was used instead (`cytnx_void` perhaps?) then it would be possible to get `Tensor::dtype()` to return the identical index to what it does currently, so it should be possible to drop it in and progressively refactor the functions that are currently called via function-pointer tables later on.
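For instance, a minimal sketch of that dummy-type idea might look like this (the name `cytnx_void` and the exact variant ordering are assumptions, chosen so that the variant index lines up with the current integer dtype codes):

```cpp
#include <complex>
#include <variant>

// hypothetical placeholder standing in for void, so that each alternative's
// index matches the existing dtype enumeration
struct cytnx_void {};

using dtypes = std::variant<cytnx_void, std::complex<double>, std::complex<float>,
                            double, float, int>;

struct Scalar {
  dtypes _elem;
  // dtype() is then just the variant index, identical to the current integer codes
  int dtype() const { return static_cast<int>(_elem.index()); }
};
```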
The example code at the end simulates some simple operations:

```cpp
int main() {
  // Create a Scalar and Tensor
  Scalar s1(3.0);  // Scalar holding a double
  Tensor t1;
  t1.storage = Storage<double>{};  // Tensor holding a Storage<double>

  // Multiply Tensor by Scalar
  Tensor t2 = Mul(t1, s1);  // This should call Mul(Storage<double>, double)

  // Multiply Tensor by another scalar type (int for example)
  Tensor t3 = Mul(t1, 2);  // This should call Mul(Storage<double>, int)

  Tensor t4;
  t4.storage = Storage<std::complex<float>>{};
  Tensor t5 = Mul(t4, s1);  // product of a complex<float> tensor and a double
                            // should produce a tensor of complex<double>
}
```
It is all fairly type-safe. As an example, during testing I added the example for multiplication of `complex<float>` without first adding `complex<float>` to the dtype variant, and got an error:

> 203:46: error: no match for ‘operator=’ (operand types are ‘Tensor::StorageType’ {aka ‘std::variant<Storage<std::complex<double> >, Storage<double>, Storage<float>, Storage<int> >’} and ‘Storage<std::complex<float> >’)
>   203 |   t4.storage = Storage<std::complex<float>>{};
It is possible to construct a `Storage<T>` of any type, but you can't put it into a `Tensor` unless it is a supported type.
A drawback of both the new and the current design of `Scalar` is that `Scalar` cannot be used while iterating over a container such as `Storage` or `Tensor`, because a `Scalar` cannot be a pointer or a reference to an element of the container.
```cpp
using dtypes = std::variant<std::complex<double>, std::complex<float>, double, float, int>;

struct Scalar {
  dtypes _elem;  // or shared_ptr<dtypes> if you want shared reference semantics on copying

  template <typename T>
  Scalar(T const& x) : _elem(x) {}

  template <typename T>
  Scalar& operator=(T const& x) { _elem = x; return *this; }

  // etc
};
```
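Usage would look like this (assuming the `dtypes` alias above):

```cpp
Scalar s(3.0);                      // holds a double
s = 42;                             // now holds an int
s = std::complex<float>(1.f, 2.f);  // now holds a complex<float>
```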
Besides using `std::variant`, should we consider just templatizing everything? For example, if `Tensor` is also templatized then we may not need to use `std::visit` in this function:
```cpp
// copied from @ianmccul's godbolt
template <typename T>
Tensor Mul(const Tensor& x, T y) {
  return std::visit(
    [&](auto&& x) -> Tensor {
      Tensor result;
      result.storage = Mul(x, y);
      return result;
    },
    x.storage);
}
```
In C++ code you wouldn't want to use a `Scalar` for iterating anyway, because it needs to do the indirection over the dtype for every element. Instead you would write a template function and iterate over the element type `T` of the `Storage`.
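Something along these lines (a sketch, assuming `Storage<T>` exposes iterators over its elements):

```cpp
// the loop body works directly on T, with no per-element dtype indirection
template <typename T>
void Negate(Storage<T>& s) {
  for (auto& x : s) x = -x;
}

// the dtype dispatch then happens once per container, not once per element
void Negate(Tensor& t) {
  std::visit([](auto& s) { Negate(s); }, t.storage);
}
```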
Also, iterating over a Tensor should be strongly discouraged; it is a sign that the code won't work properly with eg non-abelian symmetries. And what happens if the data is stored somewhere else (on a GPU, or distributed via MPI)?
That said, it would not be difficult to have something similar to Scalar that can reference an element of a Tensor. Since Tensor is stored by reference, the easiest way would be to just get a reference-copy of the Tensor and a `T*` to the element that you want (eg as a `std::variant` over `T*` for each `T` in dtypes; there is a metafunction in the code to produce such types). Is there a need for this?
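A rough sketch of what that could look like (`ScalarRef` and its members are assumptions building on the godbolt sketch; error handling is minimal):

```cpp
#include <complex>
#include <stdexcept>
#include <type_traits>
#include <variant>

// hypothetical Scalar-like type that references an element of a Tensor
struct ScalarRef {
  Tensor tensor_;  // reference-copy of the Tensor keeps the underlying Storage alive
  std::variant<std::complex<double>*, std::complex<float>*, double*, float*, int*> ptr_;

  // read the element out as an owning Scalar
  Scalar get() const {
    return std::visit([](auto* p) { return Scalar(*p); }, ptr_);
  }

  // write a value of a compatible type into the referenced element
  template <typename T>
  ScalarRef& operator=(T const& v) {
    std::visit([&](auto* p) {
      using E = std::decay_t<decltype(*p)>;
      if constexpr (std::is_constructible_v<E, T>)
        *p = static_cast<E>(v);
      else
        throw std::runtime_error("incompatible assignment to ScalarRef");
    }, ptr_);
    return *this;
  }
};
```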
On the question of templatizing everything, the python interface cannot be templatized, so at some level there needs to be type erasure. I guess that could be a `PyObject`, keeping everything in C++ as a template, but then the C++ interface would be completely different from the Python interface. It would be useful though to have a type-safe version of Tensor. Eg, `TypedTensor<T>`, that actually defines tensor operations, contractions, etc, and Tensor contains a `std::variant` of the different possible `TypedTensor<T>`.
I agree with this issue and also suggest we make `Tensor` typed.
For iterating a tensor, we do not have to worry about the physical location of the data or about symmetries. The data in `Storage` may be stored at different locations (GPU or elsewhere), and we still need to design iterators for `Storage`. At the same time, the location where the data is stored does not prevent us from designing iterators for `Tensor`. And the symmetry properties are bound to `UniTensor` rather than `Tensor`; `Tensor` is just a container with a shape. In this context, it's fine to design an iterator for `Tensor`.
Although the jobs that need iterators of `Tensor` can be done using the iterators of `Storage`, which is a 1-D array, the iterators of `Tensor` let the user iterate over the data with the shape of the `Tensor` taken into account. For example, the iterators of `Tensor` would make "doubling one row of a matrix" more straightforward, as in the sketch below.
Whether `Scalar` needs to be the value type pointed to by the iterators of `Tensor` or `Storage` depends on the function interfaces of the linear algebra. If the functions consume type-less `Tensor` or `Storage`, or if we allow users to iterate over type-less `Tensor` or `Storage`, we need a class representing a type-less scalar.
Sure, it would be very easy to make a typed version of `Tensor`, and then the untyped version can use a variant to store the options (the actual type would be given by the alias `VariantMap_t<Tensor, dtypes>`). I guess there is a question of naming - in C++ code there probably isn't much reason to use a type-erased `UntypedTensor`, since in practice you would either know the type, or if you don't know the type then you could make it a template. But there surely is a use in the Python interface for a dense `Tensor`. So perhaps `Tensor` should remain as the type-erased version, with a new name for the template? `BasicTensor<T>` perhaps?
And it probably makes sense for `UniTensor` to be a template as well, right? It needs to contain a bunch of tensor blocks that all need to be the same type. Is there anything now that forces a `UniTensor` to contain the same `dtype` in each block? Is there a use case for them to be different?
For creating typed `Tensor` and typed `Storage`, I published the proposal at #500.
I am confused. What's wrong with the iterator? We have a Proxy class that was designed to do that. It carries a pointer so that the lvalue and rvalue get/set item can be done with a single interface, to support both of the following semantics:

```cpp
x[3] = 0.5;
y = x[3];
```

You don't use Scalar to do iteration.

Another reason for this proxy is that one might access elements from a BlockUniTensor that do not exist; the proxy allows you to check for that.
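For reference, the proxy idea in a minimal, stripped-down form looks roughly like this (not the actual Cytnx implementation; `get_item`/`set_item` are stand-in names):

```cpp
#include <cstddef>

// returned by Tensor::operator[]; a single interface covers both get and set
class Proxy {
 public:
  Proxy(Tensor& t, std::size_t idx) : t_(t), idx_(idx) {}

  // lvalue use:  x[3] = 0.5;  (this is also the point where a BlockUniTensor
  // can check that the addressed element actually exists)
  Proxy& operator=(double v) {
    t_.set_item(idx_, v);  // stand-in for the real write-through call
    return *this;
  }

  // rvalue use:  y = x[3];
  operator double() const { return t_.get_item(idx_); }  // stand-in read call

 private:
  Tensor& t_;
  std::size_t idx_;
};
```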
> And it probably makes sense for `UniTensor` to be a template as well, right? It needs to contain a bunch of tensor blocks that all need to be the same type. Is there anything now that forces a `UniTensor` to contain the same `dtype` in each block? Is there a use case for them to be different?
There is a need to have different blocks on different devices (which is also another "type"). Another potential use case is mixed precision / hybrid precision on different blocks. It's probably better to keep this flexibility than to restrict all blocks to the same type/device.
> On the question of templatizing everything, the python interface cannot be templatized, so at some level there needs to be type erasure. I guess that could be a `PyObject`, keeping everything in C++ as a template, but then the C++ interface would be completely different from the Python interface. It would be useful though to have a type-safe version of Tensor. Eg, `TypedTensor<T>`, that actually defines tensor operations, contractions, etc, and Tensor contains a `std::variant` of the different possible `TypedTensor<T>`.
Wrapping and unwrapping a PyObject is a very expensive operation, so usually you want to avoid it.
@kaihsin said:

> I am confused. What's wrong with the iterator? We have a Proxy class that was designed to do that. It carries a pointer so that the lvalue and rvalue get/set item can be done with a single interface, to support both of the following semantics: `x[3] = 0.5; y = x[3];` You don't use Scalar to do iteration.
I guess @IvanaGyro was talking about a (hypothetical) typed version of an iterator? `TProxy` is very expensive; you wouldn't want to use it in performance-critical code. I tried a simple benchmark:
#include "cytnx.hpp"
#include <iostream>
#include <vector>
#include <chrono>
#include <cassert>
using namespace cytnx;
using namespace std;
class SimpleMatrix {
public:
SimpleMatrix(int rows, int cols) : rows_(rows), cols_(cols), storage_(rows*cols) {}
double& operator()(int r, int c) { assert(r >= 0 && r < rows_ && c >= 0 && c <= cols_); return storage_[r*cols_ + c]; }
double operator()(int r, int c) const { assert(r >= 0 && r < rows_ && c >= 0 && c <= cols_); return storage_[r*cols_ + c]; }
private:
int rows_;
int cols_;
std::vector<double> storage_;
};
int const N = 1000;
int const num_iterations = 10;
int main() {
// Benchmark cytnx::Tensor
double tensor_total_time = 0.0;
for (int iter = 0; iter < num_iterations; ++iter) {
auto start = chrono::high_resolution_clock::now();
Tensor A({N, N}, Type.Double);
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j)
{
A(i,j) = i-j;
}
}
auto end = chrono::high_resolution_clock::now();
chrono::duration<double> duration = end - start;
tensor_total_time += duration.count();
}
double tensor_avg_time = tensor_total_time / num_iterations;
cout << "Average time for Cytnx Tensor: " << tensor_avg_time << " seconds" << endl;
// Benchmark SimpleMatrix
double simple_matrix_total_time = 0.0;
for (int iter = 0; iter < num_iterations; ++iter) {
auto start = chrono::high_resolution_clock::now();
SimpleMatrix B(N, N);
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j)
{
B(i,j) = i-j;
}
}
auto end = chrono::high_resolution_clock::now();
chrono::duration<double> duration = end - start;
simple_matrix_total_time += duration.count();
}
double simple_matrix_avg_time = simple_matrix_total_time / num_iterations;
cout << "Average time for SimpleMatrix: " << simple_matrix_avg_time << " seconds" << endl;
}
With no optimizations (`g++ -O0 ./bench.cpp`) this gives:

```
Average time for Cytnx Tensor: 0.674874 seconds
Average time for SimpleMatrix: 0.00317587 seconds
```
With optimizations (`g++ -O3 ./bench.cpp`):

```
Average time for Cytnx Tensor: 0.229474 seconds
Average time for SimpleMatrix: 0.000544618 seconds
```
Yes, I agree. Another reason for that slowdown is the accessor overhead. I believe that is another hotspot.
@kaihsin said:

> @ianmccul said:
>
> > And it probably makes sense for UniTensor to be a template as well, right? It needs to contain a bunch of tensor blocks that all need to be the same type. Is there anything now that forces a `UniTensor` to contain the same `dtype` in each block?
To answer my own question: yes, https://github.com/Cytnx-dev/Cytnx/blob/fa4596e1620cc304f698286905818a2118b2bd8e/include/UniTensor.hpp#L655 constrains a UniTensor to only hold blocks of the same `dtype`.
> > Is there a use case for them to be different?
> There is a need to have different blocks on different devices (which is also another "type"). Another potential use case is mixed precision / hybrid precision on different blocks. It's probably better to keep this flexibility than to restrict all blocks to the same type/device.
I think there is some confusion: I was referring to the `dtype`, as the enumeration of the different concrete types that a tensor can hold. The `device()` parameter is a different quantity, and as far as I can tell it is completely independent of the `dtype`. (It is also a possible candidate to store in some kind of variant, but that is a separate topic, so I will not mention it further here!)
I cannot think of a situation where it is useful to have a different type for each block. It would add a lot of complexity, and operations that need to return a common type over the blocks (norm, trace, etc) would need to iterate through the blocks and find a common type. And there would be issues: for example, if you have a `UniTensor` that contains some blocks that are `float` and some blocks that are `double`, and you calculate the norm, to what precision should the norm be calculated? Clearly the overall norm should be a `double`, but should the blocks that are `float` be promoted to `double` first? That would make a difference to the result!
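A toy illustration of why the promotion choice matters (plain C++, not Cytnx code): accumulating the squares in `float` and widening at the end gives a visibly different answer from promoting each element to `double` first.

```cpp
#include <cmath>
#include <iostream>
#include <vector>

int main() {
  std::vector<float> block(10'000'000, 1e-4f);
  float sum_f = 0.0f;  // accumulate in float, widen at the end
  double sum_d = 0.0;  // promote each element to double first
  for (float x : block) {
    sum_f += x * x;
    sum_d += double(x) * double(x);
  }
  // two different "norms" of the same data
  std::cout << std::sqrt(double(sum_f)) << " vs " << std::sqrt(sum_d) << '\n';
}
```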
But the point is, with good generic design it would be possible to get both possibilities, if you wanted. For an example sketch:
```cpp
// Start with a concrete tensor, parametrized by the data type
template <typename T>
class TypedTensor {
  using value_type = T;
  // uses Storage<T> as the backing store
};

// implement functions for TypedTensor<T>
double norm(TypedTensor<double> const& t) {
  return ...;
}

// or as a template, where possible
template <typename T>
auto norm(TypedTensor<T> const& t) -> decltype(norm(T())) {
  return ...;
}

// now a Tensor type, that can store a TypedTensor<T> for any type T in dtypes
class Tensor {
  // helper alias for std::variant<TypedTensor<float>, TypedTensor<double>, ...>
  using TensorVariant = VariantMap_t<TypedTensor, dtypes>;
  using value_type = Scalar;
  TensorVariant data_;
};

// implement Tensor operations by the visitor pattern on t.data_
// (a lambda is needed here: an overload set such as norm cannot be passed
// directly to std::visit)
Scalar norm(Tensor const& t) {
  return std::visit([](auto const& x) { return Scalar(norm(x)); }, t.data_);
}

// Now we move on to UniTensor, adding symmetry....
// The implementation helper is a BasicUniTensor, templated on the block type
template <typename BlockType>
class BasicUniTensor {
  using value_type = typename BlockType::value_type;
  std::vector<BlockType> blocks_;
};

// implementation of functions for BasicUniTensor<BlockType>. In most cases,
// these should be possible to implement as a template function.
template <typename BlockType>
auto norm(BasicUniTensor<BlockType> const& t) -> typename BlockType::value_type {
  // calculate norm of each block ...
}

// alias for a TypedUniTensor, with the same type on each block
template <typename T>
using TypedUniTensor = BasicUniTensor<TypedTensor<T>>;

// The current implementation of UniTensor is roughly equivalent to this:
class UniTensor {
  // an alias for std::variant<TypedUniTensor<float>, TypedUniTensor<double>, ...>
  using UniTensorVariant = VariantMap_t<TypedUniTensor, dtypes>;
  using value_type = Scalar;
  UniTensorVariant data_;
};

// dispatch operations on a UniTensor using the visitor pattern
Scalar norm(UniTensor const& t) {
  return std::visit([](auto const& x) { return Scalar(norm(x)); }, t.data_);
}

// Alternatively, we could have a version of UniTensor that allows each block
// to have a different type. This is simply a typedef!
using BlockTypedUniTensor = BasicUniTensor<Tensor>;

// The following should "just work", given complete implementations of the above
int main() {
  UniTensor t1;
  std::cout << norm(t1) << '\n';  // calls norm(UniTensor), then dispatches to the norm for whatever type it holds
  TypedUniTensor<double> t2;
  std::cout << norm(t2) << '\n';  // calls norm(BasicUniTensor<TypedTensor<double>>)
  BlockTypedUniTensor t3;
  std::cout << norm(t3) << '\n';  // calls norm(BasicUniTensor<Tensor>)
}
```
This issue is to discuss problems and possible improvements to enhance the type safety of the C++ code, and thereby increase its maintainability.

The main issue that I can see is the `int` `dtype`. This is used as an index into arrays of function pointers, and occasionally there are large `switch` statements to select amongst the various possible types. This has the effect of moving all of the type checking to runtime, and the compiler is not able to detect a large class of potential bugs. It is rather fragile (eg if you changed the order of `Float` and `Int64` in the enumeration, then it would break the library in very hard-to-detect ways). Extending the supported types (eg with a float128 or a bigint) would be very difficult, and would require intrusive changes to many files. If two such extensions were done independently, then using both additions at the same time would not be possible.
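To illustrate the pattern being described (a simplified sketch, not the actual Cytnx code): the `dtype` integer indexes a table of type-erased kernels, so nothing ties the index to the data it points at, and reordering the enumeration silently selects the wrong kernel.

```cpp
#include <complex>
#include <cstddef>

// one concrete kernel per element type, hidden behind void*
template <typename T>
void add_impl(void* out, const void* a, const void* b, std::size_t n) {
  auto* o = static_cast<T*>(out);
  auto* x = static_cast<const T*>(a);
  auto* y = static_cast<const T*>(b);
  for (std::size_t i = 0; i < n; ++i) o[i] = x[i] + y[i];
}

using AddFunc = void (*)(void*, const void*, const void*, std::size_t);

// the table must match the dtype enumeration order exactly; the compiler
// cannot check this, and a reordering silently breaks every call site
AddFunc add_table[] = {
  &add_impl<std::complex<double>>,
  &add_impl<std::complex<float>>,
  &add_impl<double>,
  &add_impl<float>,
  &add_impl<int>,
};

void Add(void* out, const void* a, const void* b, std::size_t n, int dtype) {
  add_table[dtype](out, a, b, n);  // no compile-time link between dtype and the buffers
}
```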