ianmccul opened this issue 2 weeks ago
I have put a sketch of what a `std::variant`-based version of the `Storage`, `Tensor`, `Scalar`, etc. classes might look like at: https://godbolt.org/z/5344rccWr

The top part is just some helper functions; the actual code starts around line 87:

```cpp
using dtypes = std::variant<std::complex<double>, std::complex<float>, double, float, int>;
```

which is easily extensible to whatever types you want. It isn't possible to use exactly the same set of types, since `void` isn't allowed in a `std::variant`, but if some other dummy type was used instead (`cytnx_void` perhaps?) then it would be possible to get `Tensor::dtype()` to return the identical index to what it does currently, so it should be possible to drop it in and progressively refactor the functions that are currently called via function-pointer tables later on.
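For instance, a minimal sketch of that dummy-type idea might look like this (the name `cytnx_void` and the exact variant ordering are assumptions, chosen so that the variant index lines up with the current integer dtype codes):

```cpp
#include <complex>
#include <variant>

// hypothetical placeholder standing in for void, so that each alternative's
// index matches the existing dtype enumeration
struct cytnx_void {};

using dtypes = std::variant<cytnx_void, std::complex<double>, std::complex<float>,
                            double, float, int>;

struct Scalar {
  dtypes _elem;
  // dtype() is then just the variant index, identical to the current integer codes
  int dtype() const { return static_cast<int>(_elem.index()); }
};
```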
The example code at the end simulates some simple operations:

```cpp
int main() {
  // Create a Scalar and Tensor
  Scalar s1(3.0);  // Scalar holding a double
  Tensor t1;
  t1.storage = Storage<double>{};  // Tensor holding a Storage<double>

  // Multiply Tensor by Scalar
  Tensor t2 = Mul(t1, s1);  // This should call Mul(Storage<double>, double)

  // Multiply Tensor by another scalar type (int for example)
  Tensor t3 = Mul(t1, 2);  // This should call Mul(Storage<double>, int)

  Tensor t4;
  t4.storage = Storage<std::complex<float>>{};
  Tensor t5 = Mul(t4, s1);  // product of a complex<float> tensor and a double
                            // should produce a tensor of complex<double>
}
```
It is all fairly type-safe. As an example, during testing I added the example for multiplication of `complex<float>` without first adding `complex<float>` to the dtype variant, and got an error:

> 203:46: error: no match for ‘operator=’ (operand types are ‘Tensor::StorageType’ {aka ‘std::variant<Storage<std::complex<double> >, Storage<double>, Storage<float>, Storage<int> >’} and ‘Storage<std::complex<float> >’)
>   203 |   t4.storage = Storage<std::complex<float>>{};
It is possible to construct a `Storage<T>` of any type, but you can't put it into a `Tensor` unless it is a supported type.
A drawback of both the new and the current design of `Scalar` is that `Scalar` cannot be used while iterating over a container such as `Storage` or `Tensor`, because a `Scalar` cannot be a pointer or a reference to an element of the container.
```cpp
using dtypes = std::variant<std::complex<double>, std::complex<float>, double, float, int>;

struct Scalar {
  dtypes _elem;  // or shared_ptr<dtypes> if you want shared reference semantics on copying

  template <typename T>
  Scalar(T const& x) : _elem(x) {}

  template <typename T>
  Scalar& operator=(T const& x) { _elem = x; return *this; }

  // etc
};
```
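Usage would look like this (assuming the `dtypes` alias above):

```cpp
Scalar s(3.0);                      // holds a double
s = 42;                             // now holds an int
s = std::complex<float>(1.f, 2.f);  // now holds a complex<float>
```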
Besides using `std::variant`, should we consider just templatizing everything? For example, if `Tensor` is also templatized then we may not need to use `std::visit` in this function:
```cpp
// copied from @ianmccul's godbolt
template <typename T>
Tensor Mul(const Tensor& x, T y) {
  return std::visit(
    [&](auto&& x) -> Tensor {
      Tensor result;
      result.storage = Mul(x, y);
      return result;
    },
    x.storage);
}
```
In C++ code you wouldn't want to use a `Scalar` for iterating anyway, because it needs to do the indirection over the dtype for every element. Instead you would write a template function and iterate over the element type `T` of the `Storage`.
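Something along these lines (a sketch, assuming `Storage<T>` exposes iterators over its elements):

```cpp
// the loop body works directly on T, with no per-element dtype indirection
template <typename T>
void Negate(Storage<T>& s) {
  for (auto& x : s) x = -x;
}

// the dtype dispatch then happens once per container, not once per element
void Negate(Tensor& t) {
  std::visit([](auto& s) { Negate(s); }, t.storage);
}
```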
Also, iterating over a Tensor should be strongly discouraged; it is a sign that the code won't work properly with eg non-abelian symmetries. And what happens if the data is stored somewhere else (on a GPU, or distributed via MPI)?
That said, it would not be difficult to have something similar to Scalar that can reference an element of a Tensor. Since Tensor is stored by reference, the easiest way would be to just get a reference-copy of the Tensor and a `T*` to the element that you want (eg as a `std::variant` over `T*` for each `T` in dtypes; there is a metafunction in the code to produce such types). Is there a need for this?
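A rough sketch of what that could look like (`ScalarRef` and its members are assumptions building on the godbolt sketch; error handling is minimal):

```cpp
#include <complex>
#include <stdexcept>
#include <type_traits>
#include <variant>

// hypothetical Scalar-like type that references an element of a Tensor
struct ScalarRef {
  Tensor tensor_;  // reference-copy of the Tensor keeps the underlying Storage alive
  std::variant<std::complex<double>*, std::complex<float>*, double*, float*, int*> ptr_;

  // read the element out as an owning Scalar
  Scalar get() const {
    return std::visit([](auto* p) { return Scalar(*p); }, ptr_);
  }

  // write a value of a compatible type into the referenced element
  template <typename T>
  ScalarRef& operator=(T const& v) {
    std::visit([&](auto* p) {
      using E = std::decay_t<decltype(*p)>;
      if constexpr (std::is_constructible_v<E, T>)
        *p = static_cast<E>(v);
      else
        throw std::runtime_error("incompatible assignment to ScalarRef");
    }, ptr_);
    return *this;
  }
};
```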
On the question of templatizing everything, the python interface cannot be templatized, so at some level there needs to be type erasure. I guess that could be a `PyObject`, keeping everything in C++ as a template, but then the C++ interface would be completely different from the Python interface. It would be useful though to have a type-safe version of Tensor. Eg, `TypedTensor<T>`, that actually defines tensor operations, contractions, etc, and Tensor contains a `std::variant` of the different possible `TypedTensor<T>`.
I agree with this issue and also suggest we make `Tensor` typed.
For iterating a tensor, we do not have to worry about the physical location of the data or about symmetries. The data in `Storage` may be stored at different locations (GPU or elsewhere), and we still need to design iterators for `Storage`. At the same time, the location where the data is stored does not prevent us from designing iterators for `Tensor`. And the symmetry properties are bound to `UniTensor` rather than `Tensor`; `Tensor` is just a container with a shape. In this context, it's fine to design an iterator for `Tensor`.
Although the jobs that need iterators of `Tensor` can be done using the iterators of `Storage`, which is a 1-D array, the iterators of `Tensor` let the user iterate over the data with the shape of the `Tensor` taken into account. For example, the iterators of `Tensor` would make "doubling one row of a matrix" more straightforward, as in the sketch below.
Whether `Scalar` needs to be the value type pointed to by the iterators of `Tensor` or `Storage` depends on the function interfaces of the linear algebra. If the functions consume type-less `Tensor` or `Storage`, or if we allow users to iterate over type-less `Tensor` or `Storage`, we need a class representing a type-less scalar.
Sure, it would be very easy to make a typed version of `Tensor`, and then the untyped version can use a variant to store the options (the actual type would be given by the alias `VariantMap_t<Tensor, dtypes>`). I guess there is a question of naming - in C++ code there probably isn't much reason to use a type-erased `UntypedTensor`, since in practice you would either know the type, or if you don't know the type then you could make it a template. But there surely is a use in the Python interface for a dense `Tensor`. So perhaps `Tensor` should remain as the type-erased version, with a new name for the template? `BasicTensor<T>` perhaps?
And it probably makes sense for `UniTensor` to be a template as well, right? It needs to contain a bunch of tensor blocks that all need to be the same type. Is there anything now that forces a `UniTensor` to contain the same `dtype` in each block? Is there a use case for them to be different?
For creating typed `Tensor` and typed `Storage`, I published the proposal at #500.
I am confused. What's wrong with the iterator? We have a Proxy class that was designed to do that. It carries a pointer so that the lvalue and rvalue get/set item can be done with a single interface, to support both of the following semantics:

```cpp
x[3] = 0.5;
y = x[3];
```

You don't use Scalar to do iteration.

Another reason for this proxy is that one might access elements from a BlockUniTensor that do not exist; the proxy allows you to check for that.
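For reference, the proxy idea in a minimal, stripped-down form looks roughly like this (not the actual Cytnx implementation; `get_item`/`set_item` are stand-in names):

```cpp
#include <cstddef>

// returned by Tensor::operator[]; a single interface covers both get and set
class Proxy {
 public:
  Proxy(Tensor& t, std::size_t idx) : t_(t), idx_(idx) {}

  // lvalue use:  x[3] = 0.5;  (this is also the point where a BlockUniTensor
  // can check that the addressed element actually exists)
  Proxy& operator=(double v) {
    t_.set_item(idx_, v);  // stand-in for the real write-through call
    return *this;
  }

  // rvalue use:  y = x[3];
  operator double() const { return t_.get_item(idx_); }  // stand-in read call

 private:
  Tensor& t_;
  std::size_t idx_;
};
```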
> And it probably makes sense for `UniTensor` to be a template as well, right? It needs to contain a bunch of tensor blocks that all need to be the same type. Is there anything now that forces a `UniTensor` to contain the same `dtype` in each block? Is there a use case for them to be different?
There is a need to have different blocks on different devices (which is also another "type"). Another potential use case is mixed precision / hybrid precision on different blocks. It's probably better to keep this flexibility than to restrict all blocks to the same type/device.
> On the question of templatizing everything, the python interface cannot be templatized, so at some level there needs to be type erasure. I guess that could be a `PyObject`, keeping everything in C++ as a template, but then the C++ interface would be completely different from the Python interface. It would be useful though to have a type-safe version of Tensor. Eg, `TypedTensor<T>`, that actually defines tensor operations, contractions, etc, and Tensor contains a `std::variant` of the different possible `TypedTensor<T>`.
Wrapping and unwrapping a PyObject is a very expensive operation, so usually you want to avoid it.
@kaihsin said:

> I am confused. What's wrong with the iterator? We have a Proxy class that was designed to do that. It carries a pointer so that the lvalue and rvalue get/set item can be done with a single interface, to support both of the following semantics: `x[3] = 0.5; y = x[3];` You don't use Scalar to do iteration.
I guess @IvanaGyro was talking about a (hypothetical) typed version of an iterator? `TProxy` is very expensive; you wouldn't want to use it in performance-critical code. I tried a simple benchmark:
#include "cytnx.hpp"
#include <iostream>
#include <vector>
#include <chrono>
#include <cassert>
using namespace cytnx;
using namespace std;
class SimpleMatrix {
public:
SimpleMatrix(int rows, int cols) : rows_(rows), cols_(cols), storage_(rows*cols) {}
double& operator()(int r, int c) { assert(r >= 0 && r < rows_ && c >= 0 && c <= cols_); return storage_[r*cols_ + c]; }
double operator()(int r, int c) const { assert(r >= 0 && r < rows_ && c >= 0 && c <= cols_); return storage_[r*cols_ + c]; }
private:
int rows_;
int cols_;
std::vector<double> storage_;
};
int const N = 1000;
int const num_iterations = 10;
int main() {
// Benchmark cytnx::Tensor
double tensor_total_time = 0.0;
for (int iter = 0; iter < num_iterations; ++iter) {
auto start = chrono::high_resolution_clock::now();
Tensor A({N, N}, Type.Double);
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j)
{
A(i,j) = i-j;
}
}
auto end = chrono::high_resolution_clock::now();
chrono::duration<double> duration = end - start;
tensor_total_time += duration.count();
}
double tensor_avg_time = tensor_total_time / num_iterations;
cout << "Average time for Cytnx Tensor: " << tensor_avg_time << " seconds" << endl;
// Benchmark SimpleMatrix
double simple_matrix_total_time = 0.0;
for (int iter = 0; iter < num_iterations; ++iter) {
auto start = chrono::high_resolution_clock::now();
SimpleMatrix B(N, N);
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j)
{
B(i,j) = i-j;
}
}
auto end = chrono::high_resolution_clock::now();
chrono::duration<double> duration = end - start;
simple_matrix_total_time += duration.count();
}
double simple_matrix_avg_time = simple_matrix_total_time / num_iterations;
cout << "Average time for SimpleMatrix: " << simple_matrix_avg_time << " seconds" << endl;
}
With no optimizations (`g++ -O0 ./bench.cpp`) this gives:

```
Average time for Cytnx Tensor: 0.674874 seconds
Average time for SimpleMatrix: 0.00317587 seconds
```
With optimizations (`g++ -O3 ./bench.cpp`):

```
Average time for Cytnx Tensor: 0.229474 seconds
Average time for SimpleMatrix: 0.000544618 seconds
```
Yes, I agree. Another reason for that slowdown is the accessor overhead. I believe that is another hotspot.
@kaihsin said:

> @ianmccul said:
>
> > And it probably makes sense for UniTensor to be a template as well, right? It needs to contain a bunch of tensor blocks that all need to be the same type. Is there anything now that forces a `UniTensor` to contain the same `dtype` in each block?
To answer my own question: yes, https://github.com/Cytnx-dev/Cytnx/blob/fa4596e1620cc304f698286905818a2118b2bd8e/include/UniTensor.hpp#L655 constrains a UniTensor to only hold blocks of the same `dtype`.
> > Is there a use case for them to be different?
> There is a need to have different blocks on different devices (which is also another "type"). Another potential use case is mixed precision / hybrid precision on different blocks. It's probably better to keep this flexibility than to restrict all blocks to the same type/device.
I think there is some confusion: I was referring to the `dtype`, as the enumeration of the different concrete types that a tensor can hold. The `device()` parameter is a different quantity, and as far as I can tell it is completely independent of the `dtype`. (It is also a possible candidate to store in some kind of variant, but that is a separate topic, so I will not mention it further here!)
I cannot think of a situation where it is useful to have a different type for each block. It would add a lot of complexity, and operations that need to return a common type over the blocks (norm, trace, etc) would need to iterate through the blocks and find a common type. And there would be issues: for example, if you have a `UniTensor` that contains some blocks that are `float` and some blocks that are `double`, and you calculate the norm, to what precision should the norm be calculated? Clearly the overall norm should be a `double`, but should the blocks that are `float` be promoted to `double` first? That would make a difference to the result!
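A toy illustration of why the promotion choice matters (plain C++, not Cytnx code): accumulating the squares in `float` and widening at the end gives a visibly different answer from promoting each element to `double` first.

```cpp
#include <cmath>
#include <iostream>
#include <vector>

int main() {
  std::vector<float> block(10'000'000, 1e-4f);
  float sum_f = 0.0f;  // accumulate in float, widen at the end
  double sum_d = 0.0;  // promote each element to double first
  for (float x : block) {
    sum_f += x * x;
    sum_d += double(x) * double(x);
  }
  // two different "norms" of the same data
  std::cout << std::sqrt(double(sum_f)) << " vs " << std::sqrt(sum_d) << '\n';
}
```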
But the point is, with good generic design it would be possible to get both possibilities, if you wanted. For an example sketch:
```cpp
// Start with a concrete tensor, parametrized by the data type
template <typename T>
class TypedTensor {
  using value_type = T;
  // uses Storage<T> as the backing store
};

// implement functions for TypedTensor<T>
double norm(TypedTensor<double> const& t) {
  return ...;
}

// or as a template, where possible
template <typename T>
auto norm(TypedTensor<T> const& t) -> decltype(norm(T())) {
  return ...;
}

// now a Tensor type, that can store a TypedTensor<T> for any type T in dtypes
class Tensor {
  // helper alias for std::variant<TypedTensor<float>, TypedTensor<double>, ...>
  using TensorVariant = VariantMap_t<TypedTensor, dtypes>;
  using value_type = Scalar;
  TensorVariant data_;
};

// implement Tensor operations by the visitor pattern on t.data_
// (a lambda is needed here: an overload set such as norm cannot be passed
// directly to std::visit)
Scalar norm(Tensor const& t) {
  return std::visit([](auto const& x) { return Scalar(norm(x)); }, t.data_);
}

// Now we move on to UniTensor, adding symmetry....
// The implementation helper is a BasicUniTensor, templated on the block type
template <typename BlockType>
class BasicUniTensor {
  using value_type = typename BlockType::value_type;
  std::vector<BlockType> blocks_;
};

// implementation of functions for BasicUniTensor<BlockType>. In most cases,
// these should be possible to implement as a template function.
template <typename BlockType>
auto norm(BasicUniTensor<BlockType> const& t) -> typename BlockType::value_type {
  // calculate norm of each block ...
}

// alias for a TypedUniTensor, with the same type on each block
template <typename T>
using TypedUniTensor = BasicUniTensor<TypedTensor<T>>;

// The current implementation of UniTensor is roughly equivalent to this:
class UniTensor {
  // an alias for std::variant<TypedUniTensor<float>, TypedUniTensor<double>, ...>
  using UniTensorVariant = VariantMap_t<TypedUniTensor, dtypes>;
  using value_type = Scalar;
  UniTensorVariant data_;
};

// dispatch operations on a UniTensor using the visitor pattern
Scalar norm(UniTensor const& t) {
  return std::visit([](auto const& x) { return Scalar(norm(x)); }, t.data_);
}

// Alternatively, we could have a version of UniTensor that allows each block
// to have a different type. This is simply a typedef!
using BlockTypedUniTensor = BasicUniTensor<Tensor>;

// The following should "just work", given complete implementations of the above
int main() {
  UniTensor t1;
  std::cout << norm(t1) << '\n';  // calls norm(UniTensor), then dispatches to the norm for whatever type it holds
  TypedUniTensor<double> t2;
  std::cout << norm(t2) << '\n';  // calls norm(BasicUniTensor<TypedTensor<double>>)
  BlockTypedUniTensor t3;
  std::cout << norm(t3) << '\n';  // calls norm(BasicUniTensor<Tensor>)
}
```
This issue is to discuss problems and possible improvements to enhance the type safety of the C++ code, and thereby increase its maintainability.

The main issue that I can see is the `int` `dtype`. This is used as an index into arrays of function pointers, and occasionally there are large `switch` statements to select amongst the various possible types. This has the effect of moving all of the type checking to runtime, and the compiler is not able to detect a large class of potential bugs. It is rather fragile (eg if you changed the order of `Float` and `Int64` in the enumeration, then it would break the library in very hard-to-detect ways). Extending the supported types (eg with a float128 or a bigint) would be very difficult, and would require intrusive changes to many files. If two such extensions were done independently, then using both additions at the same time would not be possible.
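To illustrate the pattern being described (a simplified sketch, not the actual Cytnx code): the `dtype` integer indexes a table of type-erased kernels, so nothing ties the index to the data it points at, and reordering the enumeration silently selects the wrong kernel.

```cpp
#include <complex>
#include <cstddef>

// one concrete kernel per element type, hidden behind void*
template <typename T>
void add_impl(void* out, const void* a, const void* b, std::size_t n) {
  auto* o = static_cast<T*>(out);
  auto* x = static_cast<const T*>(a);
  auto* y = static_cast<const T*>(b);
  for (std::size_t i = 0; i < n; ++i) o[i] = x[i] + y[i];
}

using AddFunc = void (*)(void*, const void*, const void*, std::size_t);

// the table must match the dtype enumeration order exactly; the compiler
// cannot check this, and a reordering silently breaks every call site
AddFunc add_table[] = {
  &add_impl<std::complex<double>>,
  &add_impl<std::complex<float>>,
  &add_impl<double>,
  &add_impl<float>,
  &add_impl<int>,
};

void Add(void* out, const void* a, const void* b, std::size_t n, int dtype) {
  add_table[dtype](out, a, b, n);  // no compile-time link between dtype and the buffers
}
```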