Open manuschneider opened 3 months ago
There is. Originally some of the API are design for large Tensor. Meaning that if you don't need to support -1 for indexing like python, the C++ side can utilize full 64bit address indexing. the API that using int64 is only to make compatible with Python -1 indexing
I see. But perumte does not work with negative indices anyways in the current implementation, at least for BlockUniTensor
There is. Originally some of the API are design for large Tensor. Meaning that if you don't need to support -1 for indexing like python, the C++ side can utilize full 64bit address indexing. the API that using int64 is only to make compatible with Python -1 indexing
Can we do it at Pybind11 level to make C++ code consistent?
Here is a crop from the Google C++ style guide that we apply. In short, only use unsigned integers when representing a bit pattern and use size_t
and ptrdiff_t
when appropriate.
Unsigned integers are good for representing bitfields and modular arithmetic. Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point. The fact that unsigned arithmetic doesn't model the behavior of a simple integer, but is instead defined by the standard to model modular arithmetic (wrapping around on overflow/underflow), means that a significant class of bugs cannot be diagnosed by the compiler. In other cases, the defined behavior impedes optimization.
That said, mixing signedness of integer types is responsible for an equally large class of problems. The best advice we can provide: try to use iterators and containers rather than pointers and sizes, try not to mix signedness, and try to avoid unsigned types (except for representing bitfields or modular arithmetic). Do not use an unsigned type merely to assert that a variable is non-negative.
I suggest use int
(or some other signed type) for everything. But note that size_t
is unsigned, so don't use that! use ssize_t
.
C++20 has std::ssize() function for containers, so it is safe to write eg
Container<T> v; for (auto i = std::ssize(v)-1; i > 0; --i) { ... }
The MPToolkit is currently C++14 and not ready to use C++20 yet (not enough compiler support), but meanwhile I'm just going to add some ssize() function myself.
In new code, my feeling is always use a signed integer, unless it is actually bit-flipping. Having a container using size_type
as a signed type will work fine, and if you mix it with standard containers it will harmlessly convert to an unsigned type. But always use std::ssize()
when getting the size of containers when it needs to work with std containers as well, and similarly wrap other existing uses of unsigned types.
See also: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf
Bjarne also advocates for making all .size()
functions return a signed int: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1491r0.pdf
related issue: #167
When referring to positive numbers like index or block numbers, the data structure is either a
cytnx_int64
or acytnx_uint64
. For example,UniTensor.permute(std::vector<cytnx_int64>)
andUniTensor.at(std::vector<cytnx_uint64>)
. ForUniTensor.get_block()
both versions exist and the difference (if any) is not obvious to me.This is a bit confusing and inconstant. While it should not cause problems on the Python side, the mix leads to difficulties with C++ because the two types are not automatically converted, at least when combined in vectors.
Is there a reason for different conventions? Could everything be changed to one convention? How can this be done without breaking compatibility with existing code, tests, etc?