Refer to indices by cytnx_int64 or cytnx_uint64

manuschneider commented 3 months ago

When referring to positive numbers like index or block numbers, the data structure is either a cytnx_int64 or a cytnx_uint64. For example, UniTensor.permute(std::vector<cytnx_int64>) and UniTensor.at(std::vector<cytnx_uint64>). For UniTensor.get_block() both versions exist and the difference (if any) is not obvious to me.

This is a bit confusing and inconstant. While it should not cause problems on the Python side, the mix leads to difficulties with C++ because the two types are not automatically converted, at least when combined in vectors.

Is there a reason for different conventions? Could everything be changed to one convention? How can this be done without breaking compatibility with existing code, tests, etc?

kaihsin commented 3 months ago

There is. Originally some of the API are design for large Tensor. Meaning that if you don't need to support -1 for indexing like python, the C++ side can utilize full 64bit address indexing. the API that using int64 is only to make compatible with Python -1 indexing

manuschneider commented 3 months ago

I see. But perumte does not work with negative indices anyways in the current implementation, at least for BlockUniTensor

yingjerkao commented 3 months ago

There is. Originally some of the API are design for large Tensor. Meaning that if you don't need to support -1 for indexing like python, the C++ side can utilize full 64bit address indexing. the API that using int64 is only to make compatible with Python -1 indexing

Can we do it at Pybind11 level to make C++ code consistent?

IvanaGyro commented 3 months ago

Here is a crop from the Google C++ style guide that we apply. In short, only use unsigned integers when representing a bit pattern and use size_t and ptrdiff_t when appropriate.

Unsigned integers are good for representing bitfields and modular arithmetic. Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point. The fact that unsigned arithmetic doesn't model the behavior of a simple integer, but is instead defined by the standard to model modular arithmetic (wrapping around on overflow/underflow), means that a significant class of bugs cannot be diagnosed by the compiler. In other cases, the defined behavior impedes optimization.

That said, mixing signedness of integer types is responsible for an equally large class of problems. The best advice we can provide: try to use iterators and containers rather than pointers and sizes, try not to mix signedness, and try to avoid unsigned types (except for representing bitfields or modular arithmetic). Do not use an unsigned type merely to assert that a variable is non-negative.

ianmccul commented 3 months ago

I suggest use int (or some other signed type) for everything. But note that size_t is unsigned, so don't use that! use ssize_t.

C++20 has std::ssize() function for containers, so it is safe to write eg Container<T> v; for (auto i = std::ssize(v)-1; i > 0; --i) { ... }

The MPToolkit is currently C++14 and not ready to use C++20 yet (not enough compiler support), but meanwhile I'm just going to add some ssize() function myself.

In new code, my feeling is always use a signed integer, unless it is actually bit-flipping. Having a container using size_type as a signed type will work fine, and if you mix it with standard containers it will harmlessly convert to an unsigned type. But always use std::ssize() when getting the size of containers when it needs to work with std containers as well, and similarly wrap other existing uses of unsigned types.

Bjarne also advocates for making all .size() functions return a signed int: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1491r0.pdf

IvanaGyro commented 3 weeks ago

related issue: #167

Cytnx-dev / Cytnx

Refer to indices by cytnx_int64 or cytnx_uint64 #444