AnyDSL / impala

An imperative and functional programming language
https://anydsl.github.io
GNU General Public License v3.0

Re-interpreting a value of a large type as an array of a smaller type #28

Open ergawy opened 8 years ago

ergawy commented 8 years ago

I want to re-interpret a value as an array of values of a smaller type. For example, an i32 as an array of 4 i8s. Also, the source (big) and target (small) types are polymorphic so the size of the target array is not fixed (in terms of the number of elements rather than the total size in bytes). Can I do that?

leissa commented 8 years ago

No, bitcasting i32 to [i8 * 4] is not allowed. Doing stuff like this is not even allowed in C. For example, the following C code is illegal:

int32_t* p = /*...*/;
int16_t* q = (int16_t*) p;
int16_t i = q[1]; // the C standard does not allow this.

I don't understand the second part of your question. Can you give me some more context or an example?

ergawy commented 8 years ago

EDIT: I just wanted to clarify my intent with a simpler example. Here is Rust code that does it:

fn main() {
    use std::mem;
    let x : i32 = 200;
    let one: [i8 ; 4] = unsafe { mem::transmute_copy(&x) };

    assert_eq!(-56, one[0]);
    assert_eq!(0, one[1]);
}

ORIGINAL COMMENT: Sorry for not giving an example from the start.

I want to use the PTX warp shuffle instruction.

fn shuffle_words[T, T2](input: T, src_lane: i32) -> () {
   let num_words = sizeof[T]() / sizeof[T2]();
   let in_words : [T2 * num_words] = bitcast[[T2 * num_words]](&input); // I know that I can't use an expr for allocation here, but I am just trying to explain my point.
   let out_words : [T2 * num_words];
   for i in range(0, num_words) {
      out_words(i) = shuffle(in_words(i), src_lane);
   }
}
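
The word-by-word idea in the pseudocode above can be sketched in plain C without any pointer aliasing. This is only a host-side illustration under stated assumptions: `shuffle_word` is a hypothetical stand-in for the real warp shuffle (here it simply returns its input), the word type is fixed to `uint32_t`, and the value's size is assumed to be a multiple of the word size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-word warp shuffle. On the host it just
   returns its input so that the sketch can run without a GPU. */
static uint32_t shuffle_word(uint32_t word, int src_lane) {
    (void)src_lane;
    return word;
}

/* Passes `size` bytes from `input` to `output` through shuffle_word, one
   32-bit word at a time. memcpy moves the bytes in and out of a local word,
   so no object is accessed through an lvalue of an incompatible type. */
static void shuffle_words(void *output, const void *input, size_t size,
                          int src_lane) {
    /* Assumption of this sketch: no partial trailing word. */
    assert(size % sizeof(uint32_t) == 0);
    const unsigned char *in = input;
    unsigned char *out = output;
    for (size_t off = 0; off < size; off += sizeof(uint32_t)) {
        uint32_t word;
        memcpy(&word, in + off, sizeof word);
        word = shuffle_word(word, src_lane);
        memcpy(out + off, &word, sizeof word);
    }
}
```

Calling `shuffle_words(&out, &in, sizeof in, lane)` on two `double` variables would process the value as two 32-bit words. Compilers typically turn such fixed-size `memcpy` calls into plain register moves, so the copy need not cost anything at run time.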

For more context, I am implementing parallel operations equivalent to what is provided by CUDA's CUB library. Here is an implementation of the shuffle operation from CUB's code:

template <typename T>
__device__ __forceinline__ T ShuffleDown(
    T               input,                                  ///< [in] The value to broadcast
    int             src_offset,                             ///< [in] The relative up-offset of the peer to read from
    int             last_lane = CUB_PTX_WARP_THREADS - 1)   ///< [in] Index of first lane in segment
{
    typedef typename UnitWord<T>::ShuffleWord ShuffleWord;

    const int       WORDS           = (sizeof(T) + sizeof(ShuffleWord) - 1) / sizeof(ShuffleWord);

    T               output;
    ShuffleWord     *output_alias   = reinterpret_cast<ShuffleWord *>(&output);
    ShuffleWord     *input_alias    = reinterpret_cast<ShuffleWord *>(&input);

    unsigned int shuffle_word;
    asm volatile("shfl.down.b32 %0, %1, %2, %3;"
        : "=r"(shuffle_word) : "r"((unsigned int) input_alias[0]), "r"(src_offset), "r"(last_lane));
    output_alias[0] = shuffle_word;

    #pragma unroll
    for (int WORD = 1; WORD < WORDS; ++WORD)
    {
        asm volatile("shfl.down.b32 %0, %1, %2, %3;"
            : "=r"(shuffle_word) : "r"((unsigned int) input_alias[WORD]), "r"(src_offset), "r"(last_lane));
        output_alias[WORD] = shuffle_word;
    }

//    ShuffleDown(input_alias, output_alias, src_offset, last_lane, Int2Type<WORDS - 1>());

    return output;
}

slusallek commented 8 years ago

Hi,

We should be careful here: line 3 is certainly legal (indexing an array). The only issue is the pointer cast, which is also not "illegal" but probably means that the results are undefined (subsequent accesses might give a "bus error" or similar due to misalignment).

However, if handled properly (alignment and such), the results can be well defined and are actually really useful. So it makes sense to think about "How can we design the system to allow this (after enough safeguards)?" For example, a pointer cast between 32-bit and 16-bit ints should always be fine on x86, as far as I know, as long as the input pointer is OK.

Best,

Philipp

In reply to Roland Leißa's comment of 14.04.2016, 11:50.


Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI) GmbH Trippstadter Strasse 122, D-67663 Kaiserslautern

Geschäftsführung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender) Dr. Walter Olthoff Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes

Sitz der Gesellschaft: Kaiserslautern (HRB 2313)

USt-Id.Nr.: DE 148646973, Steuernummer: 19/673/0060/3

leissa commented 8 years ago

No, line 3 is illegal (i.e. undefined behavior). According to the C standard (§6.5, paragraph 7):

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object

[...]

And the effective type of the object that q[1] accesses is int32_t, which is not compatible with int16_t. Line 2 is legal as long as the implementation-defined alignment requirements are obeyed.

But I agree that we want to support such things in the long run. I'm just saying that there are good reasons for C to prohibit such things. Allowing them will give us performance problems (missed optimization opportunities) at other locations.
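
For comparison, here is a minimal sketch of two ways to get at the pieces of an `int32_t` that stay within the rule quoted above. The helper names `half_of` and `byte_of` are made up for this illustration. The first copies the bytes out with `memcpy`; the second reads through `unsigned char`, which works because a character type is one of the types that the same paragraph of the standard lists as permitted.

```c
#include <stdint.h>
#include <string.h>

/* Returns the idx-th 16-bit half (0 or 1, in memory order) of a 32-bit
   value. The bytes are copied into a separate int16_t object, so the
   int32_t object is never read through an int16_t lvalue. */
static int16_t half_of(int32_t value, size_t idx) {
    int16_t half;
    memcpy(&half, (const unsigned char *)&value + idx * sizeof half,
           sizeof half);
    return half;
}

/* Returns byte idx (0..3, in memory order) of *p. Accessing any object
   through a character type is allowed. */
static unsigned char byte_of(const int32_t *p, size_t idx) {
    return ((const unsigned char *)p)[idx];
}
```

Which half or byte comes first depends on the byte order of the target machine, so code built on these helpers is only portable if it does not care about that order.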

@KareemErgawy: I will check out your example at the weekend.

slusallek commented 8 years ago

OK, I see. This is certainly not how C has been used for ages, but it seems to be undefined after all.

Thanks,

Philipp

In reply to Roland Leißa's comment of 15.04.2016, 18:43.



leissa commented 8 years ago

Yes, it's a big problem. Many C programmers don't understand the standard and there are many myths regarding the language. And then programmers curse the compiler engineers when the C compiler apparently emits "incorrect" code. But actually the input program was broken to begin with... That's why icc, for instance, has very conservative optimization flags regarding aliasing rules - even with -O3.

But anyway, we're getting off-topic :)

leissa commented 8 years ago

@KareemErgawy: I think the easiest solution would be to simply use Rust's idea with this transmute_copy. Would this be fine for you?

ergawy commented 8 years ago

Yes, that would be totally fine. For now, I ended up using bitwise operators as suggested by Arsene, which do the same trick in a less elegant way.

The downside is that this only lets us read the words of a large variable, not write them (because it is just a copy, not an actual alias). But I understand that an alias would go against the design choice for more type safety.
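
A rough sketch of the bitwise approach, written in C rather than Impala and with made-up helper names (`get_byte`, `set_byte`): reading a word is a shift, and "writing" a word means building a new value that the caller assigns back, since there is no alias into the original.

```c
#include <stdint.h>

/* Extracts byte i of x, where i = 0 is the least significant byte. */
static uint8_t get_byte(uint32_t x, unsigned i) {
    return (uint8_t)(x >> (8u * i));
}

/* Returns a copy of x with byte i replaced by b. The caller has to store
   the result back, because nothing here aliases the original variable. */
static uint32_t set_byte(uint32_t x, unsigned i, uint8_t b) {
    uint32_t shift = 8u * i;
    return (x & ~((uint32_t)0xFFu << shift)) | ((uint32_t)b << shift);
}
```

Unlike a byte-wise copy of the value's memory, the shift-based numbering does not depend on the byte order of the machine: byte 0 is always the least significant one.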

But again, this is totally fine.

Thanks.