chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 421 forks source link

Can c_ptr sometimes be wide? #12490

Open mppf opened 5 years ago

mppf commented 5 years ago

This is related to #8680 and #11115 but asks a particular design question.

Consider this code in String.chpl:

https://github.com/chapel-lang/chapel/blob/2831d4bb0893e1e281bec2c8972c340f77d2f341/modules/internal/String.chpl#L651-L657

This code has the unfortunate property of copying the entire string in the event that it is not local, in order to access 1 byte.

Why does it do that? The string data type uses c_ptr to implement the buffer. The c_ptr type is not currently ever wide and so the string additionally includes a field indicating the locale owning the c_ptr.

This could be resolved if the string type used _ddata which is effectively a version of c_ptr that can be wide. However there is some desire for the string data type to use user-facing features where possible and that begs the question of if we can create a user-facing form of _ddata?

But why do we have both c_ptr and _ddata? c_ptr was created for C interoperability (and extern blocks). In that setting, we needed something that could represent nested pointer structures. For example, if we pass a c_ptr(c_ptr(c_int)) to a C function, the C function is expecting an int** argument. While the compiler can remove wideness for the outer-most argument to the function, it wouldn't be able to do that for the inner argument.

Here I propose that we make c_ptr be possibly wide. The use of it in the string data type in particular would be wide. When code-generating passing a wide c_ptr to an extern function, the compiler would pass only the address part (and possibly assert that it is local).

But how to address the c_ptr(c_ptr(t)) problem? Instead of making all c_ptr narrow, only make certain c_ptr used in a pattern such as this narrow. I believe we can adjust insertWideReferences to do this.

mppf commented 5 years ago

@benharsh - not urgent but I'm curious about your reaction to this proposal

benharsh commented 5 years ago

I'm still thinking about this, but if we say "c_ptr inside of a c_ptr is narrow", what if those inner pointers are actually remote? Seems like we'd be back in the same situation.

mppf commented 5 years ago

I'm still thinking about this, but if we say "c_ptr inside of a c_ptr is narrow", what if those inner pointers are actually remote? Seems like we'd be back in the same situation.

Well, they wouldn't actually be remote, because we wouldn't allow them to be.

For example, if you had

var ptr: c_ptr(int) = ...;
var ptrPtr:c_ptr(c_ptr(int)) = ...;
ptrPtr[0] = ptr;

Then I'm expecting that in this line:

ptrPtr[0] = ptr;

the compiler would have a possibly wide RHS and a narrow LHS, and so only set the local address portion of the LHS. (It could potentially additionally assert that the RHS is local).

benharsh commented 5 years ago

What if we have an array-of-arrays situation and the user wants to access the data on another locale:

config const n = 10, m = 10;
var c_data : c_ptr(c_ptr(int));
// allocate c_data as n*m array

on Locales[numLocales-1] {
  for (i,j) in zip(1..n, 1..m) do
    c_data[i][j] = i * j;
}

Edit: or are you saying that we would just not support that, either through documentation or a compiler error/warning?

mppf commented 5 years ago

or are you saying that we would just not support that, either through documentation or a compiler error/warning?

Yes. We have Chapel arrays for such things.

I'd like to have something we could use in string that can be wide, and I want it to be user-facing (so _ddata isn't great). We could get c_ptr to do it (which is what I was advocating for here) but the obvious alternative is to add some other thing, like just ptr. (But then I think I'd be annoyed when explaining why c_ptr was always local but ptr isn't, etc). I also don't like c_wide_ptr because it implies it is always wide, but maybe there's a reasonable path there too.

benharsh commented 5 years ago

I guess I lean towards exposing _ddata under another name (not sure about 'ptr'). I feel like users will want that eventually anyways, so having 'ptr' and a possibly-wide c_ptr seems weird.

If we keep c_ptr as always-local, that might allow us to emit better warnings at compile-time.

mppf commented 5 years ago

I guess I lean towards exposing _ddata under another name

Can you suggest any names that we could use?

mppf commented 5 years ago

@benharsh - re. names for a wide c_ptr aka _ddata, we might use

Any other ideas?

I think the main questions I have about it are:

I think the answers to these are "no" and "no", which makes me prefer global_ptr. I think ptr is a little to easy for this low-level thing.

mppf commented 5 years ago

Oh yeah, and then maybe it would be globalPtr for camelCase reasons.

LouisJenkinsCS commented 5 years ago

https://github.com/chapel-lang/chapel/issues/8680

mppf commented 5 years ago

@LouisJenkinsCS - I take you to mean that you think the wide version of c_ptr should be called ptr ?

LouisJenkinsCS commented 5 years ago

I think Ptr should be a generalization of a pointer that is not specific to C pointers or wide pointers. As in, you do not need an explicit _ddata or c_ptr, but just a Ptr which internally can handle locality. It would be a way of saying "I don't care about locality, just that I have a pointer to some data. If it needs to be coerced into a c_ptr and it happens to be remote, then copy the data somewhere locally first. If I access it from another locale, handle performing a remote GET for me. If I need to store into the pointer, perform a remote PUT for me. Etc."

mppf commented 5 years ago

@LouisJenkinsCS - I think you are saying that ptr would be wide if needed and narrow if needed? Rather than always wide or narrow? Or are you just saying that those operations could work (which could be done if ptr were always wide)?

LouisJenkinsCS commented 5 years ago

Or are you just saying that those operations could work (which could be done if ptr were always wide)?

So long as Ptr can be coerced into a narrow c_ptr, then yes this is exactly what I'd want. However I'd like for it to be narrow if the compiler can determine statically that it is never used in a distributed context, or is in a local block.

mppf commented 5 years ago

If it needs to be coerced into a c_ptr and it happens to be remote, then copy the data somewhere locally first.

Did you mean the compiler would do this automatically? Typically C-ish pointers don't include a size, so how could it know how much to copy? Besides that, making copies can cause confusing behavior if the original is modified...

LouisJenkinsCS commented 5 years ago

Then how would you propose the c_ptr to be used for 'C Interoperability'? If its meant to be used from Chapel alone, I.E to directly chpl_comm_put/chpl_comm_get on, then I'm all for it but I think it should be a different kind of pointer, I.E a c_wide_ptr (as you called it), or just ptr.

mppf commented 5 years ago

C code could include header files defining a wide void* and a wide c_ptr could coerce to that type. Even though it could be used for interoperability with C code meant to work with Chapel, I wouldn't expect it to be used with existing C libraries.

I think you are agreeing with @benharsh that instead of making c_ptr sometimes wide, we should introduce a new type like c_wide_ptr or something. We just need to reach some sort of agreement on the name.

As I described in https://github.com/chapel-lang/chapel/issues/12490#issuecomment-480505878 I'm leaning towards calling it a "global pointer". I think ptr might be too short/easy (we want it to be a low-level feature not something considered essential to Chapel programming). I think using c in the name runs into problems with the issue that it doesn't necessarily make sense in C interoperability.