AdaCore / ada-spark-rfcs

Platform to submit RFCs for the Ada & SPARK languages

[RFC] Storage model #67

Closed QuentinOchem closed 3 years ago

QuentinOchem commented 3 years ago

Link to full text: https://github.com/QuentinOchem/ada-spark-rfcs/blob/storage_model/considered/rfc-storage_model.rst

briot commented 3 years ago

I am a bit confused by the introduction here:

The current Storage Pools [...], rely heavily on the usage of object orientation and controlled objects. [...] (e.g. dispatching calls involve an indirection that is slower and can't be inlined) and run-time footprint (controlled types and finalization relies on full-runtime capabilities on e.g. GNAT).

A storage pool is indeed extending a Limited_Controlled type, and there is no real reason for that. In fact, I believe we could simply remove that "is new Limited_Controlled" from the type definition altogether to remove the need for runtime support. This would be backward incompatible, but the compiler will help diagnose the issue here. Users who want it can use a field that would be a controlled type itself. There is no performance penalty to speak of here, because that's only two subprogram calls for the whole duration of the pool, which is negligible.
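To sketch what I have in mind (everything below is illustrative: Pool_Sketch, My_Pool and Cleanup_Handle are made-up names, and only the Allocate/Deallocate profiles follow the standard ones from System.Storage_Pools):

```ada
--  Illustrative only: a pool type with no Limited_Controlled parent.
--  The Allocate/Deallocate profiles mirror System.Storage_Pools;
--  My_Pool and Cleanup_Handle are made-up names.

with System;
with System.Storage_Elements; use System.Storage_Elements;
with Ada.Finalization;

package Pool_Sketch is

   type My_Pool is limited private;
   --  no longer "is new Limited_Controlled with ..."

   procedure Allocate
     (Pool                     : in out My_Pool;
      Storage_Address          : out System.Address;
      Size_In_Storage_Elements : Storage_Count;
      Alignment                : Storage_Count);

   procedure Deallocate
     (Pool                     : in out My_Pool;
      Storage_Address          : System.Address;
      Size_In_Storage_Elements : Storage_Count;
      Alignment                : Storage_Count);

private

   --  A user who does want finalization adds a controlled component:
   type Cleanup_Handle is
     new Ada.Finalization.Limited_Controlled with null record;
   overriding procedure Finalize (Object : in out Cleanup_Handle);

   type My_Pool is limited record
      Cleanup : Cleanup_Handle;  --  Finalize runs when the pool goes away
   end record;

end Pool_Sketch;
```

This keeps the runtime-free default while letting finalization be opted into through a component.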

Is it a fact that calls to Allocate and Deallocate are dispatching calls? Given a type, we know statically its storage pool object, so it seems the compiler should know statically which actual Allocate and Deallocate are called, and therefore there is no performance penalty. I did not measure. My worry here is that some of the restrictions you mention for formal generic parameters are basically emulating the rules we already have for tagged types anyway, so it would be better if the use of aspects is not simply a way to work around compiler limitations.

Perhaps this is related to the discussion you have later in the document related to generics? In this case, if the pool is passed as a formal parameter, the compiler might not always know statically the actual pool to use (though again, it seems that in most actual use cases the compiler could know it)?

One limitation of the pools as we currently have them is the inability to control what "dereference" means (so it makes it harder to use a different Address_Type, to reuse the name from your proposal). Is it the intent that Copy_In could be used for that purpose? Shouldn't it receive a Stream_Element_Array rather than a System.Address, since the latter is a concept that might not always apply?

I think it would be nice to have a list of examples of possible usages for storage pools. When we were thinking of some to add to GNATCOLL, only a limited number of use cases came to mind, but among all the users out there we are bound to find neat ideas, and it would be nice if the new proposal made some of these use cases easier to implement.

The document is dense, so it is likely that some of these concerns are addressed there and I simply missed them. Sorry if that is the case.

QuentinOchem commented 3 years ago

@briot

Is it a fact that calls to Allocate and Deallocate are dispatching calls? Given a type, we know statically its storage pool object, so it seems the compiler should know statically which actual Allocate and Deallocate are called, and therefore there is no performance penalty.

That's an interesting comment. Here is what the RM actually says in 13.11(13):

S'Storage_Pool

Denotes the storage pool of the type of S. The type of this attribute is Root_Storage_Pool'Class.

Note that consequently Ada allows you to write something like that:

   A_Pool : System.Storage_Pools.Root_Storage_Pool'Class := <something>;

   type X is access all Integer
     with Storage_Pool => A_Pool;

This is arguably convoluted but perfectly legal. Granted, however, there could be compiler-specific optimizations when the actual value provided for the Storage_Pool aspect is of a definite type - I do not know for a fact whether that's the case or not.

I would however agree that - if that were the only problem addressed here - the proposal would be overkill. There are other elements of the proposal that are more consistent with the removal of the tagged type altogether - the fact that we want to allow a custom Address_Type, and the fact that we want to generate sensible defaults when this Address_Type ends up being System.Address, for example.

A storage pool is indeed extending a Limited_Controlled type, and there is no real reason for that. In fact, I believe we could simply remove that "is new Limited_Controlled" from the type definition altogether to remove the need for runtime support.

The issue is that if you do that, it becomes convoluted to add finalization after the fact. In other words, we probably want to allow people to use finalization should they want to, in a relatively intuitive way - without forcing it on them otherwise.

Perhaps this is related to the discussion you have later in the document related to generics? In this case, if the pool is passed as a formal parameter, the compiler might not always know statically the actual pool to use (though again, it seems that in most actual use cases the compiler could know it)?

Not really - the generic section is really describing consequences to the model as opposed to the original rationale.

One limitation of the pools as we currently have them is the inability to control what "dereference" means (so it makes it harder to use a different Address_Type, to reuse the name from your proposal). Is it the intent that Copy_In could be used for that purpose? Shouldn't it receive a Stream_Element_Array rather than a System.Address, since the latter is a concept that might not always apply?

In this proposal, you always either copy from an Address_Type to a System.Address, or vice-versa. There's no provision for directly copying between two non-native memory models - so System.Address should always apply.

We could however consider an (aliased) Stream_Element_Array.

In any case, I'm not entirely sure that this addresses your problem of dereference. What Copy_In and Copy_Out do is describe how to copy into and out of the memory model, so that if you have something like:

   Y := X.all;

and X and Y are of two different models, then Copy_Out will be called when accessing memory to copy to Y.
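To illustrate, the primitive could look roughly like this - note that these profiles are only an approximation of what the RFC describes, and Some_Storage_Model / Foreign_Address are placeholder names:

```ada
--  Approximate shape, for illustration only; Foreign_Address stands
--  in for the model's custom Address_Type.

procedure Copy_Out
  (Model  : in out Some_Storage_Model;
   Source : Foreign_Address;   --  location in the foreign model
   Target : System.Address;    --  native destination
   Size   : Storage_Count);

--  Conceptually, for  Y := X.all;  where X designates foreign memory,
--  the compiler would emit something like:
--
--     Copy_Out (The_Model, Address_Of (X), Y'Address, Size_Of_X);
```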

I think it would be nice to have a list of examples of possible usages for storage pools. When we were thinking of some to add to GNATCOLL, only a limited number of use cases came to mind, but among all the users out there we are bound to find neat ideas, and it would be nice if the new proposal made some of these use cases easier to implement.

I'm not familiar with these examples, but I would be interested to see what you had in mind.

The document is dense, so it is likely that some of these concerns are addressed there and I simply missed them. Sorry if that is the case.

That doesn't sound like it, but even if that were the case, no worries :-)

sttaft commented 3 years ago

The Storage_Model aspect seems to be associated with the data type rather than the pointer type. The Storage_Pool aspect was associated with the pointer type. It seems less flexible to associate the Storage_Model with the data type. Can you explain why you chose that approach? You also seem to be implying that 'Address would generate something different. I would presume instead we want 'Access to generate a different representation when creating an access value for something that might reside in device memory. Kind of a "super-fat" pointer. I guess what I am missing is the overall model for this aspect, and how it relates to the use of an allocator or a call on an instance of Unchecked_Deallocation.

QuentinOchem commented 3 years ago

The Storage_Model aspect seems to be associated with the data type rather than the pointer type. The Storage_Pool aspect was associated with the pointer type. It seems less flexible to associate the Storage_Model with the data type. Can you explain why you chose that approach?

I don't think there's a flexibility issue here - or at least I haven't yet identified a scenario where you could do something with Storage_Pools and not with Storage_Model.

Note that the aspect is on a subtype - not a type. The main advantage of this is to be able to avoid using allocation / deallocation when what you really want is stack-like behavior. E.g.:

   declare
      type Native_Side_Array is array (Integer range <>) of Integer;
      subtype CUDA_Side_Array is Native_Side_Array with Storage_Model => CUDA_Storage_Model;

      N : Native_Side_Array (1 .. 10_000);
      C : CUDA_Side_Array (1 .. 10_000); -- allocates C
   begin
      C := N; -- instrumented copy;
   end; -- C will be deallocated

Another aspect is defining where copies occur between the two locations. If we were to go with an access-based semantic, the previous example would read:

   declare
      type Native_Side_Array is array (Integer range <>) of Integer;
      type CUDA_Side_Array_Acc is access Native_Side_Array with Storage_Model => CUDA_Storage_Model;
      procedure Deallocate is new Ada.Unchecked_Deallocation (Native_Side_Array, CUDA_Side_Array_Acc);

      N : Native_Side_Array (1 .. 10_000);
      C : CUDA_Side_Array_Acc := new Native_Side_Array (1 .. 10_000);
   begin
      C.all := N; -- instrumented copy;
      Deallocate (C);
   end;

I'm not entirely clear on how easy it is to identify that "C.all" really points to memory that is allocated on a foreign memory model - C.all is of type Native_Side_Array, so here that's an assignment between two objects of the same type. Using subtypes instead, it is statically known that the two subtypes are different, and we can instrument the copy (just as we instrument dynamic checks when these subtypes have different ranges).

You also seem to be implying that 'Address would generate something different. I would presume instead we want 'Access to generate a different representation when creating an access value for something that might reside in device memory. Kind of a "super-fat" pointer. I guess what I am missing is the overall model for this aspect, and how it relates to the use of an allocator or a call on an instance of Unchecked_Deallocation.

If you look at the model definition, it relies on an address type that is potentially custom. So objects of a specific Storage_Model are associated with an instance of this specific address type, which is indeed what 'Address returns.

The model provides an allocation function returning such a custom address type, as well as a deallocation procedure that takes this type as input. These are the ones to be used by "new" and "Unchecked_Deallocation".
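In Ada terms, the shape is roughly the following - again, Device_Model and Device_Address are illustrative names rather than the RFC's normative spelling:

```ada
--  Illustrative profiles; Device_Model and Device_Address are made up.

type Device_Address is private;  --  the model's custom Address_Type

function Allocate
  (Model     : in out Device_Model;
   Size      : Storage_Count;
   Alignment : Storage_Count) return Device_Address;
--  invoked by an allocator:  P := new T;

procedure Deallocate
  (Model   : in out Device_Model;
   Address : in out Device_Address);
--  invoked by an instantiation of Unchecked_Deallocation
```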

sttaft commented 3 years ago

You talk about subtypes being statically distinct, but in general two subtypes are distinguished from one another only by their having different constraints.

QuentinOchem commented 3 years ago

You talk about subtypes being statically distinct, but in general two subtypes are distinguished from one another only by their having different constraints.

Indeed - and this proposal is about allowing this distinction to include both constraints and storage model.

sttaft commented 3 years ago

If you want this aspect to be subtype-specific, and to represent an aspect that allows subtypes to be statically distinguished, you should emphasize that in your definition. You also need to decide what the rules for subtype conversion are, such as at the point of parameter passing. Presumably these types are normally passed by reference, but that could be a problem if you allow subtype conversion implicitly on parameter passing. Using distinct types for such things is generally safer and simpler, but that would make it harder to map the existing Storage_Pool mechanism to this new approach.

QuentinOchem commented 3 years ago

@sttaft thanks for the feedback. Making the subtype-specific aspect of the proposal clearer will definitely be an improvement - I agree that this is a key aspect whose description is a bit diffuse. With regards to the parameter-passing issue, we indeed identified this problem and are suggesting in this proposal not to allow parameter passing when the subtypes have different storage models, precisely to cater for the by-reference cases you also identified.

QuentinOchem commented 3 years ago

(deciding the rules for subtype conversion is an important point to be addressed, though, and indeed isn't clearly addressed in the current proposal).

sttaft commented 3 years ago


@sttaft https://github.com/sttaft thanks for the feedback. Making the subtype-specific aspect of the proposal clearer will definitely be an improvement - I agree that this is a key aspect whose description is a bit diffuse. With regards to the parameter-passing issue, we indeed identified this problem and are suggesting in this proposal not to allow parameter passing when the subtypes have different storage models, precisely to cater for the by-reference cases you also identified.

As you already point out, this can create generic contract model challenges as well, since you will have to know things about the actual subtype when you start using a formal type. What are the implications if you don't specify the Storage_Model of a formal type? You will need to define matching rules, and define the best case relevant in the spec of a generic instance, and worst case in the body of an instance. In general, putting too much compile-time "baggage" on subtypes is a real challenge in Ada. They are generally distinguished by things that can be determined at run-time, with relatively few compile-time rules tied to a particular subtype, and any compile-time rules need to interact with static matching of subtypes, which adds complexity.

Associating special properties with an access type avoids several of these problems. You mentioned the issue of the compiler losing track of an access type property upon an "X.all" dereference, but this is in some ways simpler than having to track subtype-specific properties through expressions. Dereference is a single construct, where a number of special things already happen both at compile-time (e.g. accessibility level tracking) and at run-time (e.g. null pointer checks). Subtypes occur everywhere, and are usually of relatively little consequence as far as compile-time rules.

An alternative direction is to look at the rules associated with "shared passive" partitions in annex E. Objects in shared passive partitions essentially live in a special address space, which outlives most other scopes, and might be a useful model for the GPU address space.

Yet another alternative direction is to consider a GPU like a special kind of task that makes no up-level references outside its task body (or at least any up-level references are recognized as being very slow), and generally uses rendezvous for communicating with the CPU. A model like this might clarify both data and control flow across the CPU/GPU boundary.

In any case, I think your proposal would be enhanced by including some longer worked-out examples of use illustrating the interesting use cases involving GPUs, and an indication of the benefits the Storage_Model approach provides. At the moment the examples show various ways memory is moved back and forth, but a larger framework has not been provided to understand when and where such memory movement would be useful. Having such longer, worked-out examples would also make it easier to compare the Storage_Model approach with other approaches. It also seems important to understand the viewpoint from the GPU as well as from the CPU, and what the logic would typically be on each side.

-Tuck


QuentinOchem commented 3 years ago

As you already point out, this can create generic contract model challenges as well, since you will have to know things about the actual subtype when you start using a formal type. What are the implications if you don't specify the Storage_Model of a formal type? You will need to define matching rules, and define the best case relevant in the spec of a generic instance, and worst case in the body of an instance.

Agreed.

In general, putting too much compile-time "baggage" on subtypes is a real challenge in Ada. They are generally distinguished by things that can be determined at run-time, with relatively few compile-time rules tied to a particular subtype, and any compile-time rules need to interact with static matching of subtypes, which adds complexity.

Clearly, the proposal is to extend the current concept of subtypes. They seem, at least at the current stage of the proposal, like a promising alternative to forcing the use of access types.

Associating special properties with an access type avoids several of these problems. You mentioned the issue of the compiler losing track of an access type property upon an "X.all" dereference, but this is in some ways simpler than having to track subtype-specific properties through expressions. Dereference is a single construct, where a number of special things already happen both at compile-time (e.g. accessibility level tracking) and at run-time (e.g. null pointer checks).

That's good to hear. This however doesn't address another potential benefit of the proposal, which is to get rid of these access types altogether in most cases.

I would however agree that there are two workable possibilities at this stage of the investigation - either using a subtype or using an access type to carry a model.

Subtypes occur everywhere, and are usually of relatively little consequence as far as compile-time rules. An alternative direction is to look at the rules associated with "shared passive" partitions in annex E. Objects in shared passive partitions essentially live in a special address space, which outlives most other scopes, and might be a useful model for the GPU address space.

They might - but they only cover some of the needs. Many GPU applications allocate data locally. Given that they don't cover the full spectrum of needs, I would leave that aside for now and only come back to it if there's strong feedback on this feature.

Yet another alternative direction is to consider a GPU like a special kind of task that makes no up-level references outside its task body (or at least any up-level references are recognized as being very slow), and generally uses rendezvous for communicating with the CPU. A model like this might clarify both data and control flow across the CPU/GPU boundary. In any case, I think your proposal would be enhanced by including some longer worked-out examples of use illustrating the interesting use cases involving GPUs, and an indication of the benefits the Storage_Model approach provides. At the moment the examples show various ways memory is moved back and forth, but a larger framework has not been provided to understand when and where such memory movement would be useful. Having such longer, worked-out examples would also make it easier to compare the Storage_Model approach with other approaches. It also seems important to understand the viewpoint from the GPU as well as from the CPU, and what the logic would typically be on each side.

You can essentially take any pre-existing CUDA application to get an idea of how that might work. To be clear, as far as this pertains to GPU-related needs, this proposal is not about proposing a completely new flow for developing GPU applications, but rather about interacting with the flows developers are already accustomed to (this applies to CUDA, and would apply equally to something like OpenCL) and offering some specific Ada-awareness. Proposing a whole new model, e.g. using tasks and rendezvous, is outside the scope and would require much further analysis, akin to adding general-purpose parallel constructs.

This is also meant to be more generic than the specific GPU issues (which was the main problem with #51), which is why it is specifically built to work with storage pools and to solve issues related to subpools.

sttaft commented 3 years ago


...In any case, I think your proposal would be enhanced by including some longer worked-out examples of use illustrating the interesting use cases involving GPUs, and an indication of the benefits the Storage_Model approach provides. At the moment the examples show various ways memory is moved back and forth, but a larger framework has not been provided to understand when and where such memory movement would be useful. Having such longer, worked-out examples would also make it easier to compare the Storage_Model approach with other approaches. It also seems important to understand the viewpoint from the GPU as well as from the CPU, and what the logic would typically be on each side.

You can essentially take any pre-existing CUDA application to get an idea of how that might work. To be clear, as far as this pertains to GPU-related needs, this proposal is not about proposing a completely new flow for developing GPU applications, but rather about interacting with the flows developers are already accustomed to (this applies to CUDA, and would apply equally to something like OpenCL) and offering some specific Ada-awareness. Proposing a whole new model, e.g. using tasks and rendezvous, is outside the scope and would require much further analysis, akin to adding general-purpose parallel constructs.

If you would like programmers who are not familiar with CUDA to evaluate this proposal, then I think it would help to provide some more complete examples, at least at a pseudo-code level. And I would guess that even folks familiar with CUDA would benefit from seeing how this proposal fits into an example they might already understand.

-Tuck