Ada-Rapporteur-Group / User-Community-Input

Ada User Community Input Working Group - Github Mirror Prototype
26 stars 1 forks source link

Add slicing feature for Ada.Containers.Vectors. #75

Open Blady-Com opened 6 months ago

Blady-Com commented 6 months ago

Consider a Unix path of a file /users/me/doc/prog/my_file.ada stored in a vector of strings Full_Path : String_Vector := ["users", "me", "doc", "prog", "my_file.ada"]. Slicing feature could get a selected slice of the vector to get for instance the folder path of my file: Folder_Path : String_Vector := Full_Path.Slice (1, Full_Path.Length - 1);

Ada.Containers.Vectors might define with obvious meaning:

   function Slice (Container : Vector; Low, High : Extended_Index) return Vector;
   procedure Slice (Container : in out Vector; Low, High : Extended_Index);
   function Replace_Slice (Container : Vector; Low, High : Extended_Index; By : Vector) return Vector;
   procedure Replace_Slice (Container : in out Vector; Low, High : Extended_Index; By : Vector);
ARG-Editor commented 2 months ago

AI22-104-1 addresses this feature request.

jere-software commented 2 weeks ago

@ARG-Editor @Blady-Com

I know this already has an AI. Is there any consideration to having options to return an array of elements (same index type, element type) for the Definite version of Vectors?

Most other languages have means to return an array or a reference to an array of elements contained in the vector. It wouldn't apply to indefinite vectors obviously. Users can add them to the existing class, but that requires a lot of separate element copies that could be better optimized by the package implementation itself (using underlying bulk copying).

sttaft commented 1 week ago

This seems like a reasonable suggestion. On the other hand, it is already pretty easy to create an array from a vector using an array aggregate:

[for E of Vec => E]

So the question is whether this is an important enough operation to justify having its own function, which presumably could be implemented a bit more efficiently.

ARG-Editor commented 1 week ago

Tucker wrote:

... which presumably could be implemented a bit more efficiently.

But not necessarily, depending upon the underlying implementation of containers. For Janus/Ada, I'd guess it would be less efficient, because an array of a generic formal private type is more complex than a directly implemented array (a consequence of generic sharing), while the locally declared array and aggregate would know all of the types involved and thus directly generate code.

I also note that providing such an operation (or any operations not available in the indefinite versions) would provide an impediment to moving to the indefinite version when that becomes necessary.

The containers were never intended to provide every possible useful operation; such a library would have so many operations that it would be hard to find the ones needed.

Finally, this is essentially Github Issue #77, which was discussed by the ARG and rejected during our February 2024 meeting. (See that issue and the associated minutes for more details.) Our procedures do not allow reconsideration of settled topics in the absence of new information (otherwise, someone could essentially force additions to the Standard by simply reintroducing it repeatedly until people just vote it in to get rid of the annoyance). I don't think that there is any new information in this request. (The request would have been better posted somewhere else rather than in this unrelated topic.)

jere-software commented 1 week ago

@ARG-Editor I don't really feel like asking for the slice operations to include versions with array results to be unrelated to the issue of adding slice operations, so I think that is a bit unfair. My question was asked in the context of this issue. I was only focused on the slice operations under discussion, not the general scope of vectors.

I didn't realize y'all had discussions on array results in the general case though so I apologize that I missed that. I wasn't intending to bring up old business already hashed out.

I do disagree with your take on efficiency though. It is more often the case than not that bulk copying data is faster than copying element by element via procedure call. Most compilers heavily optimize their bulk copy operations much more efficiently. With the current implementation of Vectors to roll your own slice operation with array outputs you have to go element by element with function calls. That will very often be less efficient.

I understand that you don't want to add that functionality now though, so I'll leave it be. My apologies for asking.

This seems like a reasonable suggestion. On the other hand, it is already pretty easy to create an array from a vector using an array aggregate:

[for E of Vec => E]

So the question is whether this is an important enough operation to justify having its own function, which presumably could be implemented a bit more efficiently.

@sttaft My apologies, I was more focused on getting data out of the vector as arrays using slicing. I see that the ARG already decided against it based on the comments above, so my apologies for bringing up the topic.

I do appreciate your thoughts provided though, so thank you for that!

ARG-Editor commented 1 week ago

Jere writes:

@ARG-Editor https://github.com/ARG-Editor I don't really feel like asking for the slice operations to include versions with array results to be unrelated to the issue of adding slices, so I think that is a bit unfair. My question was asked in the context of this issue. I was only focused on the slice operations under discussion, not the general case of vectors.

I thought it would be bizarre to have slice operations returning arrays without having operations to do that for the entire object. And if you have operations for returning the entire vector object as an array, you don't need slice versions (since we now have slice operations returning vectors). But I admit I didn't read closely enough (I assumed you were asking for the entire vector operations again).

I didn't realize y'all had discussions on array results in the general case though so I apologize that I missed that. I wasn't intending to bring up old business already hashed out.

No problem. There is a lot of stuff, and unless you follow the meeting minutes closely (and even if you do :-), it's early to forget stuff.

I do disagree with your take on efficiency though. It is more often the case than not that bulk copying data is faster than copying element by element via procedure call. Most compilers heavily optimize their bulk copy operations (like memcpy for example) so that data is put into cache more efficiently.

All true, but you missed my point (which was my objection to Tucker's statement): in this context (a generic body with a generic private type element), it's pretty unlikely that any bulk copy operation could be used.

With the current implementation of Vectors to roll your own slice operation with array outputs you have to go element by element with function calls. That will very often be less efficient.

For Janus/Ada's implementation of generics, that's inevitable. Components of generic private types are allocated indirectly, and copying is done by a thunk call (that is, a subprogram passed into the generic). There's no possibility of that being done in any bulk manner. Indeed, code outside of the generic has a far better chance of being optimized into some form of bulk copy (because neither of those things is necessary). The aggregate could end up being many times faster, even if it is only retrieving one element at a time, just because of all of the overhead that is avoided.

Other implementations of course could vary. But I think it would be pretty hard to use a block copy in such an operation, since the array types and bounds are usually different. It would be easiest to just write a loop copying element-by-element, and whether that is optimizable depends on many things. And given that one could inline the element retrieval operation, it's quite likely that the code wouldn't vary much from that of the aggregate.

In general, I'm always suspicious of efficiency arguments (especially for operations in generic packages), because implementations vary so much. It's possible in theory to reduce almost everything to the optimal code regardless of what the programmer actually wrote. So what really matters is how much of those possible optimizations a compiler does, and that probably matters far more than how the operations in the programming language are defined. (Efficiency arguments that matter are mostly those about things that potentially prevent optimizations.)

I understand that you don't want to add that functionality now though, so I'll leave it be. My apologies for asking.

I'm not the only one that gets to decide, although this particular topic was already hashed over once. I think it would need some rather compelling arguments to come up again.

Personally, I think of Vectors as a replacement for arrays (which is why we have been trying to ensure that all of the sane capabilities of arrays are present). If you are willing to use the containers in the first place, you should stay in them as much as possible (why bring back all of the memory management problems that the containers handle for you? And conversions between types always cost something, do as few as possible). So outside of interfacing, I don't see much reason to use an array with a Vector. And for interfacing, you don't want some random type declared in the Vectors package, you need a type tailored to your use (with specified Component_Size and/or Convention so it matches the definition of whatever you are interfacing with). You'd end up not using the vector <-> array operations anyway, or using an expensive type conversion that might have to adjust the element sizes. Not saving anything that way.

I'd have more sympathy for having a generic array type parameter for such operations, but of course you don't want that on the usual Vectors container package (you would not want to make users define a type that they rarely care about, and it would be wildly incompatible now). I suppose you could define a child generic to do Vector <-> Array operations, but those are weird -- whether people would use those (or get them to work when they tried) is questionable.

                   Randy.