QMCPACK / miniqmc

QMCPACK miniapp: a simplified real space QMC code for algorithm development, performance portability testing, and computer science experiments

Develop should represent consensus and not a particular developer's branch. #228 should not have been merged. #231

Closed. PDoakORNL closed this issue 5 years ago

PDoakORNL commented 5 years ago

These are divergent and I think this PR should be backed out.

ye-luo commented 5 years ago

Could you explain a bit more? To be specific: divergent from what, and why is it an issue?

ye-luo commented 5 years ago

There is no change to the code using these containers. I need these containers for plugging in CUDA allocators, because addresses allocated only on the device cannot be accessed on the host.

PDoakORNL commented 5 years ago

A non-specialized container should contain no branching code based on memory location.

lshulen commented 5 years ago

Your suggestion would be to have a specialization to add the memory type rather than put it in the base class?

PDoakORNL commented 5 years ago

I believe a properly written container would only need the allocator parameter changed; a memory space parameter doesn't belong.

lshulen commented 5 years ago

So if you wanted to do something conditional on the memory space, you would have to query through to the allocator? I'm not sure I agree, if that's the case.

PDoakORNL commented 5 years ago

Template specialization based on allocator type.

lshulen commented 5 years ago

Why is that superior to knowing the memory space directly?

PDoakORNL commented 5 years ago

Basic Separation of Concerns

lshulen commented 5 years ago

Can you explain more? I don't understand.

ye-luo commented 5 years ago

I did have a dilemma when I wrote the code this way. I need a way to tell the container not to initialize values if the memory is not host accessible.

If I put state in the allocator, I cannot use std::allocator any more. The other way is through template specialization. However, for the CUDA device allocator I would basically need to specialize every function that writes data to the container. I would almost need to maintain a separate set of containers for this allocator. I was hesitant to go down this route because we may also get a HIP device allocator, and so on.

So eventually I added a template argument to the container.
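A rough sketch of that extra-template-argument idea (hypothetical class and enum names, not the actual miniqmc code) could look like this:

#include <cstddef>
#include <memory>

enum class MemorySpace { HOST, CUDA };

template<typename T, class Alloc = std::allocator<T>, MemorySpace MemType = MemorySpace::HOST>
class Vector
{
public:
  explicit Vector(std::size_t n = 0, T val = T()) : nAllocated(n)
  {
    X = alloc.allocate(n);
    if (MemType == MemorySpace::HOST) // compile-time constant per instantiation
      std::uninitialized_fill_n(X, n, val);
  }
  ~Vector() { alloc.deallocate(X, nAllocated); }
  Vector(const Vector&) = delete;
  Vector& operator=(const Vector&) = delete;

private:
  T* X = nullptr;
  std::size_t nAllocated;
  Alloc alloc;
};

With a host-unsafe allocator one would instantiate something like Vector<double, CUDAAllocator<double>, MemorySpace::CUDA>, and the host-side fill is never executed.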

lshulen commented 5 years ago

I'm generally interested in being able to act differently on an object depending on where its memory lives. The problem I'm having with this idea is that I can imagine a scenario where I might have several allocators that work on a given memory space. Maybe one requires things to be aligned, one allocates from a pool of memory, and one doesn't care. In that case it seems like I would care more about where the memory is than about what algorithm was used to allocate it.

ye-luo commented 5 years ago

I believe the C++ standard committee must have encountered this issue, and we can probably learn from them. The root cause, I believe, is that the original C++ abstract machine has only one memory pool. But they may have a new design to accommodate multiple memory domains.

PDoakORNL commented 5 years ago

Your allocator should handle allocation and deallocation for the container in some way. If the memory is not on the host, obviously it cannot be std::allocator. CUDA doesn't do well with this kind of container class, so I don't imagine you are going to instantiate it inside a kernel; this isn't really the place to know about memory space at all.

PDoakORNL commented 5 years ago

Luke, that is largely what one would use concepts for. You would check for a concept, modeled by the allocator, that indicates the "device".
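One pre-C++20 way to express such an allocator-modeled "concept" is an ordinary type trait; a minimal sketch with hypothetical names:

#include <memory>
#include <type_traits>

// By default an allocator is assumed to hand out host-accessible memory.
template<class Alloc>
struct is_host_accessible : std::true_type {};

// A device-only allocator would opt out, e.g.:
// template<class T>
// struct is_host_accessible<CUDAAllocator<T>> : std::false_type {};

template<class Alloc>
constexpr bool is_host_accessible_v = is_host_accessible<Alloc>::value;

static_assert(is_host_accessible_v<std::allocator<double>>, "std::allocator is host accessible");

Container code can then select behavior from the allocator alone, without a separate memory-space tag.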

lshulen commented 5 years ago

This is an idiom that I am not completely familiar with. I clearly have homework to do before I can respond intelligently. Can you point me to a reference that would introduce me to things like concepts?

ye-luo commented 5 years ago

These containers should not be used in CUDA kernels. I wrap the CUDA memory allocation and deallocation runtime calls in CUDAallocator and then pass it to the container class. I need to prevent the initialization of values on the host. When I use the data objects, I extract pointers and call cuBLAS.

lshulen commented 5 years ago

@ye-luo why would I prefer to have separate containers depending on the device? I find myself appreciating the Kokkos View sort of model, where you have only one kind of container, but it can be specialized depending on what you want to do with it.

ye-luo commented 5 years ago

@lshulen I believe it comes down to how much safety you want and how much control you need by hand as a developer. My Kokkos knowledge is limited. I think by default it assumes UVM or host. If you specialize the memory traits as a template argument, it becomes more specific. I think that is a bit more advanced than what I did (use the default memory space on the host, and do nothing if the memory is not host accessible).
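For reference, a minimal sketch of the Kokkos View model being discussed (assuming a Kokkos build; the CUDA part only compiles when CUDA is enabled):

#include <Kokkos_Core.hpp>

int main(int argc, char* argv[])
{
  Kokkos::initialize(argc, argv);
  {
    // One container type; where the memory lives is a template argument.
    Kokkos::View<double*, Kokkos::HostSpace> h("h", 100); // host memory, h(i) usable from host code
    h(0) = 1.0;
#ifdef KOKKOS_ENABLE_CUDA
    Kokkos::View<double*, Kokkos::CudaSpace> d("d", 100); // device memory, d(i) only in device code
#endif
  }
  Kokkos::finalize();
  return 0;
}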

PDoakORNL commented 5 years ago
template<class T, typename Alloc = std::allocator<T>>
class Vector
{
...
  Vector(size_t n = 0, T = T());
};

template<class T, typename Alloc>
Vector<T, Alloc>::Vector(size_t n, T val) : nLocal(n), nAllocated(0), X(nullptr)
{
  resize_impl(n);         // this should be fine with 0 as an input
  std::fill_n(X, n, val); // as should this
}

// constructor of the matching partial specialization for the CUDA allocator
template<class T>
Vector<T, CudaAllocator<T>>::Vector(size_t n, T val) : nLocal(n), nAllocated(0), X(nullptr)
{
  resize_impl(n);         // no host-side fill of device memory
}

more or less

ye-luo commented 5 years ago

@PDoakORNL I understand this is the general way to make the container classes work. The downside is that for an allocator which provides memory that is not host accessible, you need to specialize every function that accesses the memory. It is more than what I need at the moment.

My design logic is not to support all cases but only two: host accessible and host non-accessible.

  inline Type_t operator()(size_type i) const
  {
    static_assert(MemType == MemorySpace::HOST, "Matrix::operator() MemType must be MemorySpace::HOST");
    return X[i];
  }

With host non-accessible memory, this stops compilation if code tries to access the memory on the host.

markdewing commented 5 years ago

Would it be better to name the parameter for host/non-host accessibility more directly? Calling it HOST and CUDA is indirect, and leads people to make more assumptions about what these parameters should mean for the class.

lshulen commented 5 years ago

I think what one is really after is the intended execution space (that is, where this memory can be accessed, rather than where it lives or what class allocated it). UVM is actually a bit of an annoying edge case if you are specializing depending on the allocator. If this is correct, then being more specific than just HOST or DEVICE (for instance host, CUDA, HIP, etc.) could be useful.

PDoakORNL commented 5 years ago

I don't get why that is a useful design feature at all. If you can't use the class on the device, and on the host it doesn't access device memory... what problem is this solving? Also, why not just do this:

template<class T, typename Alloc = std::allocator<T>>
class Vector;

template<class T>
class Vector<T, std::allocator<T>>
{
  // everything...
};

template<class T, typename Alloc>
class Vector
{
...
  // only the non-memory-access interface, I guess
  // I don't think Vector really adds much value here, but ...
};

This avoids writing the static_assert over and over and makes it clear that, unless you used std::allocator, this class is a Vector with a much more limited interface.

lshulen commented 5 years ago

Looking at it that way, I agree. If you cannot do anything with the class anywhere but on the host, then just make sure this somehow squawks when you try to do anything with it on a device. As we think more generally about how to converge the mini app, I'd like to think about a more general container that we could use generically across programming models etc. It was in this vein that my earlier comments were intended.

ye-luo commented 5 years ago

I agree that there is a need for a flexible container. I need a container for the following cases.

  1. std::allocator, from the standard library, host safe
  2. Mallocator, for aligned allocation, host safe
  3. OMPallocator, for host and device allocation with OpenMP mapping, host safe
  4. CUDAHostAllocator, for pinned host memory, host safe
  5. CUDAManagedAllocator, for CUDA managed memory, host safe
  6. CUDAAllocator, for CUDA device memory, host unsafe
  7. hipAllocator, for HIP device memory, host unsafe
  8. OpenCLAllocator, for OpenCL device memory, host unsafe

1-6 are currently in use; 7-8 may be needed in the future. I'd like 1-5 to go to implementation A, which allows accessing data on the host, and 6-7 to go to implementation B, which excludes all member functions that access data on the host. The goal is to have maximal flexibility and, hopefully, minimal code.

ye-luo commented 5 years ago

@PDoakORNL your last example needs 5 specializations for A and 1 default for B, or 1 default for A and 3 specializations for B. Is it possible to do just two specializations, A and B? On the other hand, I noticed there is an evolutionary relation: static_assert -> SFINAE -> constraints and concepts. No matter which one is in use, some sort of if statement is needed for each function removed in case B.

PDoakORNL commented 5 years ago

It's possible to do just two specializations. You need to implement a "concept/constraint" that indicates to the compiler the correct specialization; there are many ways to do that, ranging from simple to very complex, general, and extensible. In this specific case I think you might need three specializations for B, but you only have one right now, so just go that way.

But I'll admit I don't understand the issue here; it seems like you are prematurely generalizing the container classes. Since the container is not useful on the device, I see no utility in having it manage device memory. So I don't see the issue; it should only have the host methods anyway. If you have generic code that takes a container type and needs something like resize, you'll get a compiler error if your device type doesn't implement resize.
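A sketch of that two-specialization layout (hypothetical names, reusing the kind of allocator trait sketched earlier): a defaulted boolean template parameter, derived from the allocator, picks implementation A or B.

#include <memory>
#include <type_traits>

template<class Alloc>
struct is_host_accessible : std::true_type {}; // device-only allocators specialize this to false_type

template<class T, class Alloc = std::allocator<T>,
         bool HostAccessible = is_host_accessible<Alloc>::value>
class Vector;

// Implementation A: host-accessible memory, full interface including element access.
template<class T, class Alloc>
class Vector<T, Alloc, true>
{
  // allocate, resize, operator(), fill, ...
};

// Implementation B: host-inaccessible memory, allocation and resize only.
template<class T, class Alloc>
class Vector<T, Alloc, false>
{
  // allocate, resize, ...
};

Vector<double, CUDAAllocator<double>> would then land on implementation B without the caller spelling out a memory space.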

ye-luo commented 5 years ago

"concept/constraint" is in C++20 and I'd better avoid it. I may use SFINAE but I found it is not significantly better than static_assert and the compiler error message is often less clear than static_assert. Between 1+1(3 in the future) specialization and 1 with some if-statement and static_assert, I still feel the latter option is better because I have 3 kinds of containers already.

I need a container class which takes a memory allocator as a template argument. The allocator manages allocation and deallocation with RAII. The container manages the memory no matter whether it is host or device memory, but it doesn't allow value access on the host if the memory is not host accessible. "I see no utility in having them manage device memory." @PDoakORNL I don't understand this comment. All the device memory management is driven from the host, so resize is always needed, but std::fill_n is excluded on the host if the memory is not host accessible.

PDoakORNL commented 5 years ago

by "concept/constraint" I mean the CS idea not C++20 implementation of it. You can and people have used these ideas for a long time. I pretty much never want to see an if statement for something that could be handled at compile time and produce a function body with no branching. I never want to see an interface with dead functions. A container class models some set of access concepts if it does RAII it also needs an allocator if it models dynamic resizing it needs resize. You do not need a container class on the host for device memory. You need something that models RAII and resize.

ye-luo commented 5 years ago

What is that "something"? I find using a container class the most straightforward way to achieve RAII and resize. I am trying not to maintain a separate set of customized classes.

PDoakORNL commented 5 years ago

By having a specialized class template without the access functions, the template classes for 6, 7, and 8 logically don't include the dead access functions. I can read that declaration and reason about it. This is logically cleaner than having definitions with a static_assert that fires if you touch them. But it still seems needless. You just need an RAII wrapper around an allocator. This would be a type distinct from the container, and one could easily reason that accessing it is bad.
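Purely as an illustration of that "RAII wrapper around an allocator" idea (hypothetical names, assuming the allocator provides allocate/deallocate):

#include <cstddef>

// Owns device memory and can resize, but deliberately exposes no element access.
// Resizing does not preserve contents in this sketch.
template<class T, class DeviceAlloc>
class DeviceBuffer
{
public:
  explicit DeviceBuffer(std::size_t n = 0) { resize(n); }
  ~DeviceBuffer() { if (data_) alloc_.deallocate(data_, size_); }
  DeviceBuffer(const DeviceBuffer&) = delete;
  DeviceBuffer& operator=(const DeviceBuffer&) = delete;

  void resize(std::size_t n)
  {
    if (n == size_) return;
    if (data_) alloc_.deallocate(data_, size_);
    data_ = n ? alloc_.allocate(n) : nullptr;
    size_ = n;
  }

  T* device_data() { return data_; } // raw pointer handed to cuBLAS / kernels
  std::size_t size() const { return size_; }

private:
  T* data_ = nullptr;
  std::size_t size_ = 0;
  DeviceAlloc alloc_;
};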

PDoakORNL commented 5 years ago

I'd rather maintain a class that does what its interface advertises and what its name implies.

ye-luo commented 5 years ago

When C++ gets concepts/constraints, will I be able to write one version without doing any specialization?

prckent commented 5 years ago

To make this easier for others to follow, I suggest phoning each other and then posting a summary here of either the questions a wider group should address or the consensus you have reached.