StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Regent: Slow "inbuilt" HDF5 performance #984

Closed LonelyCat124 closed 3 years ago

LonelyCat124 commented 3 years ago

I'm not sure if this is simply due to me creating/deleting regions and this being overly expensive - if so, apologies - but I don't believe that's the case.

TLDR: Using attach/acquire/release/detach is much slower than using HDF5 directly.

I have two implementations of HDF5 output through Regent (with some metaprogramming). The first one I created does the file creation using HDF5 calls, and then:

--The copy between the hdf5 region and the particle array needs to be done in a task to work. This is created here.
  local task copy_task( hdfreg : region(ispace(int1d), hdf5_io_space), particle_array : region(ispace(int1d), part) ) where
    reads(particle_array), writes(hdfreg) do
    --Loop over the elements of the regions
    var num_written = 0
    for i in particle_array.ispace do
      --Copy every element from the particle_array to the corresponding field in the io_region.
      var valid = true
      [neighbour_search_validity:map( function(element)
        local field_path = string_to_field_path.get_field_path(element.field)
        return rquote
          valid = valid and (particle_array[i].[field_path] == [element.result]);
        end
      end)];
      if valid then
        [mapper_fields:map(function(field)
          return rquote
            hdfreg[int1d(num_written)].[field.io_field] = particle_array[i].[field.part_field]
          end
        end)];
        num_written = num_written + 1
      end
    end
  end

  local create_code = rquote
    --CREATE HDF5 code / setup file code
    --Once the file is created, we create a region to attach the file to, and launch the copy_task to write to the file.
    var hdfreg = region(ispace(int1d, num_parts), hdf5_io_space)
    attach(hdf5, hdfreg.[io_fields], filename, regentlib.file_read_write)
    acquire(hdfreg)
    copy_task(hdfreg, particle_array)
    release(hdfreg)
    detach(hdf5, hdfreg)
    __delete(hdfreg)
  end

The hdf5_io_space field space is created in the code and only contains the required output fields.
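
For illustration, such a field space is just a plain fspace whose fields mirror the subset of part fields that get written to file - the field names below are hypothetical, the real ones are generated by the metaprogramming:

  fspace hdf5_io_space {
    --hypothetical output fields; only these get attached to the HDF5 file
    pos_x   : double,
    pos_y   : double,
    pos_z   : double,
    density : double
  }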

The second implementation creates the same files, but uses HDF5 calls directly (again with metaprogramming), creating/freeing plain arrays to handle the data copies.

This is then built into a simple test code that runs a very simple Smoothed Particle Hydrodynamics test case. Files are output approximately every 0.001 sim time (~4-5 s real time with no file I/O, every 17 timesteps).

The manually written HDF5 code matches this, with outputs taking place every 4-5s:

Time = 0.001057, runtime = 4, step_count = 17
Time = 0.002054, runtime = 8, step_count = 34
Time = 0.003052, runtime = 12, step_count = 51
Time = 0.004050, runtime = 16, step_count = 68
Time = 0.005048, runtime = 21, step_count = 85
Time = 0.006046, runtime = 26, step_count = 102
Time = 0.007044, runtime = 31, step_count = 119

The Regent library HDF5 code (with the new region creation/deletion) takes significantly longer, and notably seems to take longer and longer over time:

Time = 0.001057, runtime = 15, step_count = 17
Time = 0.002054, runtime = 46, step_count = 34
Time = 0.003052, runtime = 93, step_count = 51
Time = 0.004050, runtime = 165, step_count = 68
Time = 0.005048, runtime = 254, step_count = 85

Notably, I also tried creating this region once up front and reusing it (instead of repeatedly creating it), so I just had:

    attach(hdf5, io_array.[io_fields], filename, regentlib.file_read_write)
    acquire(io_array)
    copy_task(io_array, particle_array)
    release(io_array)
    detach(hdf5, io_array)

without the region creation/deletion and still had this timing profile:

Time = 0.001057, runtime = 13, step_count = 17
Time = 0.002054, runtime = 44, step_count = 34
Time = 0.003052, runtime = 92, step_count = 51
Time = 0.004050, runtime = 155, step_count = 68
Time = 0.005048, runtime = 247, step_count = 85

so the region creation/deletion isn't the main cause of the slowdown. Is there anything I can check to see what is causing this?

lightsighter commented 3 years ago

Can you capture Legion Prof profiles running both ways?
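
For reference, capturing a Legion Prof profile is roughly a two-step process along these lines - the binary name and processor counts here are illustrative, and the tool path will depend on your checkout:

    # run with profiling enabled, writing one log per node
    ./test_wcsph2.exe -ll:cpu 1 -ll:util 1 -lg:prof 1 -lg:prof_logfile prof_%.gz

    # post-process the logs into a viewable profile
    $LG_RT_DIR/../tools/legion_prof.py prof_*.gz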

LonelyCat124 commented 3 years ago

Profiles both ways are available here: https://drive.google.com/drive/folders/1tU7x8M21HkPngbUGiLUQ4V3OSuRMgCJL?usp=sharing (~50MB each).

inbuild_prof.gz is the one using attach etc. while alt_prof.gz uses HDF5 code directly.

lightsighter commented 3 years ago

I don't see anything unusual in these profiles from a Legion perspective. The "inbuild" profile looks completely limited by the application CPU processor. The Legion utility processor is not busy at all and neither is the Realm DMA channel which would be moving the HDF5 data. It looks to me like main is busy for very long stretches of time, often 50-100ms where it is not asking Legion to do any additional work. Whatever it's doing, it's running your application CPU code so you should be able to track down where the time is going. It's somewhere in your application-side code.

LonelyCat124 commented 3 years ago

Ok - I'll rerun a profile with a standard profiler to see if I can track down what is happening. The two codes are identical apart from the code that handles file I/O - i.e. direct HDF5 calls vs. region/attach/release/etc.

LonelyCat124 commented 3 years ago

Ok - I ran both versions through VTune's hotspots collection on an HPC machine (1 thread each for util & cpu), to ensure no other noise could be interfering.

The manual version hotspots were:

pthread_cond_timedwait                                                                   libpthread.so.0  4.569s
std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int>>::_M_erase  libregent.so  4.289s
Legion::Internal::PhysicalRegionImpl::get_instance_info                                  libregent.so     2.983s
posix_memalign                                                                           libc.so.6        2.821s
std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int>>::_M_copy<std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int>>::_Alloc_node>  libregent.so  2.590s

so no big hotspots anywhere.

With the inbuilt HDF5 implementation I get:

std::_Rb_tree_increment                                                                 libstdc++.so.6   64.973s
std::vector<long long, std::allocator<long long>>::_M_realloc_insert<long long const&>  libregent.so     58.221s
Legion::Internal::TaskContext::check_region_dependence                                  libregent.so     56.902s
_int_free                                                                               libc.so.6        48.841s
Legion::PhysicalRegion::is_mapped                                                       libregent.so     35.146s

I think the next 2 hotspots might be the creation and deletion of those regions again (Legion::PhysicalRegion::~PhysicalRegion and Legion::PhysicalRegion::PhysicalRegion). I'm going to retry modifying the code to reuse the region (instead of creating and deleting it) and see what that profile looks like.

Update: I removed the region creation/deletion in favour of a single initial "io" region which uses attach/acquire etc. to do file IO, and I get a similar hotspot profile to the version that creates the region each time:

std::_Rb_tree_increment                                                                 libstdc++.so.6   66.887s
std::vector<long long, std::allocator<long long>>::_M_realloc_insert<long long const&>  libregent.so     58.083s
Legion::Internal::TaskContext::check_region_dependence                                  libregent.so     56.868s
_int_free                                                                               libc.so.6        47.759s
Legion::PhysicalRegion::is_mapped                                                       libregent.so     36.460s

This seems at odds with what you see from legion_prof - anything else I could look at to try to see what is causing this?

lightsighter commented 3 years ago

We need to see backtraces for those hotspots. Just seeing the lowest-level function call is not helpful. VTune will know what the backtraces are, but you'll probably need to dig around for them.

LonelyCat124 commented 3 years ago

Backtrace for the top hotspot, I think, is below (no line numbers, as I'm not sure how to compile Regent with debug information while still optimised for performance - maybe LEGION_BACKTRACE=1 would work, but I'm not sure if that needs debug mode on).

std::_Rb_tree_increment                                     libstdc++.so.6   66.887s
  Legion::Internal::TaskContext::check_region_dependence    libregent.so     63.928s
  Legion::Internal::TaskContext::find_conflicting_regions   libregent.so
  Legion::Internal::InnerContext::execute_task_launch       libregent.so
  Legion::Internal::InnerContext::execute_task              libregent.so
  Legion::Internal::Runtime::execute_task                   libregent.so
  Legion::Runtime::execute_task                             libregent.so
  legion_task_launcher_execute                              libregent.so
  $<main>                                                   test_wcsph2.exe
  $__regent_task_10015_primary                              test_wcsph2.exe
  Realm::LocalTaskProcessor::execute_task                   libregent.so
  Realm::Task::execute_on_processor                         libregent.so
  Realm::UserThreadTaskScheduler::execute_task              libregent.so
  Realm::ThreadedTaskScheduler::scheduler_loop              libregent.so
  Realm::UserThread::uthread_entry                          libregent.so

This looks like it matches up with how legion prof says it's in the main task - I'm not sure what it's doing though, as the only difference is the code that attaches/acquires/launches the copy task.

Other hotspots that I found:

std::vector<long long, std::allocator<long long>>::_M_realloc_insert<long long const&>  libregent.so     58.083s
  Legion::Internal::RegionTreeForest::compute_index_path    libregent.so     57.973s
  Legion::Internal::TaskContext::check_region_dependence    libregent.so
  Legion::Internal::TaskContext::find_conflicting_regions   libregent.so
  Legion::Internal::InnerContext::execute_task_launch       libregent.so
  Legion::Internal::InnerContext::execute_task              libregent.so
  Legion::Internal::Runtime::execute_task                   libregent.so
  Legion::Runtime::execute_task                             libregent.so
  legion_task_launcher_execute                              libregent.so
  $<main>                                                   test_wcsph2.exe
  $__regent_task_10015_primary                              test_wcsph2.exe
  Realm::LocalTaskProcessor::execute_task                   libregent.so
  Realm::Task::execute_on_processor                         libregent.so
  Realm::UserThreadTaskScheduler::execute_task              libregent.so
  Realm::ThreadedTaskScheduler::scheduler_loop              libregent.so
  Realm::UserThread::uthread_entry                          libregent.so

_int_free                                                   libc.so.6        47.759s
  Legion::Internal::TaskContext::check_region_dependence    libregent.so     22.462s
  Legion::Internal::TaskContext::find_conflicting_regions   libregent.so
  Legion::Internal::InnerContext::execute_task_launch       libregent.so
  Legion::Internal::InnerContext::execute_task              libregent.so
  Legion::Internal::Runtime::execute_task                   libregent.so
  Legion::Runtime::execute_task                             libregent.so
  legion_task_launcher_execute                              libregent.so
  $<main>                                                   test_wcsph2.exe
  $__regent_task_10015_primary                              test_wcsph2.exe
  Realm::LocalTaskProcessor::execute_task                   libregent.so
  Realm::Task::execute_on_processor                         libregent.so
  Realm::UserThreadTaskScheduler::execute_task              libregent.so
  Realm::ThreadedTaskScheduler::scheduler_loop              libregent.so
  Realm::UserThread::uthread_entry                          libregent.so

Legion::PhysicalRegion::is_mapped                           libregent.so     36.460s
  Legion::Internal::TaskContext::find_conflicting_regions   libregent.so     36.200s
  Legion::Internal::InnerContext::execute_task_launch       libregent.so
  Legion::Internal::InnerContext::execute_task              libregent.so
  Legion::Internal::Runtime::execute_task                   libregent.so
  Legion::Runtime::execute_task                             libregent.so
  legion_task_launcher_execute                              libregent.so
  $<main>                                                   test_wcsph2.exe
  $__regent_task_10015_primary                              test_wcsph2.exe
  Realm::LocalTaskProcessor::execute_task                   libregent.so
  Realm::Task::execute_on_processor                         libregent.so
  Realm::UserThreadTaskScheduler::execute_task              libregent.so
  Realm::ThreadedTaskScheduler::scheduler_loop              libregent.so
  Realm::UserThread::uthread_entry                          libregent.so

lightsighter commented 3 years ago

If these are the hotspots, it suggests you are making a truly huge number of regions in this application. How many regions do you think are being made? If you are willing to assert that the runtime does not need to check the safety of each operation launch with respect to these regions, then you can run with the -lg:unsafe_launch flag. If you've messed up in your application at all though, then you will get deadlocks.
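
In practice this just means adding the flag to the usual run command, e.g. with the single-cpu/single-util configuration used for the VTune runs above (binary name illustrative):

    ./test_wcsph2.exe -ll:cpu 1 -ll:util 1 -lg:unsafe_launch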

LonelyCat124 commented 3 years ago

I'll have to recheck, but I think it should be creating 4 regions, or 9 regions if attach/release creates regions. The difference between the two versions should be +1 region for the inbuilt HDF5 version. I'll double-check the code tomorrow, but that is my current recollection.

lightsighter commented 3 years ago

It's making way more than that (or many fields, it's hard to say) if we're spending most of our time in that particular piece of code. Please generate a Legion Spy log of a shorter run with -lg:spy.
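
A Legion Spy run is roughly: enable spy logging, write the logs to files, then feed them to the legion_spy.py tool for analysis. Paths and the binary name below are illustrative; the -dez flags are the detailed-analysis options referred to later in this thread:

    ./test_wcsph2.exe -ll:cpu 1 -ll:util 1 -lg:spy -logfile spy_%.log
    $LG_RT_DIR/../tools/legion_spy.py -dez spy_*.log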

LonelyCat124 commented 3 years ago

The spy logs are here https://drive.google.com/drive/folders/1tU7x8M21HkPngbUGiLUQ4V3OSuRMgCJL?usp=sharing

I'm currently running Legion Spy with -dezk on them to see what output I get, but the event graphs are taking a long time to build (at least for the inbuilt version).

(This is 2 timesteps with file IO at the end of both, so about as small as I can run)

lightsighter commented 3 years ago

Please make a new Legion Prof performance profile with -lg:unsafe_launch.

LonelyCat124 commented 3 years ago

I'm recounting regions - I believe I create particle_array, config, io_region, padded_particle_array (and I delete particle_array), and TradeQueueRegion, so exactly 5 regions should be created. The loop section of the code shouldn't create any additional regions.

It seems like -lg:unsafe_launch hangs (never doing output) if I return to the original profile strategy of output every 0.001s sim time, but if I ask for output every step it does complete 5 IO steps correctly.

Output with an initial IO step (and printing at the end of each step):

Timestep = 0.000059
Time = 0.000059, runtime = 0, step_count = 0  <<<< IO HERE
1
2

IO every step (no hang, seems to run to completion):

Time = 0.000059, runtime = 0, step_count = 0
1
Time = 0.000117, runtime = 0, step_count = 1
2
Time = 0.000176, runtime = 1, step_count = 2
3
Time = 0.000235, runtime = 1, step_count = 3
4
Time = 0.000293, runtime = 1, step_count = 4
5

I can get a profile of this if it would be useful, but I think it's a lot shorter than the previous profiles.

The non-inbuilt HDF5 version runs fine with unsafe_launch.

The only task that differs between the two is copy_task, which is used in the attach/release version to copy data into the HDF5 region.

lightsighter commented 3 years ago

The -lg:unsafe_launch flag is exactly what it says: it is unsafe. It removes the protections that detect when the runtime needs to unmap and remap physical regions that are currently mapped in the executing task in order to prevent a dependence between the parent task and a child task being launched; missing such a dependence will exhibit as a hang. If you're hanging with -lg:unsafe_launch, it means that the runtime is having to unmap and remap for you around these sub-task launches. That is going to be a major, major performance bug, likely big enough to account for the performance discrepancy that you are observing. You (or the Regent compiler, if this program is written in Regent) need to do a better job of managing the mapping and unmapping of regions, so that regions are unmapped for long periods of time while you launch sub-tasks and are only remapped when you absolutely have to.

LonelyCat124 commented 3 years ago

Ok - since it only appears to hang (or have performance degradation) when I use attach/release etc. to do file output (and not when I use HDF5 directly), I assume it's related somewhat to mapping the region used for HDF5? Or I'm initialising it incorrectly/strangely (though I followed the example setup on Regent's website pretty much).

This program is written in Regent - is there a subset of mapper calls that would be easiest to look into to try to find out what's happening/try to avoid the problem? I've not really had much success wrapping my head around custom mappers yet.

lightsighter commented 3 years ago

This is the mapping and unmapping of PhysicalRegion objects in Legion, which are controlled by the application, in this case Regent. You can't control them with the mapper. I think you and @elliottslaughter should get to the bottom of why the Regent compiler is doing such a poor job in this case.

elliottslaughter commented 3 years ago

Regent should be optimal with respect to mapping/unmapping unless you have a statement which (a) uses a region locally, and (b) passes it to a task or another remote operation, (c) at the same time. This is unusual in practice, so I'd be a bit surprised.

One thing you could do is attempt to mark tasks as __demand(__leaf) or __demand(__inner). This will not change the behavior of the compiler. But it may teach you more about your code and what it is doing. If you're going back and forth with repeatedly mapping an instance, that could indicate a non-leaf, non-inner task, which is generally speaking not a recommended pattern.

If you were able to make all tasks __leaf or __inner, then this would be guaranteed not to happen.
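
As a rough illustration of the two annotations (a sketch only - the task bodies and the cutoff field are hypothetical, and the annotation goes directly before the task keyword):

  -- A __leaf task touches region data directly but never launches sub-tasks.
  __demand(__leaf)
  task update_particles(particle_array : region(ispace(int1d), part))
  where reads writes(particle_array) do
    for i in particle_array.ispace do
      particle_array[i].cutoff = particle_array[i].cutoff * 2.0
    end
  end

  -- An __inner task only launches sub-tasks and never accesses region data
  -- itself, so its regions never need to be mapped in its context.
  __demand(__inner)
  task run_step(particle_array : region(ispace(int1d), part))
  where reads writes(particle_array) do
    update_particles(particle_array)
  end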

lightsighter commented 3 years ago

How does Regent deal with PhysicalRegion objects that have been mapped as part of attach operations?

elliottslaughter commented 3 years ago

I think we just store them in a variable and look it back up on detach. This is fairly hacky and would be relatively easy to mess up if the application were doing something complicated.

If I remember correctly, I think the use of attach/detach with a non-inner task is officially unsupported in Regent for this reason.

LonelyCat124 commented 3 years ago

Ah, ok - I think currently the code has the attach/detach in the main task, and then launches the copy task, which I've not marked as __leaf. Is main "non-inner"? I'll try marking the copy task as a leaf task on Monday and see if the issue persists.

elliottslaughter commented 3 years ago

The issue is more likely to be in the main task than in anything it calls, assuming main is the only one that uses attach/detach.

I'd start by trying to mark main as __inner and see what that says.

(Using __leaf for other tasks is good in general, but is less likely to be related to this particular issue.)
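
Concretely, marking main would look something like the following sketch (the body comment describes the intent; an __inner task may create regions and launch sub-tasks but may not touch region data directly):

  __demand(__inner)
  task main()
    -- setup, the timestep loop, and file IO all done via sub-task launches;
    -- no direct accesses to region data are allowed in an __inner task
  end
  regentlib.start(main)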

LonelyCat124 commented 3 years ago

Ok - I started by trying to mark main as __inner, but this was disallowed because I was setting the initial data - which I then moved into a separate task - but other parts of the setup code inside metaprogrammed sections still disallowed the main task being inner.

I then tried moving the timestep code into its own task and marking that as inner, but am having a few issues due to not passing the cell partition through to the task, which means the code that generates all the tasks doesn't work. I'll look into that now.

It's possible that (c) is happening, though other than the initialisation I believe all the accesses to the data should be from other tasks (though I need to double-check). I could also switch the code that does the I/O to be a task (as opposed to the current quote), but my difficulty is generating tasks with __demand, i.e.

function generate_task()

  --Cannot use __demand with local
  local task generated_task()
    --some generated task code
  end
  return generated_task
end

is not allowed - I assume I can maybe do something with the resulting generated_task table/object, but I've not worked it out yet. I have the same issue with generated tasks being leaf tasks, since I can't use __demand to do this.

LonelyCat124 commented 3 years ago

Ok, I managed to cut some stuff from the code and use an inner task, and this fixed the problem. So the only remaining issue I have is how I can generate an inner task with metaprogramming - I think with that then it would all be fine.

elliottslaughter commented 3 years ago

The syntax is a bit odd, but you should be able to do:

local __demand(__inner) task ...

We don't get to control the use of local so unfortunately we're forced to put all Regent keywords after that point.
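
So, for the metaprogrammed case above, the generator becomes something along these lines (a sketch using the same placeholder names as the earlier snippet):

  function generate_task()

    --__demand can now be applied to the generated task, after the local keyword
    local __demand(__inner) task generated_task()
      --some generated task code
    end
    return generated_task
  end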

LonelyCat124 commented 3 years ago

That seems to work - the only remaining thing I can't work out to implement it correctly is whether there's a way to pass a string into a task. I can't use rawstring since it's just a pointer under the hood - is the only option to make a region and set that each time? i.e.

fspace string_array{
  a_char : char
}
...
var string_reg = region(ispace(int1d, 1024), string_array)

var index : int = 0
while filename[index] != '\0' do
  string_reg[int1d(index)].a_char = filename[index]
  index = index + 1
end

and then reconstruct the string in the inner task to attach/detach? Is there an easier way to put a string in a region?

Edit: Ah, but accessing that region prevents the task being inner... any better ideas?

Edit 2: Just found regentlib.string, which I'll try.

Edit 3: All seems to be working now
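
For completeness, the shape of the solution that worked here is roughly the following: an __inner IO task that takes the filename as a regentlib.string and does the attach/acquire/copy/release/detach internally. This is only a sketch under those assumptions (it presumes the task is generated inside the same metaprogramming context so that io_fields and the field spaces are in scope), not the exact generated code:

  __demand(__inner)
  task write_output(io_array : region(ispace(int1d), hdf5_io_space),
                    particle_array : region(ispace(int1d), part),
                    filename : regentlib.string)
  where reads writes(io_array), reads(particle_array) do
    -- depending on the Regent version, the filename may need an explicit
    -- cast to rawstring here, e.g. [rawstring](filename)
    attach(hdf5, io_array.[io_fields], filename, regentlib.file_read_write)
    acquire(io_array)
    copy_task(io_array, particle_array)
    release(io_array)
    detach(hdf5, io_array)
  end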

LonelyCat124 commented 3 years ago

I think this is all fixed (so the issue can be closed). A suggestion for the Regent website would be to add a comment saying to only use attach/detach in inner tasks.

elliottslaughter commented 3 years ago

Thanks for letting us know.

I added a new warning under the "Attach and Detach" section of the reference: http://regent-lang.org/reference/ . (Unfortunately header links are broken at the moment so I can't give you a direct link, but you can search for the header.)