StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0
676 stars 145 forks source link

Regent: codegen bug in unpack #509

Open jgurhem opened 5 years ago

jgurhem commented 5 years ago

This is a simplified version of the piece of code I'm trying to use.

import "regent"
local c = regentlib.c

fspace AStruct {
  r : region(ispace(int1d), double),
}

fspace trace() {
  values : AStruct;
}

task gen_trace()
  var tr : trace
  var re = region(ispace(int1d, 10), double)
  fill(re, 1)
  tr.values = AStruct { r = re }
  return tr
end

task print_trace(tr : trace)
  for i in tr.values.r do
--    c.printf("%lf\n", tr.values.r[i])
   c.printf("--")
  end
end

task main()
  var t : trace = gen_trace()
  print_trace(t)
end

regentlib.start(main)

I obtain the error /usr/local/legion/language/src/regent/codegen.t:918: failed to find appropriate for region region(ispace(int1d), double) in unpack. I think that the for loop to access the values of the region in the print_trace task produce the previous error.

I am trying to initialize a region in the gen_trace task and to access the data of the region in the print_trace task. Unfortunately, I'm not able to find a way to properly access the values.

Actually, I was trying to use a field space like this one:

fspace wire(rpn : region(node),
            rsn : region(node),
            rgn : region(node)) {
  in_ptr : ptr(node, rpn, rsn),
  out_ptr : ptr(node, rpn, rsn, rgn),
  inductance : float,
  resistance : float,
  wire_cap : float,
  current : currents,
  voltage : voltages,
}

It can be found in circuit_sparse.rg but I was unable to create the field space and put values into it.

Could you explain me how to use the field spaces in those cases ?

elliottslaughter commented 5 years ago

There is a compiler bug here, but the commented out line that tries to access tr.values.r[i] wouldn't work even if we fixed it.

The reason is that you can store regions in an fspace, but those regions don't get any privileges. So you can't read or write any values to regions that you store in an fspace.

What you can do is read and write values of a region which is a subregion of some other region that the task has access to. So this works:

import "regent"
local c = regentlib.c

fspace AStruct(r : region(ispace(int1d), double)) {
  s : region(ispace(int1d), double),
} where s <= r end

fspace trace(r : region(ispace(int1d), double)) {
  values : AStruct(r);
}

task gen_trace(re : region(ispace(int1d, 10), double))
where reads writes(re) do
  var tr : trace(re)
  fill(re, 1)
  tr.values = [AStruct(re)] { s = re }
  return tr
end

task print_trace(re : region(ispace(int1d, 10), double), tr : trace(re))
where reads writes(re) do
  for i in tr.values.s do
   c.printf("%lf\n", tr.values.s[i])
  end
end

task main()
  var re = region(ispace(int1d, 10), double)
  var t : trace(re) = gen_trace(re)
  print_trace(re, t)
end

regentlib.start(main)

(For simplicity I just stored the region itself in the fspace, but you could have instead done some partitioning here and stored a subregion of the data.)

Does this help?

jgurhem commented 5 years ago

It helps a lot ! Thank you !

I was wondering if it is possible to put the region directly in the trace field space instead of using a intermediate field space ? Moreover, is there a way to define the size of the region when the trace is created - in the gen_trace task ? (I would like to read the values from a file)

Thank you for your help !

elliottslaughter commented 5 years ago

This is the pattern we generally recommend for this scenario, though it won't quite do what you want (I'll explain below).

import "regent"
local c = regentlib.c

fspace trace(r : region(ispace(int1d), double)) {
  s : region(ispace(int1d), double),
} where s <= r end

task gen_trace(re : region(ispace(int1d, 10), double))
where reads writes(re) do
  var pe = partition(equal, re, ispace(int1d, 1)) -- replace this whatever partitioning you want
  var se = pe[0]
  var tr = [trace(re)] { s = se }
  fill(se, 1)
  return tr
end

task print_trace(re : region(ispace(int1d, 10), double), tr : trace(re))
where reads writes(re) do
  for i in tr.s do
   c.printf("%lf\n", tr.s[i])
  end
end

task main()
  var re = region(ispace(int1d, 10), double)
  var t : trace(re) = gen_trace(re)
  print_trace(re, t)
end

regentlib.start(main)

You can replace the line that does the partitioning to select any subset of the original data you want---including a dynamic amount based on the file you're reading.

The one caveat is that in this version you're still going to allocate the entire region re into memory. This would be a problem if you were intending to use this to save on space. But if you can pick an upper bound on the region size that's larger than what you need but still small enough to fit in memory, you're probably ok.

If you really, really want to use only the memory from se, the way to do that is to always launch tasks on se. That would require an extra indirection (gen_trace would have to call gen_trace_helper, print_trace would have to call print_trace_helper and so on). This is somewhat ugly so I didn't show it above, but it can be done if you need it.

jgurhem commented 5 years ago

Thank you very much !

I still have a few questions: Is there a way to avoid hardcoding the max number of values in the region (10) ? If I want to create several traces, should I create another region or can I reuse the same ? For instance, my file contains, let's say 6 values out of the 10 max values, how could I create a partition which will contain all my values ?

Thank you for your help !

elliottslaughter commented 5 years ago

This can definitely be done, but I want to check up on a couple things first to make sure this effort is actually necessary.

How much data (rough order of magnitude) are you dealing with? If it's less than about 10-100 MB in total, I'd be inclined not to worry about making the regions tight. Instead, it would be much easier to create a region to hold the maximum amount of data and keep a separate variable or region that tracks how much you're actually using. If you have multiple traces, you could do this with a 2D region, and make one dimension the trace number and the other dimension the trace data. E.g. you could make a region with 100 traces times 1 MB per trace, and this will still be well below the amount of memory you can expect to have on nearly every machine you might run on. You can keep a separate region of 100 ints to track how much data within each trace you're using.

If the data is so large that you really do need tight allocations (or else you'll run out of memory), then you'd basically take the approach above and apply it to the last set of code I posted. I.e. make a 2D region, partition it first by trace, and then within a trace partition by how much data the trace has.

Let me know if that makes sense.

elliottslaughter commented 5 years ago

Forgot to include this link in the last response:

https://github.com/StanfordLegion/legion/blob/master/language/examples/stencil_common.rg#L53-L68

This shows how to read command line parameters inside Regent. This is one possible way to configure the upper bound on region size, so you don't need to hard-code it as a constant in your program.

jgurhem commented 5 years ago

Thank you for your help !

For 1), it was not very clear. It is the 10 in task gen_trace(re : region(ispace(int1d, 10), double)). I tried with task gen_trace(re : region(ispace(int1d), double)) instead and it works perfectly. For 2), it is very clear. I was thinking about making a region of traces and partition it to process them in parallel. I have a few other parameters than the number of values in the trace, so it will be easier to partition this way. But the other way which is a trace with regions for each parameter, is another possibility. Which way do you think is better in regent ? For 3), I was focusing on doing

  for i in tr.s do
   c.printf("%lf\n", tr.s[i])
  end

to iterate over the region which contains my values (opposed to the empty and unused part of the region). I was wondering if it was possible to create a partition which size is exactly the number of values. I can simply iterate over 'for i = 0, tr.n do' where tr.n is the number of values.

Thank you for your replies again !

elliottslaughter commented 5 years ago

Re (2): If you have a fixed number of parameters, I think you should be able to go either way; it's just a matter of taste. If you have a dynamic number of parameters (assuming I understand what you mean by this) you'd be forced to make one region on the outside and partition it, like you're doing above.

Re (3): you can always track the size separately from the region itself. E.g. for a simple 1D case you can put the maximum size n inside the trace and then write your loop as:

for i = 0, tr.n do
  c.printf("%lf\n", tr.s[i])
end

Hope that helps.

jgurhem commented 5 years ago

Yes ! It helped me a lot !

streichler commented 5 years ago

@elliottslaughter are we done here? You refer to a compiler bug higher up in the comment chain - does that still need to be fixed?

elliottslaughter commented 5 years ago

Yes, the original error is a bug and should be put in the queue to be fixed.