Large Environment Properties

Robadob commented 4 years ago

Striuvad has the requirement of storing ~32,000 host managed agents. These should be represented as environment properties (for technical reasons), but clearly won't fit in constant cache so need to be allocated in global memory (accessed via ldg?).

cc @ptheywood

ptheywood commented 4 years ago

The upper limit on the population is more like 2^12, although in practice that will never be achieved.

Robadob commented 3 years ago

Going to be working on what we've coined 'macro environment' over the next few weeks, which will combine this and #308.

Defining a macro property

I would ideally like to treat all macro properties the same, so a single create method with default args for length.

/**
 * Define a new environment macro property
 *
 * Environment macro properties are designed for large environment properties, too large for fast constant memory.
 * This means they must instead be stored in slower global memory, however that allows them to be modified during agent functions via a limited set of atomic operations.
 *
 * @param propertyName Name of the macro property
 * @tparam T Type of the macro property
 * @tparam i Length of the first dimension of the macro property, default 1
 * @tparam j Length of the second dimension of the macro property, default 1
 * @tparam k Length of the third dimension of the macro property, default 1
 */
template<typename T, size_type i=1, size_type j=1, size_type k=1>
void EnvironmentDescription::newMacroProperty(const std::string &propertyName);

It's not clear that we can sensibly allow users to pass a default value here (support for {{{a,b},{c,d}}} style notation seems a bit unstable, and nested std::array templates are pretty grim), so this needs some further thought (it will affect host function access too).
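A minimal host-side sketch of how a single create method with defaulted lengths could cover scalar, 1D, 2D and 3D properties. `EnvDescSketch`, `MacroMeta` and `meta()` are hypothetical names for illustration only, not the FLAMEGPU2 API, and default values are deliberately left out since that question is still open.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical stand-in for EnvironmentDescription (assumption, not the real class).
class EnvDescSketch {
 public:
    using size_type = unsigned int;

    // Metadata recorded for each macro property.
    struct MacroMeta {
        std::type_index type;
        std::array<size_type, 3> dims;
        std::size_t elements() const {
            return static_cast<std::size_t>(dims[0]) * dims[1] * dims[2];
        }
    };

    // One method for every rank: unused dimensions default to length 1.
    template<typename T, size_type I = 1, size_type J = 1, size_type K = 1>
    void newMacroProperty(const std::string &propertyName) {
        static_assert(I > 0 && J > 0 && K > 0, "dimension lengths must be non-zero");
        const MacroMeta m{std::type_index(typeid(T)), {I, J, K}};
        if (!props.emplace(propertyName, m).second) {
            throw std::invalid_argument("macro property already exists: " + propertyName);
        }
    }

    // Look up the stored metadata; throws std::out_of_range if the name is unknown.
    const MacroMeta &meta(const std::string &propertyName) const {
        return props.at(propertyName);
    }

 private:
    std::map<std::string, MacroMeta> props;
};
```

With this shape, `env.newMacroProperty<int, 10, 10>("counters")` and `env.newMacroProperty<float>("scalar")` go through the same code path.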

Device Access

The core goal here is to prevent a user needing to do repeated accesses to curve, if they are accessing multiple values from/making multiple calls to a single macro property.

e.g. use-case

DeviceMacroProperty<int, 10, 10> mp = FLAMEGPU->environment.getMacroProperty<int, 10, 10>("counters");
for (int i = 0; i < 10; ++i) {
    for (int j = 0; j < 10; ++j) {
        mp[i][j].add(1);  // Need something smart here to make the array accessing constexpr, and to differentiate 0d/1d/2d/3d without branching.
    }
}
// We might be able to do this like this, rather than needing a .get(), if we abuse explicitly defined cast operators.
int k = mp[0][0];  // This should then throw a seatbelts exception, as atomic mutation was used in the same agent fn/layer

Whilst we could do seatbelts checks at a per-element level, I'm not convinced that's worthwhile currently.

Small example of how that could be implemented here. Shame we can't have a class with the same name but different template args (short of some complicated thing with extra params, to reuse the class but ignore things).
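One way the chained `mp[i][j]` access could work without branching on rank is sketched below. This is an assumption-laden host-side illustration, not the linked example: `MacroViewSketch` is a hypothetical name, `std::atomic` stands in for device atomics, and the seatbelts read-after-write check is not modelled.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical view over a macro property (not FLAMEGPU2 API).
template<typename T, unsigned I = 1, unsigned J = 1, unsigned K = 1>
class MacroViewSketch {
 public:
    explicit MacroViewSketch(std::atomic<T> *p) : ptr(p) {}

    // Indexing peels off the leading dimension: <T, I, J, K> becomes <T, J, K, 1>.
    // The stride J * K is a compile-time constant, so rank is resolved by the type.
    MacroViewSketch<T, J, K, 1> operator[](unsigned i) const {
        assert(i < I);  // stand-in for a seatbelts bounds check
        return MacroViewSketch<T, J, K, 1>(ptr + static_cast<std::size_t>(i) * J * K);
    }

    // Atomic add, only compiled for a view where every dimension has been indexed away.
    void add(T v) const {
        static_assert(I == 1 && J == 1 && K == 1, "index all dimensions before add()");
        ptr->fetch_add(v);
    }

    // Conversion operator so a fully indexed element can be read without a .get().
    operator T() const {
        static_assert(I == 1 && J == 1 && K == 1, "index all dimensions before reading");
        return ptr->load();
    }

 private:
    std::atomic<T> *ptr;
};
```

Calling `add()` on a partially indexed view, such as `mp[i].add(1)` on a 2D property, fails at compile time because the `static_assert` only passes once all lengths have collapsed to 1.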

Host Access

As noted above, currently a little stumped at the best way to pass around a multi-dimensional array in C++. I would normally just use a well defined pointer, but that's not very user friendly. Similarly, how will np.arrays and/or nested lists map to it in the python interface?

This will obviously require synchronisation like device agent vector too.

mondus commented 3 years ago

I would have no objection to simplifying this greatly and having only scalar and 1d array values. This is more consistent with agent variables and the mapping of single to multi dimension space is something a modeller can do. Having 2d and 3d is nice in the same way that CUDA has 2d/3d blocking but 1d gets the job done and offers the greatest flexibility.
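For reference, the mapping a modeller would need to write themselves under the 1D-only option is small. A sketch, assuming row-major ordering; `flatten3` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: row-major mapping of (x, y, z) into a 1D array
// holding nx * ny * nz elements.
inline std::size_t flatten3(std::size_t x, std::size_t y, std::size_t z,
                            std::size_t nx, std::size_t ny, std::size_t nz) {
    assert(x < nx && y < ny && z < nz);  // bounds are the modeller's responsibility
    return (x * ny + y) * nz + z;
}
```

The cost of this option is that the dimension lengths have to be passed around (or hard-coded) by the modeller, and nothing ties them to the property itself.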

MILeach commented 3 years ago

Came up with this (link), which allows pretty uniform creation of properties with any number of dimensions. I think this solves what you wanted with same class name with different template args @Robadob? Internally uses std::array at the moment to store the dimensions but could use any representation. Also currently uses c++17 for the fold expression but can work around that if necessary.

I'm in favour of allowing multiple dimensions as it gives a better abstraction of domain problems and reduces scope for error with type safety & bounds checking. It also is less work for a modeller if we do it, although even for multi-dimensional properties, some conversion will probably be required from location to index.
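A rough sketch of the variadic approach, assuming C++17. This is not the linked implementation: `NdShapeSketch` is a hypothetical name and it only covers the shape and index arithmetic, with no storage or allocation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical N-dimensional shape descriptor (not FLAMEGPU2 API).
template<typename T, std::size_t... Dims>
struct NdShapeSketch {
    static constexpr std::size_t rank = sizeof...(Dims);
    // C++17 binary fold: product of all dimension lengths (1 for a scalar).
    static constexpr std::size_t elements = (Dims * ... * 1);

    // Row-major offset of one element; requires exactly one index per dimension.
    template<typename... Idx>
    static std::size_t offset(Idx... idx) {
        static_assert(sizeof...(Idx) == rank, "one index per dimension");
        constexpr std::array<std::size_t, rank> dims{Dims...};
        const std::array<std::size_t, rank> ix{static_cast<std::size_t>(idx)...};
        std::size_t off = 0;
        for (std::size_t d = 0; d < rank; ++d) {
            assert(ix[d] < dims[d]);  // bounds check per dimension
            off = off * dims[d] + ix[d];
        }
        return off;
    }
};
```

Because the rank is just the pack size, the same template covers scalars, 1D to 3D properties, and higher ranks such as `NdShapeSketch<float, 100, 8, 8, 8>`.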

Robadob commented 3 years ago

Few problems with your suggested code @MILeach:

- Using new in device code is VERY expensive.
- RTC is unable to compile std::array currently (the system header is missing, it would probably work if a replacement was added to Jitify).

Otherwise it seems to achieve what I was looking for, albeit with far more complicated variadic template soup than I was planning. (I had thought about just using the 3D one, but setting spare args to 0, and ignoring them at compile time via constexpr).

MILeach commented 3 years ago

Yeah the new and std::array were really just there as examples/placeholders, memory allocation would obviously need to be handled differently.

Yeah that would work too, I think it depends where we want to draw the line on flexibility. My implementation would allow 4D ones too which could be used to track different layers across a 3D environment etc in a single macro property.

The interface looks good though and seems consistent with the rest of the fgpu api.

Robadob commented 3 years ago

Yeah, I agree yours is far more dynamic, which is always a good thing. I'm just not really 100% up on variadic template soup.

4D makes sense to me in a context of a 3D array of Array variables, so 3D, 1D, rather than outright 4D. E.g. if a user needed a 3D array of vec3.

MILeach commented 3 years ago

Yes exactly re: 4D, alternatively an array of field variables, e.g. if you had 100 different chemical concentrations or something they could be indexed by the first value, then x, y, z.

I could see cases where this would be useful rather than having to manually specify 100 MacroEnvProperties

ptheywood commented 3 years ago

Going to be working on what we've coined 'macro environment' over the next few weeks, which will combine this and #308.

Is this going to be the only way to have atomically mutable environmental properties? In which case I'm not convinced by the "MacroProperty" name (but I don't have a better suggestion yet). Macro property doesn't really fit strit's use case either IMO, but that might just be bias towards my use of macro in the past.

mondus: I would have no objection to simplifying this greatly and having only scalar and 1d array values

I'm not as strongly opposed to providing 2d/3d versions, but for strit's use case which this issue originally mentions, the environment is not a square / cube, but a hexagonal lattice representing the inner surface of a torus, which can be mapped to 2D square indexing with some extra steps, but I'd be inclined to just use 1D and then perform the hex coord mapping directly to that (which has to be done for agents anyway, so the methods are already in place)

The only downside I see otherwise of treating them all the same is any overhead cost at runtime when not using all 3 dims (i.e. a single value). Probably negligible compared to curve itself / the cost of the atomic ops though, and as you say with constexpr ifs this shouldn't actually be a concern, so discard this point.

Also currently uses c++17 for the fold expression but can work around that if necessary

We will be going c++17 when viable. Bede is testing the RHEL8/CUDA 11.3 install, so we can sort of claim that's supported (although it may become unavailable for a short while). Bessemer is still 10.x only. I've nudged on GitHub, but given it's August things might be slow (and the use of easybuild might prevent very recent 11.x, though <= 11.1 should be achievable). I believe the driver is already 11.0, but the cuda install wasn't done / made public?

ignoring them at compile time via constexpr

if constexpr is c++17 too, though optimised builds of regular if statements which only contain constexpr'd values might get optimised out (I assume?), which would/should apply to the template args. Debug builds might still have the dead branches kicking around, but then who cares.
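To make the point concrete, a sketch of ignoring unused dimensions at compile time, assuming C++17 and using a length of 1 to mark an unused dimension. `offsetSketch` is a hypothetical name. Replacing `if constexpr` with a plain `if` gives the same results here, since both branches compile for any lengths; whether the dead branch is then removed is up to the optimiser.

```cpp
#include <cstddef>

// Hypothetical helper: row-major offset for a property of lengths I x J x K,
// where a length of 1 marks an unused dimension.
template<unsigned I, unsigned J = 1, unsigned K = 1>
inline std::size_t offsetSketch(unsigned i, unsigned j = 0, unsigned k = 0) {
    std::size_t off = i;
    // Discarded at compile time when the second dimension is unused.
    if constexpr (J > 1) {
        off = off * J + j;
    }
    // Discarded at compile time when the third dimension is unused.
    if constexpr (K > 1) {
        off = off * K + k;
    }
    return off;
}
```

For a single value, `offsetSketch<1>(0)` reduces to returning 0 with no multiplications left in the instantiated function.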

Otherwise it seems reasonable to me / Matt's rough implementation seems sane.

tldr;

- Looks decent.
- We will be able to rely on c++17 features "soon"
  - Until then CI will be unhappy for CUDA 10 builds.
- Not convinced by the name, macro has other implications to me. I don't have a better name though.

Robadob commented 3 years ago

Is this going to be the only way to have atomically mutable environmental properties? In which case I'm not convinced by the "MacroProperty" name (but I don't have a better suggestion yet).

Yeah I'm not dead set on the name either. I would like to have a separate form of validation-only atomics, which are only active if built with a compile macro on. But that's a different topic.

Rest is fair

ptheywood commented 3 years ago

Yeah I'm not dead set on the name either. I would like to have a separate form of validation-only atomics, which are only active if built with a compile macro on. But that's a different topic.

I think that can already be done using the above with enough macro soup by the user, but a separate implementation in the library which uses constexprs (rather than adding more macros, when we're c++17) might be a nice addition.

Robadob commented 3 years ago

Better to be consistent and have all our build flags as macros, than to confuse it by mixing and matching. (But this is off-topic, I think there's a dedicated issue for that anyway)

Robadob commented 3 years ago

Closed by #643

FLAMEGPU / FLAMEGPU2

Large Environment Properties #162