ThrudPrimrose / dace

DaCe - Data Centric Parallel Programming
http://dace.is/fast
BSD 3-Clause "New" or "Revised" License

Transient Outside A Map Scope is Mapped to GPU #5

Open ThrudPrimrose opened 5 days ago

ThrudPrimrose commented 5 days ago

Transient scalars are always mapped to GPU storage, even when they are inside a Default map that is not mapped to the GPU.
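
As a rough illustration, here is a hand-written toy sketch (my own construction, separate from the attached reproducer) of the pattern I mean: a tasklet outside of any map writes a transient scalar, and after apply_gpu_transformations the scalar's storage is no longer on the host:

import dace

# Toy sketch: a single tasklet (no enclosing map) initializes a transient scalar.
sdfg = dace.SDFG('transient_outside_map')
sdfg.add_scalar('levmask', dace.int32, transient=True)
state = sdfg.add_state()
tasklet = state.add_tasklet('init', {}, {'out'}, 'out = 0')
state.add_edge(tasklet, 'out', state.add_write('levmask'), None,
               dace.Memlet('levmask[0]'))

sdfg.apply_gpu_transformations(validate=False, simplify=False)
# Per the behaviour described above, this prints a GPU storage type even
# though no GPU-scheduled map ever touches the scalar.
print(sdfg.arrays['levmask'].storage)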

I can prevent a map from being offloaded to the GPU by marking it as a "host_map"; this is a small feature I have added. Roughly at lines 315-225 of gpu_transform_sdfg.py, I can keep a Default map from being scheduled on the GPU (GPU_Device) by checking an additional attribute on the Map node:

...
elif isinstance(node, nodes.EntryNode):
    # Skip maps that carry the (locally added) host_map flag; all other
    # entry nodes get the GPU_Device schedule as before.
    if not isinstance(node, nodes.MapEntry) or not node.map.host_map:
        node.schedule = dtypes.ScheduleType.GPU_Device
        gpu_nodes.add((state, node))
...

Note: This change is not present in the latest commit.
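
For completeness, a hedged usage sketch of this local flag (it assumes the patched Map class from my branch exposes a boolean host_map attribute; none of this is in upstream DaCe):

import dace
from dace.sdfg import nodes

sdfg = dace.SDFG.from_file("cut_2.sdfgz")
# Mark every map as a host map before the GPU transformation (patched attribute).
for node, _ in sdfg.all_nodes_recursive():
    if isinstance(node, nodes.MapEntry):
        node.map.host_map = True
sdfg.apply_gpu_transformations(validate=False, simplify=False)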

However, I cannot get it to keep the transient Scalar in host storage, even when it sits inside a map that is a "host_map".

I will attach an SDFG to reproduce the behaviour (remove the .json at the end; GitHub does not support the sdfgz format): cut_2.sdfgz.json

We have two chains with a (tasklet -> access node) pattern that initialize data containers (i.e. write to access nodes). Whether or not I put these chains inside a trivial map, the "levmask" variable is always mapped to transient GPU storage.

Running sdfg.apply_gpu_transformations(validate=True, validate_all=True, permissive=True, sequential_innermaps=True, register_transients=False, simplify=False) on this SDFG results in invalid code, because levmask is mapped to GPU_Global storage while the tasklet stays on the host. To reproduce, download the SDFG and run this script:

from dace.sdfg.sdfg import SDFG

sdfg = SDFG.from_file("cut_2.sdfgz")

try:
    sdfg.apply_gpu_transformations(validate=True, validate_all=True, permissive=True,
                                   sequential_innermaps=True, register_transients=False,
                                   simplify=False)
    sdfg.validate()
finally:
    # Save the (possibly invalid) result even if the transformation or validation fails.
    sdfg.save("cut_2_to_gpu_2.sdfgz")

If I enclose the tasklet in a map that gets offloaded, there is no problem. But if I enclose the tasklet in a map and then decide that this map should stay on the host, it still maps "levmask" to GPU_Global storage while the map schedule stays on the CPU. To reproduce, you can use this script:

from dace.sdfg.sdfg import SDFG
from dace.transformation.icon.map_over_tasklet_access_node_tasklet import MapOverTaskletAccessNodeTaskelet
from dace.transformation.icon.force_on_host import ForceOnHost

sdfg = SDFG.from_file("cut_2.sdfgz")

# Wrap each (tasklet -> access node) chain in a trivial map.
sdfg.apply_transformations_repeated(
    MapOverTaskletAccessNodeTaskelet,
    validate=False,
    validate_all=True)
sdfg.save("cut_2_preprocessed_1.sdfgz")

# Mark every map that reads from or writes to "levmask" as a host map.
sdfg.apply_transformations_repeated(
    ForceOnHost,
    options={"access_names": ["levmask"]},
    validate=True,
    validate_all=True)
sdfg.save("cut_2_preprocessed_2.sdfgz")

try:
    sdfg.apply_gpu_transformations(validate=True, validate_all=True, permissive=True,
                                   sequential_innermaps=True, register_transients=False,
                                   simplify=False)
    sdfg.validate()
finally:
    # Save the (possibly invalid) result even if the transformation or validation fails.
    sdfg.save("cut_2_to_gpu_2.sdfgz")

This script puts a trivial map around the (tasklet -> access node) chain and then marks any map that reads from or writes to a data container named "levmask" as a host map. The resulting SDFG looks as follows: cut_2_preprocessed_2.sdfgz.json

ThrudPrimrose commented 5 days ago

I can work around my issue by introducing host_map and host_data fields and setting them explicitly. This prevents the problem for my use case.
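
As an alternative to the extra fields, a blunt post-processing workaround is also conceivable. This is a sketch of my own, not a DaCe API; the helper name and the CPU_Heap choice are assumptions:

import dace
from dace import dtypes

def force_transients_on_host(sdfg: dace.SDFG, names):
    # Reset the storage of selected transient containers back to host memory.
    for sd in sdfg.all_sdfgs_recursive():
        for name, desc in sd.arrays.items():
            if name in names and desc.transient:
                desc.storage = dtypes.StorageType.CPU_Heap

# Example: after apply_gpu_transformations,
# force_transients_on_host(sdfg, {"levmask"})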

Still, I think transients outside map scopes should not be mapped to the GPU.