ownership in paraview outputs

jacobmerson commented 1 month ago

With the current IO routines the vtkGhostType array is only written out if the partition type is set to OMEGA_H_GHOSTED

https://github.com/SCOREC/omega_h/blob/00131eb0133796f39ff2ecd0962d9b3d14d5887b/src/Omega_h_vtk.cpp#L39-L41

As far as I can tell from the limited information available about the ghosting in pvtu file formats, it looks like this is an array that should exist for both cells and points and is essentially just whether an entity is owned in a specific partition.

@cwsmith @joshia5 do you know if there is a specific reason why OMEGA_H_VERT_BASED and OMEGA_H_ELEM_BASED don't include these ownership arrays? I think this would resolve the overlaps when reading pumi-pic outputs as well as some other oddities that we faced when debugging PCMS/redev visualizations.

For example, I think the following is valid

<PUnstructuredGrid GhostLevel="1">
  <PPointData>
    <!-- Other data arrays -->
    <PDataArray type="Int32" Name="vtkGhostType"/>
  </PPointData>
  <PCellData>
    <!-- Other data arrays -->
    <PDataArray type="Int32" Name="vtkGhostType"/>
  </PCellData>
  <Piece Source="piece1.vtu"/>
  <Piece Source="piece2.vtu"/>
  <Piece Source="piece3.vtu"/>
</PUnstructuredGrid>

cwsmith commented 1 month ago

I think the idea was to rely on the 'owner' array associated with vertices, edges, and faces to determine which entities on the boundary are owned and which are copies.

The overlaps you're referring to are when two or more mesh entities are rendered in the same location when viewing multiple parts of a partitioned mesh? If so, I agree, It would be nice if there was a way to avoid that.

Can we manually create a two edge mesh, one edge per process/vtu file, and test this?

jacobmerson commented 1 month ago

That's a good idea.

jacobmerson commented 1 month ago

I was doing some testing with XGCm data and the vtkGhostType arrays do seem to do what we want. Although, in paraview, you have to apply the RemoveGhostInformation filter to actually remove the ghosted cells.

Here is an example script to add the ghost arrays to the .vtu files.

import base64
import zlib
import xml.etree.ElementTree as ET
import numpy as np
import sys

def encoded_string_as_array(encoded_data, dtype='<i4'):
    #encoded_data = str(encoded_data)
    end_of_header = encoded_data.find("=")+1
    header = encoded_data[0:end_of_header]
    body = encoded_data[end_of_header:]
    header = base64.b64decode(header)
    header = np.frombuffer(header, dtype='<i8')
    compressed_data = base64.b64decode(body)
    assert(len(compressed_data) == header[-1])
    uncompressed_data = zlib.decompress(compressed_data)
    assert(len(uncompressed_data) == header[1])
    data_array = np.frombuffer(uncompressed_data, dtype=dtype)
    return data_array

def array_as_encoded_string(array):
    compressed_data = zlib.compress(array.data)
    header = np.empty(4, dtype='<i8')
    header[0] = 1
    header [1:3] = array.nbytes
    header[3] = len(compressed_data)
    encoded_header = base64.b64encode(header.data)
    encoded_data = base64.b64encode(compressed_data)
    full_encoded = encoded_header+encoded_data
    return full_encoded.decode("utf-8")

def get_point_ownership(root):
    ownership = [x for x in root.findall('.//PointData/DataArray') if x.attrib["Name"] == "ownership" ][0]
    encoded_data = ownership.text.strip().rstrip()
    return encoded_string_as_array(encoded_data)

def get_cell_ownership(root):
    ownership = [x for x in root.findall('.//CellData/DataArray') if x.attrib["Name"] == "ownership" ][0]
    encoded_data = ownership.text.strip().rstrip()
    return encoded_string_as_array(encoded_data)

def get_ghost_array(ownership_array, current_partition):
    # ghost array should be 0 for any entries that are owned
    # on current partition and 1 for those that are not on the current partition
    ghost_array = np.empty(len(ownership_array), dtype='<u1')
    ghost_array[:] = ownership_array != current_partition
    return ghost_array

def write_point_ghost(root, current_partition):
    point_ownership = get_point_ownership(root)
    point_ghost = get_ghost_array(point_ownership, current_partition)
    assert(point_ghost.dtype == np.uint8)
    points = root.find('.//PointData')
    ghost = ET.SubElement(points, 'DataArray', type="UInt8", Name="vtkGhostType", NumberOfComponents="1", format="binary")
    ghost.text = array_as_encoded_string(point_ghost)

def write_cell_ghost(root, current_partition):
    cell_ownership = get_cell_ownership(root)
    cell_ghost = get_ghost_array(cell_ownership, current_partition)
    assert(cell_ghost.dtype == np.uint8)
    cells = root.find('.//CellData')
    ghost = ET.SubElement(cells, 'DataArray', type="UInt8", Name="vtkGhostType", NumberOfComponents="1", format="binary")
    ghost.text = array_as_encoded_string(cell_ghost)

if __name__ == "__main__":
    for current_partition in range(0,6):
        read_path = "../"
        filename = f"piece_g0_m{current_partition}"
        tree = ET.parse(read_path+filename+".vtu")
        root = tree.getroot()

        write_point_ghost(root, current_partition)
        write_cell_ghost(root, current_partition)

        tree = ET.ElementTree(root)
        tree.write(f"{filename}.vtu", encoding="utf-8", xml_declaration=True)

cwsmith commented 4 weeks ago

Looks good. Thank you for the script. Have you tried this on a mesh with no ghosting (i.e., just shared vertices, edges, and faces on the part boundaries)? If not, I'll take a look.

jacobmerson commented 4 weeks ago

I'm not sure if I understand the distinction you are getting at. Although the array is called vtkGhostType it is just a boolean that specifies if the cell or vertex is owned in the current vtu/partition. You specify the values in the vtu file, so the replicated cells/vertices will be "owned" by one or more vtu files. Paraview doesn't care if you say more than one vtu owns a particular entity.

cwsmith commented 4 weeks ago

Right. I'm just wondering if the RemoveGhostInformation will work as expected when vertices are the entities marked as 'ghosts'; that is the more common use case when visualizing a partitioned mesh.

jacobmerson commented 4 weeks ago

The way I did it so far, I provided a vtkGhostType array for both vertices and cells. I will check what happens when I do it only for cells, and only for vertices.

jacobmerson commented 4 weeks ago

Here are the number of cells and points reported by paraview

Full mesh:

1,626,536 cells
819,447 points

Both cells and verts marked with vtkGhostType after RemoveGhostInformation filter:

611,359 cells
312,936 points

Only cells marked with vtkGhostType after RemoveGhostInformation filter:

611,359 cells
312,936 points

Only verts marked with vtkGhostType after RemoveGhostInformation filter:

1,626,536 cells
819,447 points

So, it looks like paraview is handling ownership on a cell basis and doesn't handle the vert only cases.

jacobmerson commented 2 weeks ago

Based on our discussion, the modifications should be made to pumi-pic and the current Omega_h implementation is correct.

SCOREC / omega_h

ownership in paraview outputs #109