Use greedy mesher for RLE, and RLE for greedy mesher

Potential optimization for you to try.

In trying to write my own voxel engine and reduce memory use, I came up with the idea of generating the bitmap for the binary greedy mesher from an RLE voxel chunk. My approach is using 32x32x32 chunks and I'm only meshing chunks, so it's less-optimal than yours; you might be able to adapt it to your mesher though.

The basic idea is simple:

A chunk stores voxel data run-length encoded. For cubes of 32 length, voxel data is stored as (u16 position, u16 rle, type of voxel), with the u16 containing 5-bit x, y, and z components.
A chunk also stores a chronological list of deltas. When a change is made, it is appended to the list of deltas.
The bitmap for the greedy mesher is generated by decoding the RLE data.
The deltas are played back in order, setting and unsetting bits in the bitmap.
The greedy mesher is run, generating both the mesh and new RLE voxel data.
The deltas and bitmap are discarded.

In my case I'm generating the bitmaps for xy, yz, and xz in separate threads and joining them at the end of generation. This code probably doesn't work but it's this general idea:

        let voxels = self.voxels.iter();  // only contains solid voxels
        let xy_handle = thread::spawn(move || {
            for (k, (kind, r)) in voxels {
                let location: ChunkLocation = ChunkLocation { location: k };
                let rle: ChunkLocation = ChunkLocation { location: r };
                // These are run across x with a length of rle.x(), so create a
                // string of 1 bits rle.x long, then put the left-most bit at x
                let mask = ((2 << rle.x()) - 1) << location.x();

                // These need to propagate across all y and z locations
                for y in location.y()..=location.y()+rle.y() {
                    for z in location.z()..=location.z()+rle.z() {
                        xy[z][y] |= mask;
                    }
                }
                // FIXME:  iterate through the deltas here and fix up xy
            }
            xy
        });

Same code for yz and xz planes, except using rle,y() or rle.z() and location.y() or location.z() and using x, y, or z as appropriate for the nested loop. The mask expands across the axis represented by the u32 (or u64 in your case); the loop expands across the other two axes.

You can generate an initial mesh and RLE voxel data by just sticking all the voxels in one by one as deltas; loading chunks from disk would probably be faster if they were stored RLE to avoid chewing through a long list of deltas.

Not sure if this is a good approach but figured I'd raise it.

TanTanDev / binary_greedy_mesher_demo

Use greedy mesher for RLE, and RLE for greedy mesher #8