Avoiding the vertex hash table

mj-saunders commented 4 years ago

While going through an exercise of converting the finished tutorial code into C, I got thinking about the hash table. In part because I'm having an interesting time learning to write a decent hash function. Then today, also reading that avoiding the use of floats as hash table keys is good :)

I had the thought that in this use case it may not be necessary at all. I've tweaked the code as follows, and it seems to run fine. The commented lines are just left in as reference. You should be able to paste this right in for testing. No need for hash function and table, and only creates vertex when necessary.

Is there any reason why this approach might not work or be suitable? As far as I can tell it should be stable.

Let me know what you think. I'll file a pull request if the concensus is positive.

        //std::unordered_map<Vertex, uint32_t> uniqueVertices{};
        std::vector<uint32_t> ref_indices(attrib.vertices.size(), UINT32_MAX);

        for (const auto &shape : shapes) {
            for (const auto &index : shape.mesh.indices) {
                if (ref_indices[index.vertex_index] == UINT32_MAX) {
                    ref_indices[index.vertex_index] = static_cast<uint32_t>(vertices.size());

                    Vertex vertex{};

                    vertex.pos = {
                        attrib.vertices[3 * index.vertex_index + 0],
                        attrib.vertices[3 * index.vertex_index + 1],
                        attrib.vertices[3 * index.vertex_index + 2]
                    };

                    vertex.texCoord= {
                        attrib.texcoords[2 * index.texcoord_index + 0],
                        1.0f - attrib.texcoords[2 * index.texcoord_index + 1]
                    };

                    vertex.color = {1.0f, 1.0f, 1.0f};

                    vertices.push_back(vertex);

                    //if (uniqueVertices.count(vertex) == 0) {
                    //    uniqueVertices[vertex] = static_cast<uint32_t>(vertices.size());
                    //    vertices.push_back(vertex);
                    //}
                }

                indices.push_back(ref_indices[index.vertex_index]);
            }
        }

Overv commented 4 years ago

Your code doesn't handle the case where there are multiple vertices with the same position but with different texture coordinates.

mj-saunders commented 4 years ago

I was just beginning to wonder at this issue.

tl;dr Is the approach still valid though? It does appear to work at least. Though another downside is that it stores duplicates.

I have a feeling this can be solved reasonably without the hash table; I might be wrong - and even if I'm right, maybe it won't be worth it?

I'm not sure, but I think the way the indices are parsed by tinyobjloader (or maybe stored by the .obj) means that even if there are multiple vertices with the same value, they will have a different index.vertex_index, which I think means they'll store their own tex coord. I could be wrong. Maybe it depends on how the file is originally written...

There's something else I'm missing though.

Your original code (and my first attempt at custom hash table, etc - using the same method) both result in 3566 vertices (out of the 4675 in the file).

The code above results in 4675 vertices. Does that mean that I'm correct in my assumption that the indices are all unique?

The file has 391 duplicate vertices and 126 duplicate texture coordinates; so I'm not sure how there ends up being only 3566 unique vertices.

...I might well be just confusing myself now :)

lewa-j commented 4 years ago

Another way is to use better format without need for postprocessing in runtime. Like gltf

mj-saunders commented 4 years ago

Yes, fair point :)

mj-saunders commented 4 years ago

@Overv I just did some quick checks. Printed out all indice : .vertex_index and .texcoord_index to separate files; and a list of the combinations as well.

Then ran this cat index.log | sort -n | uniq | wc -l and get 4675. Incidentally this is the same for the separated vertices and texcoords.

So it looks like the code works, at least for this .obj.

However, as mentioned before, it misses duplicates where the indexes are different but the values are the same.

Any ideas why the .obj file would be representing the same faces more than once?

So yes, looks like hash table is the best way to deal with actual duplicate values.

Thanks for entertaining my weird adventure.

mj-saunders commented 4 years ago

Ah, I suppose these are cases where two separate faces touch, therefore share same vertices and texcoords, and for whatever reason were saved in the file as separate vertices/texcoords.

Overv commented 4 years ago

Well, that, and there are vertices that are erroneously deduplicated because the vertex coordinates match but texture coordinates don't.

Overv / VulkanTutorial

Avoiding the vertex hash table #195