elrnv / vtkio

Visualization ToolKit (VTK) file parser and writer
Apache License 2.0

Change `ScalarType` to `U8` for colors in lookup tables stored in binary files #47

Closed: Siimeloni closed this issue 1 week ago

Siimeloni commented 1 month ago

When trying to parse a binary .vtk file, the program failed with an error in the Vtk::parse_legacy_be() function.

2024-08-15T11:56:05.184570Z ERROR GST convert failed error=
   0: Parse error: Complete

The file has the following structure:

# vtk DataFile Version 3.0
vtk output
BINARY
DATASET UNSTRUCTURED_GRID
POINTS 2804 float
---data---

CELL_TYPES 3633
---data---

CELL_DATA 3633

SCALARS colors float 1
LOOKUP_TABLE color_table
---data---

LOOKUP_TABLE color_table 2
---data---
contains (in binary, encoded as u8):
1 0 0 1
1 1 1 1
-----------

The error happens in the attribute_lookup_table() function.

It was not possible to parse this file because vtkio expected the color values (r, g, b, a) to be encoded as binary f32 rather than u8. Since we are able to write these files ourselves, we also tested output with the color values written as f32, but ParaView would throw an error on import. This does not happen with u8 color values, so we assume that u8 is the correct encoding here.

After changing the appropriate location in vtkio, we are able to parse these files successfully.
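To make the size mismatch concrete, here is a minimal standalone sketch (not vtkio's actual parser code) that decodes the two-entry table above both ways: as u8 RGBA quadruples the 8-byte payload yields exactly two entries, while as big-endian f32 quadruples it is far too short.

```rust
// Standalone sketch (not vtkio's parser): the two-entry lookup table from the
// file above occupies 2 * 4 = 8 bytes when stored as u8 RGBA values, but would
// need 2 * 4 * 4 = 32 bytes if stored as big-endian f32 values.
fn decode_u8_colors(bytes: &[u8]) -> Vec<[u8; 4]> {
    bytes
        .chunks_exact(4)
        .map(|c| [c[0], c[1], c[2], c[3]])
        .collect()
}

fn decode_f32_colors(bytes: &[u8]) -> Vec<[f32; 4]> {
    let floats: Vec<f32> = bytes
        .chunks_exact(4)
        .map(|c| f32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    floats
        .chunks_exact(4)
        .map(|c| [c[0], c[1], c[2], c[3]])
        .collect()
}

fn main() {
    // The payload from the issue: (1, 0, 0, 1) and (1, 1, 1, 1) as u8.
    let payload = [1u8, 0, 0, 1, 1, 1, 1, 1];
    assert_eq!(decode_u8_colors(&payload).len(), 2); // parses as two RGBA entries
    assert_eq!(decode_f32_colors(&payload).len(), 0); // 8 bytes are not enough for f32
}
```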

elrnv commented 1 month ago

This looks like a bug; nice find, and thank you for the PR! The correct behavior is in the spec and is thankfully consistent with your observation:

The definition of color scalars (i.e., unsigned char values directly mapped to color) varies depending upon the number of values (nValues) per scalar. If the file format is ASCII, the color scalars are defined using nValues float values between (0,1). If the file format is BINARY, the stream of data consists of nValues unsigned char values per scalar value.

We will need to parse as float for ASCII and u8 for binary. Could you update the PR to do this and add a test? I would be happy to accept.
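The rule could be captured roughly as in the sketch below (the variant names mirror vtkio's ScalarType, but the function and its is_binary flag are purely illustrative):

```rust
// Minimal sketch of the rule from the spec: the scalar type used for color
// lookup-table entries depends only on whether the file is ASCII or binary.
enum ScalarType {
    F32,
    U8,
}

fn color_lookup_table_scalar_type(is_binary: bool) -> ScalarType {
    if is_binary {
        // BINARY: nValues unsigned char values per scalar value.
        ScalarType::U8
    } else {
        // ASCII: nValues float values between (0, 1).
        ScalarType::F32
    }
}
```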

MrMuetze commented 4 weeks ago

Heyo 👋 I've been looking into this issue together with @Siimeloni and we've found a couple of things that probably make this fix a bit more involved. We wanted to summarize our findings so far to maybe get some input from @elrnv and to just have something to come back to in the future.

Changing the parser part is only half of the required fix here, as we also need to adjust the writer side. We have mostly looked at cube_complex_test() since it covers both binary and ASCII for parsing and writing. For the parser, the fix can be done as described by @elrnv and correctly parses F32 for ASCII input and U8 for binary input.

Here we run into the first issue. Since cube_complex_test() exercises both the writer and the parser (and the writer has not been adjusted so far), the test fails: for binary output the writer still writes F32 values, so the parser never encounters U8.

Then I started to look into the writer code more deeply. I suspect that, up until this point, the VTK model and the parser/writer infrastructure have been devised under the assumption that all attributes of a given dataset keep the same type regardless of the input/output format. The writer functions are relatively rigid with regard to the output type: they just write whatever buffer they have been fed. For this specific lookup table, we would have to decide whether to write F32 or U8 based on the output type, and that cannot easily be done right away (if I haven't overlooked anything).

Since I wanted to try something anyway, I hacked something into the writer code (adjusting https://github.com/elrnv/vtkio/blob/02d91009d2ca9683b16b15cf74d0bb7375444334/src/writer.rs#L881-L883) to convert F32 input to U8 whenever the length matches the lookup table data from the test. This makes the writer/parser combination work nicely, but of course it is not a practical solution. Additionally, the test still would not pass because of a difference in the VTK model (the test expects F32, but now we parse a U8 buffer).
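A cleaner version of that conversion would presumably look something like the following sketch (illustrative only, and assuming u8 colors span 0..=255 while f32 colors span (0, 1), which is how VTK color scalars are usually interpreted):

```rust
// Illustrative sketch of the conversion a proper writer-side fix would need.
// Assumption: u8 colors span 0..=255 and f32 colors span (0, 1).
fn f32_colors_to_u8(values: &[f32]) -> Vec<u8> {
    values
        .iter()
        .map(|&v| (v.clamp(0.0, 1.0) * 255.0).round() as u8)
        .collect()
}

// The reverse direction, e.g. when turning a parsed binary table back into floats.
fn u8_colors_to_f32(values: &[u8]) -> Vec<f32> {
    values.iter().map(|&v| v as f32 / 255.0).collect()
}
```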

This brings up another possible issue. With the changes described above, the VTK model might contain either an IOBuffer::F32 or an IOBuffer::U8 for the lookup table, while still representing the same kind of data. So, depending on the input/output types, we might have to do one of the following conversions when writing the model again:

- F32 -> F32
- F32 -> U8
- U8 -> F32
- U8 -> U8

I'm not sure how practical it is for the VTK model to depend directly on the input data type at this point. Maybe it would make sense to always convert this lookup table data to either F32 or U8? Another question is whether we should adjust all writer functions to be more flexible in terms of their output, or whether it would be more sensible to add new functions that handle this specific lookup table case. This is the point where we would like to get some feedback. Maybe we have overlooked something up to this point or are not seeing the best solution. We would love to hear your thoughts on this!
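One possible shape for handling the four conversions listed above is sketched below (the names are hypothetical, not vtkio's actual API; the scaling again assumes u8 colors span 0..=255 and f32 colors span (0, 1)):

```rust
// Hypothetical sketch: keep whatever buffer the model holds, and convert
// only when it does not match the target format at write time.
enum ColorBuffer {
    F32(Vec<f32>),
    U8(Vec<u8>),
}

fn prepare_for_output(buf: ColorBuffer, binary_output: bool) -> ColorBuffer {
    match (buf, binary_output) {
        // Buffer already matches the target format: pass it through.
        (ColorBuffer::U8(v), true) => ColorBuffer::U8(v),
        (ColorBuffer::F32(v), false) => ColorBuffer::F32(v),
        // Mismatched cases need a conversion.
        (ColorBuffer::F32(v), true) => ColorBuffer::U8(
            v.into_iter()
                .map(|x| (x.clamp(0.0, 1.0) * 255.0).round() as u8)
                .collect(),
        ),
        (ColorBuffer::U8(v), false) => ColorBuffer::F32(
            v.into_iter().map(|x| x as f32 / 255.0).collect(),
        ),
    }
}
```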

Best regards

elrnv commented 3 weeks ago

Good point. I added a writer implementation here: https://github.com/GiGainfosystems/vtkio/pull/1. Please review it to see if I missed anything.

MrMuetze commented 2 weeks ago

I've incorporated the remaining changes discussed on the other PR and I think this should be ready now. Thanks again for your input and work on this!

elrnv commented 1 week ago

Just one comment above. Otherwise ready to merge. Thank you, @MrMuetze, for adding these changes here!

MrMuetze commented 1 week ago

Cool, I'm not sure what comment you are referring to though. 😅 Am I not seeing something on my end?

elrnv commented 1 week ago

Sorry about that, I forgot to submit the review! Do you see the comment now?