Get_data major dimension differences in scalar and vector

Matthew-Whisenant commented 1 year ago

In both FormatBinary and FormatAppended versions of get_data, we have:

https://github.com/JuliaVTK/ReadVTK.jl/blob/85db2535c3da7b0c3fffe2f5a2764011671d845c/src/ReadVTK.jl#L681-L687

Is there a reason for the type instability here? Such as Vector type for scalar fields and Matrix type for vector fields.

My workflow includes getting data from multiple field keys at the same time, such as fluid velocity and fluid pressure. But its harder to hcat/vcat or postprocess when one get_data returns a 3xN Matrix and one is a Nx1 Vector where the major dimension is different.

Seems to me it would make more sense for it to return Nx3 Matrix and Nx1 Vector or return 3xN Matrix and 1xN Matrix. The former would preserve current typing and make the output Matrix column-major just like Julia is.

Again I'm not too concerned about the typing, moreso that to cat the arrays I have to process vectors and scalar fields differently because of the major dimension mismatch.

I can do a PR of this if you all don't forsee this breaking anything in your Trixi2Vtk.jl package. Or if its more of a unnecessary or tedious change it can just be on my local ReadVTK. Just let me know.

Matthew-Whisenant commented 1 year ago

As a follow-up, benchmarking did not see a significant performance increase.

New method

 if N == 1 
   # If it is a one dimensional array, just return it as is 
   return v 
 else 
   # Otherwise, reshape into two-dimensional array 
   return reshape(v, :, N) 
 end

Test

using ReadVTK

using BenchmarkTools

file = VTKFile("celldata_inline_binary_uncompressed.vtu")

@benchmark get_points($file)

Current method

BenchmarkTools.Trial: 10000 samples with 1 evaluation. Range (min … max): 112.900 μs … 4.918 ms ┊ GC (min … max): 0.00% … 97.19% Time (median): 123.700 μs ┊ GC (median): 0.00% Time (mean ± σ): 147.404 μs ± 117.758 μs ┊ GC (mean ± σ): 6.24% ± 8.98%

▇█▆▅▅▄▃▂▁▁▁▃▃▃▂▁▁▁▁ ▂ ████████████████████▇▇▆▇▆▅▅▄▅▄▃▃▃▁▃▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▄▇▆ █ 113 μs Histogram: log(frequency) by time 577 μs <

Memory estimate: 581.57 KiB, allocs estimate: 2341.

New method

BenchmarkTools.Trial: 10000 samples with 1 evaluation. Range (min … max): 114.200 μs … 2.912 ms ┊ GC (min … max): 0.00% … 94.71% Time (median): 123.600 μs ┊ GC (median): 0.00% Time (mean ± σ): 137.880 μs ± 76.972 μs ┊ GC (mean ± σ): 5.86% ± 9.25%

█▇▅▅▄▂▁ ▂ ███████▇▇█▇▇▆▆▅▆▃▃▃▄▁▃▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▆▇ █ 114 μs Histogram: log(frequency) by time 685 μs <

Memory estimate: 581.57 KiB, allocs estimate: 2341.

Matthew-Whisenant commented 1 year ago

Actually, I just realized that WriteVTK uses this ordering as well. So definitely not something to be done right now. Plus ReadVTK uses views so the column-row ordering doesn't matter. Maybe the column-major thing is worth considering if a major rewrite for both packages or they are combined at some point. I'll go ahead and close the issue since it's not going to be changed any time soon.

sloede commented 1 year ago

I would be open to this change should this be revisited in the future.

JuliaVTK / ReadVTK.jl