boutproject / xBOUT

Collects BOUT++ data from parallelized simulations into xarray.
https://xbout.readthedocs.io/en/latest/
Apache License 2.0

Convert byte-strings to str when loading data #229

Closed johnomotani closed 2 years ago

johnomotani commented 2 years ago

BOUT++ char arrays are loaded in Python as byte-strings (of type bytes), which xarray cannot save to NetCDF as attributes. This PR converts all byte-strings in the metadata to utf-8 strings when loading data, so that saved and re-loaded Datasets are consistent. If we converted only when saving, the Dataset being saved could have bytes metadata variables while the re-loaded Dataset would have str.

Also adds a byte-string "run_id" variable to the test Datasets, which should prevent regressions of this bug-fix.
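
The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code from xbout/utils.py: the function name decode_bytes_metadata and the sample metadata values are hypothetical, and it assumes the metadata is a flat dict.

```python
def decode_bytes_metadata(metadata):
    """Return a copy of ``metadata`` with every bytes value decoded to str.

    Hypothetical helper for illustration only; values that are not bytes
    are passed through unchanged.
    """
    return {
        key: value.decode("utf-8") if isinstance(value, bytes) else value
        for key, value in metadata.items()
    }


# Made-up metadata, shaped like what might be read from a BOUT++ dump file.
# "run_id" is a length-36 char array, so it arrives as bytes.
raw = {"run_id": b"c1f2a3b4-0000-4000-8000-123456789abc", "nx": 68, "MXSUB": 16}

clean = decode_bytes_metadata(raw)
print(type(raw["run_id"]).__name__, "->", type(clean["run_id"]).__name__)
```

Doing the conversion once at load time means the in-memory Dataset and a Dataset re-loaded from a saved file hold the same types, which is the consistency argument made in the description.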

codecov-commenter commented 2 years ago

Codecov Report

Merging #229 (21596be) into master (d9fb747) will decrease coverage by 0.06%. The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #229      +/-   ##
==========================================
- Coverage   75.68%   75.62%   -0.07%     
==========================================
  Files          15       15              
  Lines        2682     2683       +1     
  Branches      631      635       +4     
==========================================
- Hits         2030     2029       -1     
- Misses        420      421       +1     
- Partials      232      233       +1     
Impacted Files               Coverage Δ
xbout/utils.py               83.08% <100.00%> (+0.05%)
xbout/plotting/animate.py    45.21% <0.00%> (-0.87%)


johnomotani commented 2 years ago

In the latest version of xarray this problem is fixed (tested with 0.20.2): byte-strings are written as char arrays. The dimension name differs from the one BOUT++ gives (e.g. for a length-36 char array, xarray uses string36 where BOUT++ uses char36); not sure if this is important... If it turns out to be important, the dimension name can be modified, see https://xarray.pydata.org/en/stable/user-guide/io.html#string-encoding.

Closing as no longer necessary. If anyone hits the original bug, the fix is to update xarray.