IEAWindTask37 / windIO

Apache License 2.0
18 stars 11 forks source link

Switching from PyYAML to ruamel.yaml #47

Open kenloen opened 3 months ago

kenloen commented 3 months ago

The current implementation is using PyYAML, which is is no longer maintained (although it has recently had some activity, but it still seems that it do not have a development group). The PyYAML implementation is wrapping a C implementation and hence it can lead to issues when installing, which I have seen for some Windows users. I would suggest switching to ruamel.yaml instead. It is a pure python version and hence do not suffer from the same installation issue. As an added benefit it can also do round-trip conversion which means it will preserve comments and white space. It can also use the PyYAML parser that is faster by changing the parser type (e.g. typ="safe").

As the implementation aims to make a default parser I would suggest the following for the yaml settings for the yaml object:

from ruamel.yaml import YAML
yaml_obj = YAML()

yaml_obj.default_flow_style = False
yaml_obj.width = 1e6
yaml_obj.indent(mapping=4, sequence=6, offset=3)
yaml_obj.allow_unicode = False

# Convert numpy types to build in data types
yaml_obj.Representer.add_representer(
    np.str_, lambda dumper, data: dumper.represent_str(str(data))
)
yaml_obj.Representer.add_representer(
    np.float64, lambda dumper, data: dumper.represent_float(float(data))
)
yaml_obj.Representer.add_representer(
    np.float32, lambda dumper, data: dumper.represent_float(float(data))
)

# Write 1D list with flow-style
def list_rep(dumper, data):
    if isinstance(data[0], (list, dict)):
        return dumper.represent_sequence(
            "tag:yaml.org,2002:seq", data, flow_style=False
        )
    return dumper.represent_sequence(
        "tag:yaml.org,2002:seq", data, flow_style=True
    )
yaml_obj.Representer.add_representer(list, list_rep)

# Convert numpy arrays to lists
def ndarray_rep(dumper, data):
    return list_rep(dumper, data.tolist())
yaml_obj.Representer.add_representer(np.ndarray, ndarray_rep)
gbarter commented 3 months ago

Love all of the ideas you have been posting here, @kenloen . For this one and others like it, I would suggest just moving straight to a PR as I don't imagine anyone would object.

kenloen commented 3 months ago

Great to hear the you appreciate my effort here.

I know that some of the issues that I have posted might seem small, and that creating these issues might take the same amount of time as just creating a PR and doing it my self. But as always with code development, the time that one expect it will take is always much more than what it ends up taking. Furthermore, the purpose of all these issues is to map all the things that should be added/changed so that we can priotize which things are more urgent.

I am sorry if this comes across as me trying to not do it my self. My aim was to first map some of the issues and I hope that would motivate others to also add issues for some of the things they would like to add or change.