brendanarnold / py-fortranformat

MIT License
31 stars 7 forks source link

Performance improvements #31

Closed ZedThree closed 1 year ago

ZedThree commented 1 year ago

This has been a really useful package for me, but the performance is really not great. On the test case below, this PR gave me a x16 speedup, mostly from using StringIO as the buffer for writing floats. I've also done some refactoring to avoid isinstance calls, which can be expensive.

from fortranformat import FortranRecordWriter, FortranRecordReader

import numpy as np

def test_read_write_array():
    nx = 400
    ny = 100
    array = np.linspace(0, 1, nx * ny).reshape((nx, ny))
    writer = FortranRecordWriter("(5f17.9)")
    text = writer.write(array.flatten("F"))

    reader = FortranRecordReader("(5f17.9)")
    for line in text.splitlines():
        new_array = reader.read(line)

if __name__ == "__main__":
    test_read_write_array()

Using str as before, this test case scales like O(n^2), with this PR it scales like O(n).

I'll also look at using StringIO for the other types.

ZedThree commented 1 year ago

For this simple test case, it's now as fast as an implementation using basic f-strings for output.

I've tweaked the input as well, and got some speedup there too, about 10-20%.

I think there's at least a further 10-20% saving that could be had from moving the _compose_*_string and read_* functions into methods of the edit descriptors themselves. This would enabling skipping the isinstance checks in input/output -- these can be surprisingly expensive, each one can be 2% of the total run time.

brendanarnold commented 1 year ago

Some really interesting changes here, performance is a known problem with the library so really keen to speed things up a bit. I’m away at the moment but will be able to take a proper look in the next few days 👀

ZedThree commented 1 year ago

Yep, I've run the tests on each commit. The minimaltests set all works fine. I've not run the full test suite, just gathering the tests took a long time! If you like, I can get the tests running automatically on Github.

I can make the changes to be compatible with Python 2.7, but I strongly recommend dropping support for Python 2.7, and very much recommend only supporting Python >= 3.8. Python 2.7 has been end-of-life since 2020, and the vast majority of the scientific Python ecosystem now only supports 3. The major packages like numpy and scipy now also require 3.8+.