GenericMappingTools / pygmt

A Python interface for the Generic Mapping Tools.
https://www.pygmt.org
BSD 3-Clause "New" or "Revised" License
749 stars 217 forks source link

Correctly reserve the grid data dtype by converting ctypes array to numpy array with np.ctypeslib.as_array #3446

Closed seisman closed 6 days ago

seisman commented 1 week ago

Description of proposed changes

There are two different ways to convert a ctypes array to a numpy array with a specific shape:

  1. np.reshape(ctypes_array[:120], (10, 12))
  2. np.ctypeslib.as_array(ctypes_array, (10, 12))

Currently we use option 1 but option 2 is better. In this PR, we use option 2.

As shown in the benchmarks below. Option 2 has at least two pros:

  1. 10 times faster than option 1
  2. The dtype information is kept.
In [1]: import ctypes

In [2]: import numpy as np

In [3]: ctypes_array = (ctypes.c_uint8 * 120)(*range(120))

In [5]: array1 = np.reshape(ctypes_array[:120], (10, 12))

In [6]: array1
Out[6]:
array([[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11],
       [ 12,  13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23],
       [ 24,  25,  26,  27,  28,  29,  30,  31,  32,  33,  34,  35],
       [ 36,  37,  38,  39,  40,  41,  42,  43,  44,  45,  46,  47],
       [ 48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59],
       [ 60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71],
       [ 72,  73,  74,  75,  76,  77,  78,  79,  80,  81,  82,  83],
       [ 84,  85,  86,  87,  88,  89,  90,  91,  92,  93,  94,  95],
       [ 96,  97,  98,  99, 100, 101, 102, 103, 104, 105, 106, 107],
       [108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119]])

In [7]: array1.dtype
Out[7]: dtype('int64')

In [8]: array2 = np.ctypeslib.as_array(ctypes_array, (10, 12))

In [9]: array2
Out[9]:
array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119], dtype=uint8)

In [10]: array2.dtype
Out[10]: dtype('uint8')

In [11]: %timeit array1 = np.reshape(ctypes_array[:120], (10, 12))
13.1 μs ± 336 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [12]: %timeit array2 = np.ctypeslib.as_array(ctypes_array, (10, 12))
778 ns ± 117 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Also need to be cautious that we need to make a copy of the ctypes/numpy array, since the memory of the array is allocated by GMT and will be freed when exiting the session.

codspeed-hq[bot] commented 1 week ago

CodSpeed Performance Report

Merging #3446 will improve performances by 82.61%

Comparing ctypesarray (09ac0a0) with main (2454aa6)

Summary

⚡ 2 improvements ✅ 99 untouched benchmarks

Benchmarks breakdown

Benchmark main ctypesarray Change
test_sph2grd_no_outgrid 323.6 ms 289.1 ms +11.95%
test_sphinterpolate_no_outgrid 71.2 ms 39 ms +82.61%
seisman commented 1 week ago

We can also declare it as a bug, since the data dtype was not preserved in the old version.