Open yvonnefroehlich opened 1 year ago
I can reproduce the bug on Linux.
If the pd.DataFrame contains integer-only columns, the debug messages are:
plot [DEBUG]: Look for file -5/5/-5/5 in /home/seisman/.gmt
plot [DEBUG]: Look for file -5/5/-5/5 in /home/seisman/.gmt/cache
plot [DEBUG]: Look for file -5/5/-5/5 in /home/seisman/.gmt/server
plot [DEBUG]: Got regular w/e/s/n for region (-5/5/-5/5)
plot [INFORMATION]: Processing input table data
plot [DEBUG]: Operation will require 2 input columns [n_cols_start = 2]
plot [DEBUG]: Reset MAP_ANNOT_OBLIQUE to anywhere
plot [DEBUG]: Projected values in meters: -5 5 -5 5
plot [DEBUG]: Computed automatic parameters using dimension scaling: 0.9
plot [INFORMATION]: Map scale is 0.001 km per cm or 1:100.
plot [DEBUG]: Running in PS mode modern
plot [DEBUG]: Use PS filename /home/seisman/.gmt/sessions/gmt_session.1478556/gmt_1.ps-
plot [DEBUG]: Append to hidden PS file /home/seisman/.gmt/sessions/gmt_session.1478556/gmt_1.ps-
plot [DEBUG]: Got session name as pygmt-session and default graphics formats as pdf
plot [DEBUG]: Basemap order: Frame = above Grid = below Tick/Annot = below
plot [DEBUG]: gmtapi_init_import: Passed family = Data Table and geometry = Line
plot [DEBUG]: gmtapi_init_import: Added 1 new sources
plot [DEBUG]: GMT_Init_IO: Returned first Input object ID = 0
plot [DEBUG]: gmtapi_begin_io: Input resource access is now enabled [container]
plot [DEBUG]: gmtapi_import_dataset: Passed ID = -1 and mode = 0
plot [INFORMATION]: Duplicating data table from user 4 column arrays of length 4
plot [DEBUG]: Object ID 1 : Registered Data Table Memory Copy 560d93e51980 as an Input resource with geometry Point [n_objects = 2]
plot [DEBUG]: gmtapi_import_dataset processed 1 resources
plot [DEBUG]: GMT_End_IO: Input resource access is now disabled
plot [INFORMATION]: Plotting segment 0
plot [DEBUG]: GMT_Destroy_Data: freed memory for a Data Table for object 1
plot [DEBUG]: gmtlib_unregister_io: Unregistering object no 1 [n_objects = 1]
plot [DEBUG]: gmtlib_unregister_io: Object no 1 has non-NULL resource pointer
plot [DEBUG]: Current size of half-baked PS file /home/seisman/.gmt/sessions/gmt_session.1478556/gmt_1.ps- = 23633.
If the pd.DataFrame contains float-type columns, the debug messages are:
plot [DEBUG]: Look for file -5/5/-5/5 in /home/seisman/.gmt
plot [DEBUG]: Look for file -5/5/-5/5 in /home/seisman/.gmt/cache
plot [DEBUG]: Look for file -5/5/-5/5 in /home/seisman/.gmt/server
plot [DEBUG]: Got regular w/e/s/n for region (-5/5/-5/5)
plot [INFORMATION]: Processing input table data
plot [DEBUG]: Operation will require 2 input columns [n_cols_start = 2]
plot [DEBUG]: Reset MAP_ANNOT_OBLIQUE to anywhere
plot [DEBUG]: Projected values in meters: -5 5 -5 5
plot [DEBUG]: Computed automatic parameters using dimension scaling: 0.9
plot [INFORMATION]: Map scale is 0.001 km per cm or 1:100.
plot [DEBUG]: Running in PS mode modern
plot [DEBUG]: Use PS filename /home/seisman/.gmt/sessions/gmt_session.1479589/gmt_1.ps-
plot [DEBUG]: Append to hidden PS file /home/seisman/.gmt/sessions/gmt_session.1479589/gmt_1.ps-
plot [DEBUG]: Got session name as pygmt-session and default graphics formats as pdf
plot [DEBUG]: Basemap order: Frame = above Grid = below Tick/Annot = below
plot [DEBUG]: gmtapi_init_import: Passed family = Data Table and geometry = Line
plot [DEBUG]: gmtapi_init_import: Added 1 new sources
plot [DEBUG]: GMT_Init_IO: Returned first Input object ID = 0
plot [DEBUG]: gmtapi_begin_io: Input resource access is now enabled [container]
plot [DEBUG]: gmtapi_import_dataset: Passed ID = -1 and mode = 0
plot [INFORMATION]: Referencing data table from user 4 column arrays of length 4
plot [DEBUG]: Object ID 1 : Registered Data Table Memory Reference 55c909f971a0 as an Input resource with geometry Point [n_objects = 2]
plot [DEBUG]: gmtapi_import_dataset processed 1 resources
plot [DEBUG]: GMT_End_IO: Input resource access is now disabled
plot [INFORMATION]: Plotting segment 0
free(): invalid next size (fast)
Here is the diff:
< plot [INFORMATION]: Duplicating data table from user 4 column arrays of length 4
< plot [DEBUG]: Object ID 1 : Registered Data Table Memory Copy 560d93e51980 as an Input resource with geometry Point [n_objects = 2]
---
> plot [INFORMATION]: Referencing data table from user 4 column arrays of length 4
> plot [DEBUG]: Object ID 1 : Registered Data Table Memory Reference 55c909f971a0 as an Input resource with geometry Point [n_objects = 2]
26,29c26
< plot [DEBUG]: GMT_Destroy_Data: freed memory for a Data Table for object 1
< plot [DEBUG]: gmtlib_unregister_io: Unregistering object no 1 [n_objects = 1]
< plot [DEBUG]: gmtlib_unregister_io: Object no 1 has non-NULL resource pointer
< plot [DEBUG]: Current size of half-baked PS file /home/seisman/.gmt/sessions/gmt_session.1478556/gmt_1.ps- = 23633.
---
> free(): invalid next size (fast)
So, for the integer-type case, data is duplicated, but for the float-type case, data is used by reference.
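To make the "Duplicating" versus "Referencing" distinction in the two logs concrete, here is a minimal sketch using only the standard library (`array` and `ctypes`). It does not call GMT or PyGMT and is not how PyGMT passes vectors; it only illustrates the difference between a C array that owns a copy of the data and one that shares the caller's buffer.

```python
import array
import ctypes

# Four float64 values standing in for one column of the table.
column = array.array("d", [-3.0, -1.0, 1.0, 3.0])
n = len(column)

# "Duplicating": a new C array with its own storage, filled from the column.
duplicate = (ctypes.c_double * n)(*column)

# "Referencing": a C array that shares the caller's buffer (no copy).
reference = (ctypes.c_double * n).from_buffer(column)

# Modify the caller's data after both C arrays exist.
column[0] = 99.0

print(duplicate[0])  # the copy is unaffected by the change
print(reference[0])  # the reference sees the change
```

In the referenced case the memory belongs to the caller, so the consumer must never free or resize it; that is why ownership matters for the crash discussed here.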
@PaulWessel Need your help.
How is the DataFrame passed to GMT? Via matrix? Also, this looks like a bad sign
gmtapi_import_dataset: Passed ID = -1 and mode = 0
since ID = -1 means "not set", so that can't be good.
It's passed via GMT_Put_Vectors.
Here are the values used in GMT_Open_Virtualfile
family="GMT_IS_DATASET|GMT_VIA_VECTOR"
geometry="GMT_IS_POINT"
direction="GMT_IN|GMT_IS_REFERENCE"
Not clear. Might you share a minimal example that (1) loads the data frame, (2) passes it to some simple module like gmtconvert (assuming that also crashes)? Think I need to debug.
Might you share a minimal example that (1) loads the data frame, (2) passes it to some simple module like gmtconvert (assuming that also crashes)?
Tried to pass the same dataset to gmtconvert, but it doesn't crash.
import pandas as pd
from pygmt.clib import Session

test_dict_int = {
    'a': [ 2,  2, 2, 2],
    'z': [ 8,  6, 7, 3],
    'x': [-3, -1, 1, 3],
    'y': [ 2,  2, 2, 2],
}
data = pd.DataFrame(data=test_dict_int)

with Session() as lib:
    with lib.virtualfile_from_data(data=data) as vintbl:
        lib.call_module("convert", f"{vintbl} -Vd")
The verbose messages are:
gmtconvert [INFORMATION]: Processing input table data
gmtconvert [DEBUG]: gmtapi_init_import: Passed family = Data Table and geometry = Point
gmtconvert [DEBUG]: gmtapi_init_import: Added 1 new sources
gmtconvert [DEBUG]: GMT_Init_IO: Returned first Input object ID = 0
gmtconvert [DEBUG]: gmtapi_begin_io: Input resource access is now enabled [container]
gmtconvert [DEBUG]: gmtapi_import_dataset: Passed ID = -1 and mode = 0
gmtconvert [INFORMATION]: Referencing data table from user 4 column arrays of length 4
gmtconvert [DEBUG]: Object ID 1 : Registered Data Table Memory Reference 555a4fa95820 as an Input resource with geometry Point [n_objects = 2]
gmtconvert [DEBUG]: gmtapi_import_dataset processed 1 resources
gmtconvert [DEBUG]: GMT_End_IO: Input resource access is now disabled
gmtconvert [DEBUG]: Object ID 2 : Registered Data Table Memory Reference 555a4fad6fe0 as an Input resource with geometry Point [n_objects = 3]
gmtconvert [DEBUG]: Successfully duplicated a Data Table
gmtconvert [DEBUG]: Object ID 3 : Registered Data Table Stream 7f3cdcaae780 as an Output resource with geometry Point [n_objects = 4]
gmtconvert [DEBUG]: gmtapi_begin_io: Output resource access is now enabled [container]
gmtconvert [DEBUG]: gmtapi_export_dataset: Passed ID = 3 and mode = 0
gmtconvert [INFORMATION]: Write Data Table to <stdout>
2 8 -3 2
2 6 -1 2
2 7 1 2
2 3 3 2
gmtconvert [DEBUG]: GMT_End_IO: Output resource access is now disabled
gmtconvert [INFORMATION]: 1 tables concatenated, 4 records passed (input cols = 4; output cols = 4)
gmtconvert [DEBUG]: gmtlib_garbage_collection: Destroying object: C=0 A=0 ID=1 W=Input F=Data Table M=Memory Reference S=Used P=555a4fa95820 N=(null)
gmtconvert [DEBUG]: gmtlib_garbage_collection: Destroying object: C=0 A=0 ID=2 W=Input F=Data Table M=Memory Reference S=Unused P=555a4fad6fe0 N=(null)
gmtconvert [DEBUG]: GMTAPI_Garbage_Collection freed 2 memory objects
gmtconvert [DEBUG]: gmtlib_unregister_io: Unregistering object no 1 [n_objects = 3]
gmtconvert [DEBUG]: gmtlib_unregister_io: Unregistering object no 2 [n_objects = 2]
gmtconvert [DEBUG]: gmtlib_unregister_io: Unregistering object no 3 [n_objects = 1]
ID = -1 shows up here as well, without a crash, so it's not the real problem.
For the example in https://github.com/GenericMappingTools/pygmt/issues/2637#issue-1858869645, if I add style="c0.2c" (i.e., -Sc0.2c), the script works. So, it's likely it only crashes when plotting lines.
If I try this:
cat <<- EOF > bug.py
# Set up random test data
import pandas as pd
import pygmt
size = 5
test_dict_int = {
    'a': [ 2,  2, 2, 2],
    'z': [ 8,  6, 7, 3],
    'x': [-3, -1, 1, 3],
    'y': [ 2,  2, 2, 2],
}
test_df_int = pd.DataFrame(data=test_dict_int)
fig = pygmt.Figure()
fig.basemap(
    region=[-size, size, -size, size],
    projection="X" + str(size*2),
    frame=True,
)
fig.plot(
    # data=test_df_int,  # integers -> WORKs
    data=test_df_int.astype(float),  # floats -> FAILs
    incols=[2, 3],
    # verbose="d",
)
fig.show()
fig.savefig(fname="bug_MWE.png")
EOF
and run
python bug.py
I get no errors and this plot
What am I missing?
@yvonnefroehlich said "For me, this issue occurs under Windows but not under Linux.". Now I can reproduce the issue under Linux, but you can't reproduce it under macOS.
Need to find out why the behavior is system-dependent.
Need a Linux or Win (@joa-quim ) person to run in debug and determine WTF is going on. I cannot.
I could try to start my python learning through a debug session but for that I would need that PyGMT was able to find my gmt.dll (which, ofc, has to be a debug build) as I don't want to mess with Conda and environments stuff.
I would need that PyGMT was able to find my gmt.dll (which, ofc, has to be a debug build)
Just set the GMT_LIBRARY_PATH environment variable to the path to the gmt.dll (something like C:\Users\USERNAME\Mambaforge\envs\pygmt\Library\bin\).
OK, but when I said gmt.dll I was being generic. The true name is gmt_w64.dll and if only the path is set via GMT_LIBRARY_PATH then the right dll won't be found.
OK, but when I said gmt.dll I was being generic. The true name is gmt_w64.dll and if only the path is set via GMT_LIBRARY_PATH then the right dll won't be found.
PyGMT will try to find gmt.dll, gmt_w32.dll and gmt_w64.dll
Good, thanks.
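The lookup described in this exchange can be sketched as follows. This is a minimal illustration, not PyGMT's actual loader: `candidate_library_paths` and `find_library` are hypothetical helpers, and the only facts taken from the thread are the `GMT_LIBRARY_PATH` variable and the three DLL names.

```python
import os
from pathlib import Path

# The three Windows library names mentioned in the thread.
WINDOWS_LIB_NAMES = ["gmt.dll", "gmt_w32.dll", "gmt_w64.dll"]


def candidate_library_paths(env=None):
    """Hypothetical helper: list the files that would be tried, in order."""
    env = os.environ if env is None else env
    libdir = env.get("GMT_LIBRARY_PATH")
    if not libdir:
        # Without a directory, fall back to bare names (system search path).
        return list(WINDOWS_LIB_NAMES)
    return [str(Path(libdir) / name) for name in WINDOWS_LIB_NAMES]


def find_library(env=None):
    """Hypothetical helper: return the first candidate that exists, else None."""
    for path in candidate_library_paths(env):
        if Path(path).is_file():
            return path
    return None
```

With `GMT_LIBRARY_PATH` pointing at a directory that contains only `gmt_w64.dll`, `find_library()` would return that file, which matches the point that setting the directory alone is sufficient.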
Tried to debug this issue. It seems plot crashes when trying to free the GMT_DATASET object
https://github.com/GenericMappingTools/gmt/blob/7825ff4632c85ef6569acf19192068b977127e07/src/psxy.c#L3008
if (GMT_Destroy_Data (API, &D) != GMT_NOERROR) {
    Return (API->error);
}
Actually it crashes in the gmt_free_segment function (https://github.com/GenericMappingTools/gmt/blob/7825ff4632c85ef6569acf19192068b977127e07/src/gmt_io.c#L8875):
SH = gmt_get_DS_hidden (segment);
for (col = 0; col < segment->n_columns; col++) {
    if (SH->alloc_mode[col] == GMT_ALLOC_INTERNALLY)    /* Free data GMT allocated */
        gmt_M_free (GMT, segment->data[col]);
}
gmt_M_free (GMT, segment->data);    /* CRASHES HERE! */
SH->alloc_mode[col] is GMT_ALLOC_EXTERNALLY for all four columns, so the segment->data[col] arrays are not freed, but it crashes when freeing segment->data.
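The ownership logic in the quoted loop can be modelled in a few lines of Python. This is only a toy model of the control flow: `free_segment` here is a hypothetical stand-in that records which buffers would be released, and the numeric flag values are placeholders rather than GMT's definitions.

```python
# Flag names mirror the GMT constants quoted above; the values are placeholders.
GMT_ALLOC_EXTERNALLY = 0
GMT_ALLOC_INTERNALLY = 1


def free_segment(n_columns, alloc_mode):
    """Return the buffers the quoted loop would free, in order."""
    freed = []
    for col in range(n_columns):
        # Only columns that GMT allocated itself are released.
        if alloc_mode[col] == GMT_ALLOC_INTERNALLY:
            freed.append(f"segment->data[{col}]")
    # The array of column pointers is released unconditionally.
    freed.append("segment->data")
    return freed


# The case reported above: all four columns are owned by the caller.
print(free_segment(4, [GMT_ALLOC_EXTERNALLY] * 4))
```

For the reported case the model frees only `segment->data`, the pointer array itself, which is the call where the crash was observed.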
Ping @PaulWessel Does the above debugging help?
Yes, I got the same. Will debug again to see exactly that data is internally allocated. Not sure why that would crash, but it does.
Since I cannot reproduce it (works for macOS) and I cannot see why this would depend on the OS I cannot really help. Someone would need to debug in Linux but not sure what to look for. segment->data is allocated in GMT so fair game to free as long as we don't free the read-only vectors.
Description of the problem
Under specific circumstances, Figure.plot does not work if a pandas.DataFrame is passed to the data parameter and a column order is selected via incols. The issue does not occur in case the pd.DataFrame contains only integers. If the desired columns are passed directly to the x and y parameters, the code works well. For me, this issue occurs under Windows but not under Linux.
For context, see PR #2515 up on comment https://github.com/GenericMappingTools/pygmt/pull/2515#issuecomment-1685121627
Maybe related to the issues in
#2138 up on comment https://github.com/GenericMappingTools/pygmt/pull/2138#issuecomment-1358915643 with fix in #2274
#1313 and #1440
Minimal Complete Verifiable Example
Output of verbose="d"
Full error message
System information