Erlang/OTP
http://erlang.org
Apache License 2.0

crash: size_object: bad tag for 0x80 #5049

Closed RoadRunnr closed 2 years ago

RoadRunnr commented 3 years ago

Describe the bug

Since upgrading to OTP-24 (at least since 24.0.2), I get random crashes with core dumps with an error message of:

size_object: bad tag for 0x80

I managed to grab a core dump with OTP 24.0.0.3, which gives me this backtrace:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007f47ce7d0864 in __GI_abort () at abort.c:79
#2  0x00005624c82d5f0a in erts_exit_epilogue () at beam/erl_init.c:2534
#3  0x00005624c83a7f1a in erts_exit_vv (n=-2, flush_async=flush_async@entry=0, fmt=fmt@entry=0x5624c861157f "size_object: bad tag for %#x\n", args1=args1@entry=0x7f478a5a78a8, args2=args2@entry=0x7f478a5a78c0) at beam/erl_init.c:2520
#4  0x00005624c83a8087 in erts_exit (n=n@entry=-2, fmt=fmt@entry=0x5624c861157f "size_object: bad tag for %#x\n") at beam/erl_init.c:2544
#5  0x00005624c83de72c in size_object_x (obj=<optimized out>, obj@entry=139942558691730, litopt=litopt@entry=0x7f478a5a7b10) at beam/copy.c:240
#6  0x00005624c841a0dc in erts_send_message (sender=sender@entry=0x5624c9196598, receiver=receiver@entry=0x5624c91948d0, receiver_locks=receiver_locks@entry=0x7f478a5a7b84, message=message@entry=139942558691730) at beam/erl_message.c:779
#7  0x00005624c83f5236 in do_send (p=p@entry=0x5624c9196598, to=to@entry=380465383126035, msg=msg@entry=139942558691730, return_term=return_term@entry=29899, refp=refp@entry=0x7f478a5a7be8, dist_ctx=dist_ctx@entry=0x7f478a5a7bf0, connect=0, suspend=1) at beam/bif.c:2416
#8  0x00005624c83f954f in send_3 (A__p=0x5624c9196598, BIF__ARGS=<optimized out>, A__I=<optimized out>) at beam/bif.c:2464
#9  0x00007f478b75b371 in ?? ()
#10 0x0000000000000000 in ?? ()

To Reproduce

Unknown. The failure occurs randomly in the complex Common Test suite of https://github.com/travelping/ergw

Affected versions

Observed on OTP 24.0.2 and 24.0.3; it might have happened on older versions as well.

Other Information

OS: Ubuntu 21.10 (Impish)
OTP built with kerl
gcc version 10.3.0 (Ubuntu 10.3.0-4ubuntu1)

garazdawi commented 3 years ago

What happens if you do etp 139942558691730 in a gdb session for that core?

Can you provide the core+executable so that we can have a look?

RoadRunnr commented 3 years ago

I've dropped you an email with the link to the core file. It is too large to share here.

The latest run produced this:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007fcf76e76864 in __GI_abort () at abort.c:79
#2  0x000055f968eebf0a in erts_exit_epilogue () at beam/erl_init.c:2534
#3  0x000055f968fbdf1a in erts_exit_vv (n=-2, flush_async=flush_async@entry=0, fmt=fmt@entry=0x55f96922757f "size_object: bad tag for %#x\n", args1=args1@entry=0x7fcf332678a8, args2=args2@entry=0x7fcf332678c0) at beam/erl_init.c:2520
#4  0x000055f968fbe087 in erts_exit (n=n@entry=-2, fmt=fmt@entry=0x55f96922757f "size_object: bad tag for %#x\n") at beam/erl_init.c:2544
#5  0x000055f968ff472c in size_object_x (obj=<optimized out>, obj@entry=140525325261202, litopt=litopt@entry=0x7fcf33267b10) at beam/copy.c:240
#6  0x000055f9690300dc in erts_send_message (sender=sender@entry=0x55f96b2074f0, receiver=receiver@entry=0x55f96b24f258, receiver_locks=receiver_locks@entry=0x7fcf33267b84, message=message@entry=140525325261202) at beam/erl_message.c:779
#7  0x000055f96900b236 in do_send (p=p@entry=0x55f96b2074f0, to=to@entry=223303939757619, msg=msg@entry=140525325261202, return_term=return_term@entry=29899, refp=refp@entry=0x7fcf33267be8, dist_ctx=dist_ctx@entry=0x7fcf33267bf0, connect=0, suspend=1) at beam/bif.c:2416
#8  0x000055f96900f54f in send_3 (A__p=0x55f96b2074f0, BIF__ARGS=<optimized out>, A__I=<optimized out>) at beam/bif.c:2464
#9  0x00007fcf33e03371 in global::call_light_bif_shared ()
(gdb) etp 140525325261202
{'$gen_call',{<0.8132.0>,[alias|#InternalRefError<0x7fce9a18194a>]},{get_result_spec,handle_event,[info,{[[[[[[[[...]|#Cp<0x80>]|#Cp<0x80>]|#Cp<0x80>]|#Cp<0x80>]|#Cp<0x80>]|#Cp<0x80>]|#Cp<0x80>],#Ref<0.3019417144.1735131138.103636>,process,<0.8140.0>,normal},{fsm,run,#{Keys:{
#InternalRefError<0x7fce9a140bd2>} Values:{{#InternalRefError<0x7fce9a140bd2>,[#Fun<0x7fcf301de4e0,0x7fce9a149138,0x1,0x2>,#Fun<0x7fcf301dd5e0,0x7fce9a180578,0x1,0x2>],#Fun<0x7fcf30173680,0x7fce9a147618,0x3,0x2>,#Fun<0x7fcf301736e0,0x7fce9a140d80,0x3,0x2>}}}},#{Keys:{'Session','Version',aaa_o
pts,bearer,context,interface,left_tunnel,mark,node_selection,pcc,pfcp,session_opts,...} Values:{<0.8133.0>,v2,#{Keys:{'AAA-Application-Id','Password','Username'} Values:{ergw_aaa_provider,#{Keys:{default} Values:{#HeapBinary<0x4,0x177677265>}},#{Keys:{default,from_protocol_opts} Values:{
#HeapBinary<0x4,0x7fce77677265>,true}}}},#{Keys:{left,right} Values:{{bearer,'Access',#RefcBinary<0x4,0x7fce9a140d40,0x55f96b04ea70,0x55f96b04ea88,(nil)>,{fq_teid,v4,{upf,left}},{fq_teid,{127,0,0,1},3606244992}},{bearer,'SGi-LAN',#RefcBinary<0x8,0x7fce9a1438c8,0x55f96b00cb58,0x55f96b00cb70,
(nil)>,{ue_ip,{ergw_local_pool,<0.6591.0>,{{10,184,235,132},32},{ipv4,32},#{Keys:{'MS-Primary-DNS-Server','MS-Primary-NBNS-Server','MS-Secondary-DNS-Server','MS-Secondary-NBNS-Server'} Values:{#HeapBinary<0x4,0x7fcf08080808>,#HeapBinary<0x4,0x100007f>,#HeapBinary<0x4,0x3ff0000004040808>,
#HeapBinary<0x4,0x100007f>}}},{ergw_local_pool,<0.6591.0>,{{32769,0,1,41075,0,0,0,...},64},{ipv6,64},#{Keys:{'DNS-Server-IPv6-Address'} Values:{[#HeapBinary<0x10,0x604860480120,0x8888000000000000>,#HeapBinary<0x10,0x604860480120,0x4488000000000000>]}}},undefined},default}}},{context,[
#RefcBinary<0x5,0x7fce9a142950,0x55f96b00d708,0x55f96b00d720,(nil)>,#RefcBinary<0x7,0x7fce9a143968,0x55f96b0128b8,0x55f96b0128d0,(nil)>,#RefcBinary<0x3,0x7fce9a144bc0,0x55f96b0128f8,0x55f96b012910,(nil)>,#RefcBinary<0x6,0x7fce9a1460d8,0x55f96b011e68,0x55f96b011e80,(nil)>,#RefcBinary<0x6,
0x7fce9a146fd8,0x55f96b012010,0x55f96b012028,(nil)>,#RefcBinary<0x4,0x7fce9a148200,0x55f96b012050,0x55f96b012068,(nil)>],#RefcBinary<0xf,0x7fce9a1440d8,0x55f96b011e28,0x55f96b011e40,(nil)>,#RefcBinary<0x10,0x7fce9a1428f0,0x55f96b011c30,0x55f96b011c48,(nil)>,#RefcBinary<0xc,0x7fce9a142920,
0x55f96b011718,0x55f96b011730,(nil)>,{imsi,#RefcBinary<0xf,0x7fce9a1440d8,0x55f96b011e28,0x55f96b011e40,(nil)>,5},4009867669,5,28800000,v2,'IPv4v6',{ue_ip,{ergw_local_pool,<0.6591.0>,{{10,184,235,132},32},{ipv4,32},#{Keys:{'MS-Primary-DNS-Server','MS-Primary-NBNS-Server','MS-Secondary-DNS-Ser
ver','MS-Secondary-NBNS-Server'} Values:{#HeapBinary<0x4,0x7fcf08080808>,#HeapBinary<0x4,0x100007f>,#HeapBinary<0x4,0x3ff0000004040808>,#HeapBinary<0x4,0x100007f>}}},{ergw_local_pool,<0.6591.0>,{{32769,0,1,41075,0,0,0,1},64},{ipv6,64},#{Keys:{'DNS-Server-IPv6-Address'} Values:{[#HeapBinary<
0x10,0x604860480120,0x8888000000000000>,#HeapBinary<0x10,0x604860480120,0x4488000000000000>]}}},undefined},...},pgw_s5s8,{tunnel,'Access',#RefcBinary<0x4,(nil),0x55f96b04e868,0x55f96b04e880,(nil)>,{socket,'remote-irx','gtp-c',<0.6575.0>},<0.7882.0>,v2,{fq_teid,{127,0,200,1},99281702},{fq_teid
,{127,0,100,1},2019101082},2},set,[default],{pcc_ctx,#{Keys:{} Values:{}},#{Keys:{#HeapBinary<0x6,0x313030302d72>} Values:{#{Keys:{'Charging-Rule-Base-Name','Flow-Information','Metering-Method','Offline','Precedence','Rating-Group'} Values:{#HeapBinary<0x7,0x313030306d326d>,[#{Keys:{'Flow-Des
cription','Flow-Direction'} Values:{[#HeapBinary<0x22,0x6f2074696d726570,0x7266207069207475,0x7420796e61206d6f,0x6e6769737361206f,0x6465>],[1]}},#{Keys:{'Flow-Description','Flow-Direction'} Values:{[#HeapBinary<0x22,0x6f2074696d726570,0x7266207069207475,0x7420796e61206d6f,0x6e6769737361206f,
0x7fcf37196465>],[2]}}],[1],[1],"d",[3000]}}}},#{Keys:{} Values:{}},#{Keys:{} Values:{}}},{pfcp_ctx,#HeapBinary<0x32,0x78732e6e6f706f74,0x3130752d7767702e,0x636e6d2e6370652e,0x3063636d2e313030,0x6e707067332e3130,0x6f2e6b726f777465,0x6772>,<0.6580.0>,{up_function_features,1,0,0,1,0,0,0,0,0,0,
...},{seid,16#2e71deec68e82eb1,0},{bearer,'CP-function',#RefcBinary<0x3,0x7fce9a148418,0x55f96b04e7a8,0x55f96b04e7c0,(nil)>,undefined,{fq_teid,{127,0,0,1},556254812}},#{Keys:{far,pdr,teid,urr} Values:{4,4,2,3}},#{Keys:{{far,dp_to_cp_far},{far,{downlink,#HeapBinary<0x6,0x313030302d72>}},{far,{
uplink,#HeapBinary<0x6,0x313030302d72>}},{pdr,ipv6_mcast_pdr},{pdr,{downlink,#HeapBinary<0x6,0x313030302d72>}},{pdr,{uplink,#HeapBinary<0x6,0x313030302d72>}},{teid,left},{urr,{offline,3000}},{urr,{offline,'IP-CAN'}}} Values:{1,2,3,1,2,3,1,2,1}},#{Keys:{1,2} Values:{{offline,'IP-CAN'},{offline
,3000}}},#{Keys:{3000,'IP-CAN'} Values:{[2],[1]}},#{Keys:{1,3} Values:{left,left}},#{Keys:{{far,dp_to_cp_far},{far,{downlink,#HeapBinary<0x6,0x313030302d72>}},{far,{uplink,#HeapBinary<0x6,0x313030302d72>}},{pdr,ipv6_mcast_pdr},{pdr,{downlink,#HeapBinary<0x6,0x313030302d72>}},{pdr,{uplink,
#HeapBinary<0x6,0x313030302d72>}},{urr,{offline,3000}},{urr,{offline,'IP-CAN'}}} Values:{#{Keys:{apply_action,far_id,forwarding_parameters} Values:{{apply_action,0,0,0,0,0,0,1,0,...},{far_id,1},{forwarding_parameters,#{Keys:{destination_interface,network_instance,outer_header_creation}
 Values:{{destination_interface,'CP-function'},{network_instance,#RefcBinary<0x3,0x7fce9a148418,0x55f96b04e7a8,0x55f96b04e7c0,(nil)>},{outer_header_creation,false,false,'GTP-U',556254812,#HeapBinary<0x4,0x100007f>,undefined,...}}}}}},#{Keys:{apply_action,far_id,forwarding_parameters} Values:{
{apply_action,0,0,0,0,0,0,1,0,...},{far_id,2},{forwarding_parameters,#{Keys:{destination_interface,network_instance,outer_header_creation} Values:{{destination_interface,'Access'},{network_instance,#RefcBinary<0x4,0x7fce9a140d40,0x55f96b04ea70,0x55f96b04ea88,(nil)>},{outer_header_creation,fal
se,false,'GTP-U',3606244992,#HeapBinary<0x4,0x100007f>,undefined,...}}}}}},#{Keys:{apply_action,far_id,forwarding_parameters} Values:{{apply_action,0,0,0,0,0,0,1,0,...},{far_id,3},{forwarding_parameters,#{Keys:{destination_interface,network_instance} Values:{{destination_interface,'SGi-LAN'},
{network_instance,#RefcBinary<0x8,0x7fce9a1438c8,0x55f96b00cb58,0x55f96b00cb70,(nil)>}}}}}},#{Keys:{far_id,pdi,pdr_id,precedence} Values:{{far_id,1},{pdi,#{Keys:{f_teid,network_instance,sdf_filter,source_interface} Values:{{f_teid,choose,undefined,choose,1},{network_instance,#RefcBinary<0x4,
0x7fce9a140d40,0x55f96b04ea70,0x55f96b04ea88,(nil)>},{sdf_filter,#HeapBinary<0x27,0x6f2074696d726570,0x7266203835207475,0x3a30306666206d6f,0x61206f7420382f3a,0x64656e67697373>,undefined,undefined,undefined,undefined},{source_interface,'Access'}}}},{pdr_id,1},{precedence,100}}},#{Keys:{far_id,
pdi,pdr_id,precedence,urr_id} Values:{{far_id,2},{pdi,#{Keys:{network_instance,sdf_filter,source_interface,ue_ip_address} Values:{{network_instance,#RefcBinary<0x8,0x7fce9a1438c8,0x55f96b00cb58,0x55f96b00cb70,(nil)>},{sdf_filter,#HeapBinary<0x22,0x6f2074696d726570,0x7266207069207475,
0x7420796e61206d6f,0x6e6769737361206f,0x6465>,undefined,undefined,undefined,undefined},{source_interface,'SGi-LAN'},{ue_ip_address,dst,#HeapBinary<0x4,0x84ebb80a>,#HeapBinary<0x10,0x73a0010000000180,0x100000000000000>,undefined,undefined}}}},{pdr_id,2},{precedence,100},[{urr_id,1},{urr_id,2}]
}},#{Keys:{far_id,outer_header_removal,pdi,pdr_id,precedence,urr_id} Values:{{far_id,3},{outer_header_removal,'GTP-U/UDP/IPv4'},{pdi,#{Keys:{f_teid,network_instance,sdf_filter,source_interface,ue_ip_address} Values:{{f_teid,choose,undefined,choose,1},{network_instance,#RefcBinary<0x4,
0x7fce9a140d40,0x55f96b04ea70,0x55f96b04ea88,(nil)>},{sdf_filter,#HeapBinary<0x22,0x6f2074696d726570,0x7266207069207475,0x7420796e61206d6f,0x6e6769737361206f,0x7fcf37196465>,undefined,undefined,undefined,undefined},{source_interface,'Access'},{ue_ip_address,src,#HeapBinary<0x4,0x84ebb80a>,
#HeapBinary<0x10,0x73a0010000000180,0x100000000000000>,undefined,undefined}}}},{pdr_id,3},{precedence,100},[{urr_id,1},{urr_id,2}]}},#{Keys:{linked_urr_id,measurement_method,reporting_triggers,urr_id} Values:{{linked_urr_id,1},{measurement_method,0,1,0},{reporting_triggers,1,0,0,0,0,0,0,0,
...},{urr_id,2}}},#{Keys:{measurement_method,reporting_triggers,urr_id} Values:{{measurement_method,0,1,1},{reporting_triggers,0,0,0,0,0,0,0,0,...},{urr_id,1}}}}},...},#<ffff>{['3GPP-RAT-Type'|1],#<1000>{#<220>{['Called-Station-Id'|#HeapBinary<0x11,0x78652e79786f7270,0x656e2e656c706d61,0x74>]
,['3GPP-IMEISV'|#RefcBinary<0x10,0x7fce9a1455b8,0x55f96b011c30,0x55f96b011c48,(nil)>]}},['Diameter-Session-Id'|#HeapBinary<0x25,0x3633353b7375657a,0x313b343438373639,0x3832353833383831,0x313930313737383b,0x7f3531383137>],#<8440>{['Service-Type'|'Framed-User'],['3GPP-MSISDN'|#RefcBinary<0xc,
0x7fce9a145588,0x55f96b011718,0x55f96b011730,(nil)>],['DNS-Server-IPv6-Address',#HeapBinary<0x10,0x604860480120,0x8888000000000000>,#HeapBinary<0x10,0x604860480120,0x4488000000000000>]},#<c>{['MS-Secondary-NBNS-Server'|#HeapBinary<0x4,0x100007f>],['MS-Primary-NBNS-Server'|#HeapBinary<0x4,
0x100007f>]},#<1003>{['User-Location-Info'|#{Keys:{'ECGI','TAI'} Values:{{ecgi,{#RefcBinary<0x3,0x7fce9a1464d8,0x55f96b011bd0,0x55f96b011be8,(nil)>,#RefcBinary<0x2,0x7fce9a148388,0x55f96b0107f8,0x55f96b010810,(nil)>},138873180},{tai,{#RefcBinary<0x3,0x7fce9a1483b8,0x55f96b011b50,
0x55f96b011b68,(nil)>,#RefcBinary<0x2,0x7fce9a1483e8,0x55f96b011b90,0x55f96b011ba8,(nil)>},55001}}}],['Session-Id'|16#20017aa407155500000000cc37c7d668],['3GPP-SGSN-Address'|{127,0,100,1}]},#<8880>{['3GPP-IMSI'|#RefcBinary<0xf,0x7fce9a1463b0,0x55f96b011e28,0x55f96b011e40,(nil)>],['NAS-Identifi
er'|#HeapBinary<0xe,0x6e6564492d53414e,0x726569666974>],['Node-Id'|#HeapBinary<0x7,0x3130302d574750>]},#<8102>{['Calling-Station-Id'|#RefcBinary<0xc,0x7fce9a145318,0x55f96b011718,0x55f96b011730,(nil)>],#<12>{['Password'|#HeapBinary<0x4,0x177677265>],['QoS-Information'|#{Keys:{'APN-Aggregate-M
ax-Bitrate-DL','APN-Aggregate-Max-Bitrate-UL','Allocation-Retention-Priority','Guaranteed-Bitrate-DL','Guaranteed-Bitrate-UL','Max-Requested-Bandwidth-DL',...} Values:{1704125000,48128000,#{Keys:{'Pre-emption-Capability','Pre-emption-Vulnerability','Priority-Level'} Values:{1,0,10}},0,0,0,0,
...}}]},['Framed-IP-Address'|{10,184,235,132}]},#<a0>{['Idle-Timeout'|28800000],['MS-Primary-DNS-Server'|#HeapBinary<0x4,0x7fcf08080808>]},#<4800>{['3GPP-SGSN-MCC-MNC'|{#RefcBinary<0x3,0x7fce9a145448,0x55f96b011660,0x55f96b011678,(nil)>,#RefcBinary<0x2,0x7fce9a1464a8,0x55f96b0116a0,
0x55f96b0116b8,(nil)>}],['Framed-Interface-Id'|{0,0,0,0,0,0,0,1}]},#<3000>{#<402>{['Username'|#HeapBinary<0x4,0x7fce77677265>],['Framed-Protocol'|'GPRS-PDP-Context']},['Multi-Session-Id'|16#20017aa407155500000000cc37c7d667]},#<6000>{['Framed-Pool'|#HeapBinary<0x6,0x412d6c6f6f70>],['Framed-IPv
6-Pool'|#HeapBinary<0x6,0x412d6c6f6f70>]},...},v2}}]}}.

RoadRunnr commented 3 years ago

One more data point: I've been working on a change that makes much more use of the gen_server:send_request function. Running the test suite in that branch has a much higher chance of triggering the crash (about 50%, or every other run).

The branch in question is: https://github.com/travelping/ergw/tree/exp/non-blocking-fsm-monadic

garazdawi commented 3 years ago

The problem seems to be related to garbage collection and handling of alias signals. The symptoms are as far as I can tell very similar to what was fixed in https://github.com/erlang/otp/pull/4870.

I'll do some digging and see if I can figure out what is wrong, but since it is vacation season it might be a while until we come up with a solution. Anything you can do to create a smaller reproducible example would help a lot.

rickard-green commented 2 years ago

Sorry for the late reply. I've looked at this a couple of times previously without finding anything. After taking a further look at it now, I'm quite convinced that the alias functionality is a red herring regarding this crash. The #InternalRefError<0x7fce9a18194a> reported by etp is due to etp not having been updated to handle aliases (I have put that on the todo list). At the location where the crash happens, you have also more or less not used any alias functionality other than creating the alias (disregarding preceding unrelated alias operations that may have been performed).

Some broken term has entered the heap, but it may have come from more or less any functionality. We have fixed quite a few bugs that could explain this since this crash happened. Have you tried the latest patch on maint-24 (currently OTP 24.1.7)? If not, I would recommend upgrading to that patch level. If you've already tested that, do you still have crashes?
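
For readers wondering why 0x80 counts as a "bad tag": the sketch below decodes the two low bits of a term word. It assumes the primary tag layout as I understand it from erts/emulator/beam/erl_term.h (00 header, 01 list, 10 boxed, 11 immediate), which is not stated anywhere in this thread. The helper name primary_tag is hypothetical. The input values are the message words from the two backtraces and the word from the error message.

```python
# Primary tag values as I understand them from erts/emulator/beam/erl_term.h.
# This mapping is an assumption of the sketch, not something quoted in the thread.
PRIMARY_TAGS = {
    0b00: "header",
    0b01: "list (cons pointer)",
    0b10: "boxed pointer",
    0b11: "immediate",
}


def primary_tag(word: int) -> str:
    """Return the name of the primary tag held in the two low bits of a term word."""
    return PRIMARY_TAGS[word & 0b11]


if __name__ == "__main__":
    samples = [
        ("message in first backtrace", 139942558691730),
        ("message in second backtrace", 140525325261202),
        ("word reported in the crash", 0x80),
    ]
    for label, word in samples:
        print(f"{label}: {word:#x} -> {primary_tag(word)}")
```

Under that assumption, both message words decode as boxed pointers, which matches etp printing the message as a tuple. The word 0x80 decodes as a header, which should not appear where a term reference is expected. That would be consistent with size_object aborting on it and with etp rendering the same word as #Cp<0x80> inside the nested list.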

rickard-green commented 2 years ago

@RoadRunnr closing this since we cannot reproduce it. Please reopen if you can reproduce it on latest maint-24.