Open phudtran opened 4 years ago
It might be the right time to consider using another hashtable..
After analysis, I think the hash table does not support delete operation, the following are my verification steps: 1.Add some log information to trn_transitd.h file
do { \
ENTRY e, *ep; \
e.key = itf; \
e.data = (void *)md; \
ep = hsearch(e, FIND); \
if (ep) { \
if (ep->data) { \
TRN_LOG_DEBUG("overwriting metadata at [%s]", \
itf); \
TRN_FREE(ep->data); \
} \
TRN_LOG_ERROR("overwriting metadata, ep->data is null"); \
ep->data = (void *)md; \
} else { \
TRN_LOG_DEBUG( \
"inserting [%s] for first time. allocate memory for interface key," \
" since RPC XDR will eventually free its value.", \
itf); \
e.key = malloc(sizeof(char) * (strlen(itf)+ 1)); \
strcpy(e.key, itf); \
e.key[strlen(itf)] = '\0'; \
ep = hsearch(e, ENTER); \
} \
if (!ep) \
return -1; \
return 0; \
} while (0)
Add the statement TRN_LOG_ERROR ("overwriting metadata, ep-> data is null") in the macro INTF_FIND;
do { \
ENTRY e, *ep; \
e.key = itf; \
ep = hsearch(e, FIND); \
if (!ep) { \
return NULL; \
} \
TRN_LOG_ERROR("Find ep, return ep->data:%p", ep->data); \
return ep->data; \
} while (0)
Add the statement TRN_LOG_ERROR ("Find ep, return ep-> data:% p", ep-> data) in the macro INTF_FIND;
View syslog: root@1c81bb2d3435:/# cat /var/log/syslog | grep "inserting" Feb 17 07:13:18 1c81bb2d3435 transit[149]: [trn_itf_table_insert:72] inserting [eth0] for first time. allocate memory for interface key, since RPC XDR will eventually free its value. Feb 17 07:13:29 1c81bb2d3435 transit[149]: [trn_vif_table_insert:87] inserting [gnv_peer1] for first time. allocate memory for interface key, since RPC XDR will eventually free its value. Feb 17 07:13:31 1c81bb2d3435 transit[149]: [trn_vif_table_insert:87] inserting [gnv_peer3] for first time. allocate memory for interface key, since RPC XDR will eventually free its value. Feb 17 07:13:34 1c81bb2d3435 transit[149]: [trn_vif_table_insert:87] inserting [gnv_peer0] for first time. allocate memory for interface key, since RPC XDR will eventually free its value. root@1c81bb2d3435:/# cat /var/log/syslog | grep "overwriting" Feb 17 07:14:05 1c81bb2d3435 transit[149]: overwriting metadata, ep->data is null
Therefore, calling the macro INTF_DELETE cannot delete gnv_peer1 exactly. View hsearch help, which has the following description:
Bugs SVr4 and POSIX.1-2001 specify that action is significant only for unsuccessful searches, so that an ENTER should not do anything for a successful search. In libc and glibc (before version 2.3), the implementation violates the specification, updating the data for the given key in this case.
Individual hash table entries can be added, but not deleted.
3.Cause the ENTRY of hash table cannot be deleted, If set the key to NULL when deleting, I think there will be uncertain behavior(need to read the source code of hsearch()).we may only need to set the value to NULL when deleting.The bug is not recurring after deleting "free(ep->key);" from INTF_DELETE.
do {\ TRN_LOG_DEBUG ("Delete: [% s]", itf); \ ENTRY e, * ep; \ e.key = itf; \ ep = hsearch (e, FIND); \ if (ep) {\ TRN_FREE (ep-> data); \ TRN_LOG_DEBUG ("Deleted: [% s]", itf); \ } \ } while (0) Finally, Cause the ENTRY of hash table cannot be deleted, there will be a memory leak. May we can consider using another hashtable.
Thanks, @chenpiaoping. Yes, we shall consider a new hash table. Can you augment your analysis and see if the metadata, can be simply stored in an ebpf map (hash)? We don't need that table values solely in user-space. We currently move data selectively from it to ebpf maps. It might be better to directly have these interface metadata stored with the XDP program in a map and of course still accessible to userspace.
What happened When making consecutive update_agent_metadata calls, every other call will fail in the following scenario:
Setting ep->data to null in the macro INTF_DELETE() seems to be the issue. trn_transitd.h:81 However removing that line will reintroduce an old bug.
What you expected to happen update_agent_metadata should not fail How to reproduce it
Now try to update agent metadata for ep1. It will fail every other time.
Environment Ubuntu Server 18.04 LTS (HVM) on ec2 t2.2xlarge More Information Deleting just the endpoint will not reproduce the bug issue. We have to either delete the network, or delete the vpc.