ECP-VeloC / AXL

Asynchronous Transfer Library
MIT License
2 stars 8 forks source link

AXL_Stop with pthreads #51

Closed adammoody closed 4 years ago

adammoody commented 5 years ago

There is a subtle issue that may arise with the intended use of AXL_Stop and pthreads, perhaps other AXL methods as well.

When given a path for AXL to write a state file, it will record its internal kvtree to this file at various points.

https://github.com/ECP-VeloC/AXL/blob/42ff1dd2678573abd7056125b7eee8151f79f407/src/axl.c#L563

If the file exists, it will also read that file back to initialize itself when starting up:

https://github.com/ECP-VeloC/AXL/blob/42ff1dd2678573abd7056125b7eee8151f79f407/src/axl.c#L172

The main point of this is so that AXL can record info about potentially ongoing transfers so that they can be terminated, even after the process that started the transfer may have died and been restarted. Since the process that might be stopping or checking on the transfer is different from the process that started it, an issue can come up when storing memory addresses in this structure, since the memory address likely is invalid in the restarted process.

The particular sequence I'm thinking of right now: 1) application starts transfer, and kvtree is stored to file including memory address of data structures 2) application dies before transfer completes 3) application restarts, calls AXL_Init, AXL reads kvtree back from file 4) application calls AXL_Stop to kill any ongoing transfers 5) AXL_Stop calls AXL_Cancel / AXL_Wait / AXL_Free 6) AXL_Free tries to read and free invalid memory from old address in kvtree

There are likely many ways to fix it, but I don't currently have a suggestion.

tonyhutter commented 5 years ago

Yea, a by-product of using a single id for everything that there's no easy way to pass around structs between functions. I was hoping we could just pass *axl_pthread_data in the kvtree, but it sounds like that not going to work. We could do something hacky and internally maintain a id to struct lookup list. Or just manually NULL out the *axl_pthread_data when we read back the kvtree from a file. I dunno the answer, but I think you bring up a good point.

tonyhutter commented 5 years ago

We should also add a test case to cancel a transfer.

tonyhutter commented 5 years ago

Fix out for review: https://github.com/ECP-VeloC/AXL/pull/59

tonyhutter commented 4 years ago

Fixed in #59