LinearTapeFileSystem / ltfs

Reference implementation of the LTFS format Spec for stand alone tape drive
BSD 3-Clause "New" or "Revised" License
255 stars 76 forks source link

Recursive protection to _take_dump() #412

Closed ochomenosocho closed 1 year ago

ochomenosocho commented 1 year ago

Summary of changes

This pull request includes following changes or fixes.

Description

The _take_dump() function calls several functions in a specific order, which causes the _take_dump() function to be called again at the end of the sequence, creating a recursion problem. This occurs when trying to generate a drive dump of a faulty drive during startup.

Fixes #411

Type of change

Checklist:

ochomenosocho commented 1 year ago

Hi, I added a pthread_mutex to prevent errors when it is used in the multiple drive environment. Please, take a look at it.

piste-jp commented 1 year ago

I added a pthread_mutex to prevent errors when it is used in the multiple drive environment

No. It's not what I'm saying.. And the mutex you aded is no meaning at all.

The problem is sharing the static unsigned char recursive_counter variable in multiple drives in the multiple drive environment like LE.

The scenario below might happen.

  1. We have 2 drives in one ltfs process, driveA and driveB
  2. driveA and driveB is accessing to the each drives by their each priv and each thread concurrently (but function is shared, in other words, each thread could call the same function concurently)
  3. A driveA thread calls _take_dump()
  4. The driveA thread have a bad response, increment recursive_counter to 1 and call _take_dump() recursively
  5. Context switch happens here and a driveB calls _take_dump()
  6. The driveB thread see the recursive_counter and it is 1 because it is incremented to by step4 above. As a result, _take_dump()@driveB returns immediately without taking dump and recursive_counter is return to 0
  7. Context switch happens here and back to the driveA thread
  8. The driveA thread see the recursive_counter and it is 0 because it is reverted to by step6 above. As a result, _take_dump()@driveA calls _take_dump() recursively with incrementing recursive_counter to 1 again

So I cannot accept this kind of code.

piste-jp commented 1 year ago

@juliocelon ,

Could you merge this with the Squash and merge button? Otherwise we might have so many meaningless commits in the master branch.

Please be careful and please feel free to contact me if you are feeling uneasy. Let's have a live session and do it together.

piste-jp commented 1 year ago

@juliocelon ,

I think you need to cherry-pick the commit a7916bf0b8d7264117341a5bcd06fb32a5c160f8@master to the v2.4-stable branch.

piste-jp commented 1 year ago

Hi Moises, I do not understand the following change.

const uint16_t page_len = ((uint16_t)logdata[2] << 8) | (uint16_t)logdata[3];
  uint16_t param_code, param_len;

Why page_len was defined as constant rather than a variable.

Thanks

From language spec point of view, it just declares page_len is initialed by ((uint16_t)logdata[2] << 8) | (uint16_t)logdata[3] and it is not changed at all in this function. In other words, the compiler report an error when it is changed by a fix in the future.

From logic point of view, there is no difference between const and non-const here. So it is not a bad definition. There is some side effects in this kind of declaration but I decided that it is good to respect the proposed code.