archiecobbs / s3backer

FUSE/NBD single file backing store via Amazon S3
Other
535 stars 77 forks source link

zero_block_cache_size incorrect in stats file #204

Closed rbfnet closed 1 year ago

rbfnet commented 1 year ago

stats.current_cache_size in zero_cache_private is initialized to zero, and not updated with the results of the zero-block survey. If the survey discovers, say, 100 zero blocks, and then five of them are consumed, stats_current_cache_size will be decremented five times. Since it's unsigned, the result showed in the stats file will be 2^32-5.

Seems like a fix would be to replace: bitmap_or(...) in zero_cache_survey_main with a loop that iterates over priv->survey_zeros and calls zero_cache_update_block() to set the zero flag for every zero block discovered in the survey, but there may be a way that doesn't involve iterating over every block.

(I considered current_cache_size += survey_count at the completion of the survey, but I'm not sure that works. The case that concerns me is a block being written (with non-zero) data, and then re-written with zeros, with both occurring before the survey gets to the block. Seems like that might increment current_cache_size for that block, and then increment survery_count for that same block, so adding survey_count to currrent_cache_size would then double count that block.)

archiecobbs commented 1 year ago

Good catch - thanks. See if 571b09d fixes it for you (re-open bug if not).

rbfnet commented 1 year ago

Although that clears the issue in my current case, I think it fails in cases where a block is added to the zero-cache before the survey completes. (For example, block 10000 is empty and is read before the survey gets to it.)

The new function bitmapor2 return is a count of the following cases:

(A) dst was initially set, src was clear (B) dst was initially set, src was also set (C) dst was initially clear, src was set

What we need is a count of case (C), because that's the only case where the number of zero-cache blocks increased. But the count we get will also include casea (A) and (B), despite nothing changing in dst.

Maybe the following: for (i = 0; i < nwords; i++) { count += popcount32(~dst[i] & src[i]); dst[i] |= src[i]; };

(I don't have privileges to re-open, so it remains closed.)

archiecobbs commented 1 year ago

Not sure what you mean. The code is simply counting up the number of bits in priv->zeros from scratch. The counting just happens to be in combination with updating priv-zeroes - but we count the bits in each word after it gets updated.

The value returned from bitmap_or2() is not added to the current stats value, it replaces it.

rbfnet commented 1 year ago

Apparently, what I mean is "I didn't read the change correctly". Somehow I read "+=" instead of "=" in this: priv->stats.current_cache_size = bitmap_or2(priv->zeros, priv->survey_zeros, config->num_blocks); (It did work when I tested, but I didn't have a test case for the race I was (incorrectly) worried about.)

Sorry for the confusion. I agree the fix is good.