Closed nicknezis closed 2 years ago
Also reverting this change which was added 2 years ago as a bandaid to remedy #3769 .
The commit looks good. Just wonder what was the original reason of using a random name in bookkeeper upload?
The commit looks good. Just wonder what was the original reason of using a random name in bookkeeper upload?
It looks like a bit of code added early in this PR. Someone suggested they use a timestamp instead of random to see which package was newest. But it doesn't seem to have much thought put behind the original Random Long. I'll have to look further back in the history to determine when it was added.
Based on the unit test: testGenerateFilename
, it seems like the file name is expected to be unique.
Based on the unit test:
testGenerateFilename
, it seems like the file name is expected to be unique.
Interesting, the test still passes on my local MacOS. I'll have to dig into it further. I'll have to make it fail the way it did in TravicCI, and then fix the logic. Hopefully just a simple environment issue on my end.
Looking in each Uploader, they seem to have logic for uploading a package with the same name. But this logic would never be needed if the intention was to always generate a unique filename. I really think this is a relic from the old Twitter code base. It doesn't seem to provide any value.
But it does cause a storage leak in the Bookkeeper and Zookeeper. It is not as big a deal in Zookeeper, but we have definitely seen Bookkeeper fill up with all of the old artifacts. Especially when on a developer cluster where the same analytic job might be resubmitted multiple times through the development process.
Edit: I figured out my issue. I wasn't running heron test heron/spi/...
because I was only running in heron/uploaders/...
. :)
Yeah. It sounds a little strange. Maybe the logic is assuming manual cleanup or some kind of TTL?
Fixes #3769
If you exec into the Bookkeeper pod, you can run the following command in
/opt/bookkeeper/bin
./dlog tool watch -u distributedlog://heron-zookeeper:2181/heron
When I upload the same package, I can see it deleting the old one, and then uploading the new one
Without this fix, this bit of code is never executed. The package name never matches, so it never deletes the old package.