Closed jtv4k closed 1 year ago
@jtv4k You mean to rename to an existing filter? e.g. an implicit drop of the existing filter then?
@armon Yes, rename an existing filter so the previous name is implicitly dropped. It could be a copy-then-drop action, but that would seem less efficient than a rename. The new filter name would contain all the data of the old.
For example:

```
list
START
END
create temp_filter_1
Done
set temp_filter_1 somevalue
Yes
check filter_1 somevalue
Filter does not exist
rename temp_filter_1 filter_1
Done
check temp_filter_1 somevalue
Filter does not exist
check filter_1 somevalue
Yes
```
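The semantics in the transcript above can be sketched with a small in-memory stand-in (hypothetical Python; real bloomd filters are persistent on-disk structures, and `FilterStore` is purely illustrative):

```python
class FilterStore:
    """In-memory model of the proposed rename semantics (not real bloomd)."""

    def __init__(self):
        self.filters = {}  # filter name -> set of stored values

    def create(self, name):
        self.filters.setdefault(name, set())
        return "Done"

    def set(self, name, value):
        self.filters[name].add(value)
        return "Yes"

    def check(self, name, value):
        if name not in self.filters:
            return "Filter does not exist"
        return "Yes" if value in self.filters[name] else "No"

    def rename(self, old, new):
        # The new name takes over all the old filter's data; any filter
        # already stored under `new` is implicitly dropped, and `old`
        # stops existing.
        self.filters[new] = self.filters.pop(old)
        return "Done"


store = FilterStore()
store.create("temp_filter_1")
store.set("temp_filter_1", "somevalue")
store.rename("temp_filter_1", "filter_1")
```

After the rename, `check temp_filter_1` reports a missing filter while `check filter_1 somevalue` answers Yes, matching the transcript.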
@jtv4k Hmm. This would be a pretty complex change internally ATM. The underlying files are at a path based on the filter name, so we'd have to support renaming all those files, while handling the fact that the delete of the old filter is taking place in the background. So it would be something like:
1) Rename `temp_filter` to `filter` such that new requests go to that
2) Background delete original filter
3) Migrate `temp_filter` to `filter` location on disk (rename files)
The issue is that you need to handle partial failures in all of that. It could be done; you'd need something like a write-ahead log. It's just not clean or simple. Unfortunately I'm very engaged with my work at HashiCorp and don't have time to tackle something like this.
Same functional request. Use-case:

1. Large bloom datasets are built off-line "somewhere else" (version X).
2. Many clients read the current version of these bloom filters (e.g. through bloomd).
3. New versions of these bloom datasets are built off-line "somewhere else" (version X+1).
4. How do we smoothly move the clients from version X to version X+1?
jtv4k proposes "atomic rename" of the files, which seems difficult, according to Armon.
Can we imagine other solutions? (Using several bloomd instances in parallel, ...?)
One proposal for the use-case exposed just before: a load-balancer with a layer-7 dummy-HTTP-200 health check, connected to two bloomd instances (bloomdA and bloomdB) with the same `tcp_port=8673`. bloomdA has `data_dir=/mnt/bloomdA`; bloomdB has `data_dir=/mnt/bloomdB`.

1. Version X is written in /mnt/bloomdA.
2. bloomdA is started and declares itself up on layer 7 (responding 200 to the dummy HTTP request): the load-balancer sends users' requests to bloomdA.
3. Version X+1 is written in /mnt/bloomdB.
4. bloomdB is started and declares itself up on layer 7: the load-balancer sends users' requests to bloomdA (version X) and bloomdB (version X+1).
5. bloomdA receives a "command-to-stop" (TBD) and declares itself down on layer 7 (responding 400 to the dummy HTTP request): the load-balancer stops sending users' requests to bloomdA (version X) and sends all new requests to bloomdB (version X+1). bloomdA can die smoothly.
6. Version X+2 can be written to /mnt/bloomdA.
7. etc.
This solution relies on the load-balancer and needs two enhancements in bloomd:

1. bloomd must answer a dummy "keep-alive" on layer-7 HTTP (dummy answer 200 when "on", 400 when "off").
2. bloomd must accept a "command-to-stop": answer 400 to the layer-7 HTTP keep-alive request, and die when no more requests are pending (or after a delay).
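The two proposed enhancements amount to a small drain state machine inside bloomd. A sketch of that state machine (illustrative Python only; the real change would be in bloomd's C code, and the class and method names are hypothetical):

```python
from http import HTTPStatus


class DrainableBloomd:
    """Models the proposed health-check toggle and graceful stop."""

    def __init__(self):
        self.draining = False
        self.in_flight = 0  # requests currently being served

    def health_status(self):
        # What the dummy layer-7 HTTP endpoint returns to the load-balancer:
        # 200 while serving, 400 once asked to stop.
        return HTTPStatus.BAD_REQUEST if self.draining else HTTPStatus.OK

    def begin_request(self):
        self.in_flight += 1

    def end_request(self):
        self.in_flight -= 1

    def command_to_stop(self):
        # The proposed "command-to-stop": flip the health check to 400
        # so the load-balancer routes new requests elsewhere.
        self.draining = True

    def may_exit(self):
        # Die smoothly only once drained and no request is pending.
        return self.draining and self.in_flight == 0
```

The load-balancer sees 400 immediately after `command_to_stop`, stops routing new traffic, and the process exits only when `in_flight` drains to zero (or, per the proposal, after a delay).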
Depending on the app, it might be simpler to coordinate a switch-over from `filterA` to `filterB`. For example, using something like Consul (shameless plug, I apologize), you could set a key like `service/foobar/bloomd_set = filterA` and then update that key when the offline build is done. Applications can be edge-triggered when that key changes and start using the new filter. This seems like the simplest option, as it requires no load-balancer or bloomd changes. The client is just deciding which filter to use, and something like Consul/ZooKeeper can be used to coordinate the change.
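The client side of this coordination is small. A sketch, with a plain callback standing in for the Consul/ZooKeeper watch (the `FilterPointer` class and the key name are hypothetical; in practice the value would live under a key like `service/foobar/bloomd_set` and the watch would be a blocking query):

```python
class FilterPointer:
    """Edge-triggered pointer to the currently active filter name."""

    def __init__(self, initial):
        self.current = initial
        self._listeners = []

    def on_change(self, callback):
        # Register a client callback, fired only when the value changes.
        self._listeners.append(callback)

    def publish(self, new_filter):
        # The offline builder writes the new filter name when its build
        # (under a temporary name) is complete.
        if new_filter != self.current:
            self.current = new_filter
            for cb in self._listeners:
                cb(new_filter)


# Each client keeps a local copy of the active filter name and uses it
# in every `check`; no bloomd or load-balancer changes are required.
active = {"filter": "filterA"}
pointer = FilterPointer("filterA")
pointer.on_change(lambda name: active.update(filter=name))
pointer.publish("filterB")  # clients switch to filterB on the next check
```

The switch is a single pointer update, so the old version's data never has to be renamed or migrated; it can simply be dropped once no client reads it.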
You are right!
I'm going in your direction: I'll build something on top of bloomd to manage the use-case.
Do you want me to auto-answer in GitHub?
Thank you for your answer.
Franck
It would be great to have server-supported atomic renames.
We have a script that builds several large bloom filters, which can take a while. It would be great if we could build the filters under temporary names and then (atomically) rename all the filters:
```
rename temp_filter_1 filter_1, temp_filter_2 filter_2, temp_filter_3 filter_3
```
That would allow us to build the bloom filters in the background under a pseudonym; while the build is in progress, checks against the final filter names will fail. Once the build completes, we perform the renames and the client checks suddenly begin working.