Flash is highly optimised typeahead(search recommendation) server. We are building it with an aim to learn and teach various tradeoffs/practices. While building it, we will use different data structures/storage devices/protocols/deployment strategies and compare them from different aspects.
keeping in mind the fact that we are designing for Humans. Average time at which humans respond is 200ms( interval b/w consecutive key stokes), So we need to keep this in mind and keep our latency < 200ms always and..... Amazon search recommendation api calls responds in around ~100ms.
Lookup time for Memory, SSD, and HDD is 100ns, 150us, 10 ms respective. It makes sense for us to keep all data in memory itself for low latency. We can do sharding on range bases, if data grows more than the size of 1 node. further we can use disk to take backup by taking periodic snapshot of data, for fault tolerance on poweroff. Later, we can also try to use SSD for storage and see how trade off b/w cost vs latency works.
Our usecase is to fetch popular strings with same prefix. Trie looks intuitive here, but I feel we could perform better with hashmap as with hashmap it will be easy for us to shard, Maps are native for memory, and for every prefix as key, we can use a priority queue as value which contains all popular strings with same prefix as key. Doing this with trie will be little harder. But it's going to be interesting to compare both DS. Making an algorithm to scale them.
yes, Idea is to make api call to server with every key stroke in search box and return all strings with same prefix as response, means server will be in heavy read load. We can use websockets to faster resonse from server and not setting up TCP connection for every keystroke but I feel using ws is going to be overkill as we don't need duplex connection. We can start with HTTP and do some optimisations like adding "Connection: keep-alive" header to tell server to not close connection for subsequent connections.
As latency is most important factor, We can try to deploy in various regions and using a load balancer to route to closes geolocation server. We need low latency, with high avaibility and less or eventual consistency that means it's fine if users in different region see different suggestions for their prefix. I feel deployment is dominant part in keeping latency low, We can do benchmarking after adding every new component(load balancer, sharding) to understand their affects.
Keeping perfomance in mind, I chose GO over any iterperative language. Rust might be even better choice, but as I don't have enough knowledge with Rust rigth now so may be We an migrate it later.
git clone https://github.com/Shubham8287/flash
cd flash/server/src
make run # start flash and node expoerter server
make mon # to setup monitoring
curl localhost:8080/isAlive
You should get response - Boss is always fine
To run actual Api hit - localhost:8080/find?prefix=yo
you will get a response of top matching string starting with prefix "yo"
{"prefix":"yo","matches":["yo","yob","yobbo","yobboes","yobbos","yobi","yobs","yocco","yochel","yock","yocked","yockel","yockernut","yocking","yocks","yod","yode","yodel","yodeled","yodeler"]}
_ "net/http/pprof"
for healthcheck route in localhost, I am getting response ~1-2ms.
go run ./...
- to run serverlsof -i:[PORT]
- to know PID of proccess running on PORT.curl -w "@curl-format.txt" -s [URL]
- to know response time of your request with curlsudo pmap [PID]
- check total memory used by PIDab -n 100 -c 1 -k URL
- use apache bench to make n request with c concurrency. -k is to keep tcp connection open.