TheEvergreenStateCollege / upper-division-cs-23-24

A Course in Data Structures & Algorithms, Purposeful Web Engineering, Software Construction
https://theevergreenstatecollege.github.io/upper-division-cs/
MIT License

Project Idea: HyperLogLog #74

Open learner-long-life opened 1 year ago

learner-long-life commented 1 year ago

For those still looking for a (team) project idea, HyperLogLog is an interesting probabilistic data structure that is worth studying.

https://chengweihu.com/hyperloglog/

A HyperLogLog project would benefit from a large dataset in an "online" setting: live data with new datapoints arriving daily or in real time, on top of a large historical dataset that would be expensive to reprocess from scratch.

Examples:

Your project will be to relate HyperLogLog to the introductory concepts we are learning in DSA, code up an implementation in a programming language of your choice, run it on a dataset of your choosing, measure its performance and suitability, critique your implementation, and suggest opportunities for improvement.
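To give a sense of the scope, here is a minimal Python sketch of the idea, not a reference implementation. It assumes the standard formulation from the original HyperLogLog paper (Flajolet et al., 2007): the first `p` bits of a hash pick one of `m = 2**p` registers, each register stores the largest "position of the first 1-bit" seen in the remaining bits, and the estimate is a bias-corrected harmonic mean with a linear-counting fallback for small counts. The class name, the default `p=12`, the use of SHA-1 truncated to 64 bits, and the `merge` helper are all choices made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch with m = 2**p registers."""

    def __init__(self, p=12):
        if not 4 <= p <= 16:
            raise ValueError("p must be between 4 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constants as given in the original paper.
        if self.m == 16:
            self.alpha = 0.673
        elif self.m == 32:
            self.alpha = 0.697
        elif self.m == 64:
            self.alpha = 0.709
        else:
            self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(item):
        # Assumption: SHA-1 truncated to 64 bits stands in for a fast
        # non-cryptographic hash; any well-mixed 64-bit hash would do.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        x = self._hash64(item)
        width = 64 - self.p
        idx = x >> width                   # first p bits select a register
        rest = x & ((1 << width) - 1)      # remaining bits
        # 1-based position of the leftmost 1-bit in `rest`
        # (width + 1 if `rest` is all zeros).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate

    def merge(self, other):
        """Fold another sketch into this one (e.g. one sketch per day)."""
        if self.p != other.p:
            raise ValueError("sketches must use the same p")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(100_000):
        hll.add(f"user-{i}")
        hll.add(f"user-{i}")  # duplicates do not change the estimate
    print(f"estimated distinct items: {hll.count():.0f} (true: 100000)")
```

With `p=12` the sketch keeps 4096 small registers regardless of input size, and the expected relative error is about 1.04 / sqrt(4096), roughly 1.6%. Comparing that against an exact `set`-based count (memory, time, error) on your chosen dataset is one way to approach the "measure and critique" part of the project, and `merge` is what makes the daily-update setting cheap: yesterday's sketch never has to be rebuilt.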

learner-long-life commented 1 year ago

A good explanation video: https://www.youtube.com/watch?v=lJYufx0bfpw&pp=ygUSZ29vZ2xlIGh5cGVybG9nbG9n

learner-long-life commented 1 year ago

Doing your own web crawls for large datasets https://commoncrawl.org/