Greatly improved performance

I picked most of the low-hanging fruit:

Caching: ~ +5-15%
Parsing: ~ +5-10%

Plus some general performance improvements that may each be worth up to 1%, but I wasn't able to measure them because of network variability.

On my computer (Ryzen 7950XT):

Before the optimizations: 4.2s (± 0.3s)
After the optimizations: 3.1s (± 0.2s)

That is about 26% less wall-clock time. Note that the benchmark averages 12 sample runs. Here is the run command for reproducibility:

cargo run --release --example multiple_downloads --features=performance_analysis
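For anyone who wants to reproduce the "mean ± spread over 12 runs" figures, here is a minimal sketch of how such numbers can be computed. It is not the code used in this PR: `time_runs`, `mean_and_std_dev`, and `relative_reduction` are hypothetical helper names, and using the sample standard deviation for the ± value is an assumption.

```rust
use std::time::Instant;

/// Runs `workload` `runs` times and returns each wall-clock duration in seconds.
/// (Hypothetical helper; the workload would be the download being benchmarked.)
fn time_runs<F: FnMut()>(runs: usize, mut workload: F) -> Vec<f64> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            workload();
            start.elapsed().as_secs_f64()
        })
        .collect()
}

/// Returns the mean and the sample standard deviation (n - 1 in the denominator).
/// Returns `None` when there are fewer than two samples.
fn mean_and_std_dev(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples
        .iter()
        .map(|s| (s - mean).powi(2))
        .sum::<f64>()
        / (n - 1.0);
    Some((mean, variance.sqrt()))
}

/// Fraction of wall-clock time saved when going from `before` to `after`.
fn relative_reduction(before: f64, after: f64) -> f64 {
    (before - after) / before
}
```

Calling `time_runs(12, ...)` with the download as the closure and passing the result to `mean_and_std_dev` gives a mean and spread in the same form as the figures above. `relative_reduction(4.2, 3.1)` expresses the before/after means as a fraction of time saved.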

A new performance analysis mode has been added, enabling the use of flamegraphs.

Note that most of the improvements are in CPU-bound tasks. On lower-end CPUs the improvement would probably be much larger, since on my machine the benchmark is mostly network-bound.

I also added comments on my changes to explain how much each one improved performance. I also noticed that the code could be simplified by quite a margin; I will do a follow-up PR on code aesthetics and quality.

Mithronn / rusty_ytdl

Greatly improved performance #17