[1/N][GPU encoder] Add benchmarking code and refactor encoding module

Why are these changes needed?

Adds benchmarking code under encoding/bench. Run make benchmark_cpu to run the benchmark with the default settings. The command writes a file called benchmark_results.json that lists the encode time for each run.
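For readers who want a feel for what a per-run timing loop like this involves, here is a minimal sketch. It is not the code in encoding/bench: the helper names (timeRuns, mean, writeResults), the placeholder workload, the example_results.json file name, and the flat JSON array of seconds are all assumptions made for illustration, and the real output schema may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// timeRuns calls fn numRuns times and returns the wall-clock duration of
// each call in seconds. (Hypothetical helper, not part of the PR.)
func timeRuns(numRuns int, fn func()) []float64 {
	times := make([]float64, 0, numRuns)
	for i := 0; i < numRuns; i++ {
		start := time.Now()
		fn()
		times = append(times, time.Since(start).Seconds())
	}
	return times
}

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// writeResults stores the per-run times as a JSON array.
// The schema is an assumption; the real benchmark may write more fields.
func writeResults(path string, times []float64) error {
	data, err := json.MarshalIndent(times, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

func main() {
	// Placeholder workload standing in for an encode call.
	placeholder := func() { time.Sleep(time.Millisecond) }

	times := timeRuns(10, placeholder)
	if err := writeResults("example_results.json", times); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Printf("runs=%d mean=%.4fs\n", len(times), mean(times))
}
```

Keeping the raw per-run times rather than only the average makes it possible to spot warm-up effects or outliers later.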

Flags:

    flag.StringVar(&config.OutputFile, "output", "benchmark_results.json", "Output file for results")
    flag.Uint64Var(&config.BlobLength, "blob-length", 1048576, "Blob length (power of 2)")
    flag.Uint64Var(&config.NumChunks, "num-chunks", 8192, "Minimum number of chunks (power of 2)")
    flag.Uint64Var(&config.NumRuns, "num-runs", 10, "Number of times to run the benchmark")
    flag.StringVar(&config.CPUProfile, "cpuprofile", "", "Write CPU profile to file")
    flag.StringVar(&config.MemProfile, "memprofile", "", "Write memory profile to file")
    flag.BoolVar(&config.EnableVerify, "enable-verify", false, "Verify blobs after encoding")
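Several of the flags above are documented as powers of 2. A minimal sketch of how they might be bound to a config struct and validated follows. The Config layout, the parseConfig and isPowerOfTwo helpers, and the validation step are assumptions for illustration; the PR may organize this differently and may not validate at all.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config mirrors the flags listed above. The struct layout is an assumption.
type Config struct {
	OutputFile   string
	BlobLength   uint64
	NumChunks    uint64
	NumRuns      uint64
	CPUProfile   string
	MemProfile   string
	EnableVerify bool
}

// isPowerOfTwo reports whether n has exactly one bit set.
func isPowerOfTwo(n uint64) bool {
	return n != 0 && n&(n-1) == 0
}

// parseConfig parses args into a Config and rejects values that are not
// powers of 2 where the flag descriptions require one. (Hypothetical helper.)
func parseConfig(args []string) (*Config, error) {
	fs := flag.NewFlagSet("benchmark", flag.ContinueOnError)
	config := &Config{}
	fs.StringVar(&config.OutputFile, "output", "benchmark_results.json", "Output file for results")
	fs.Uint64Var(&config.BlobLength, "blob-length", 1048576, "Blob length (power of 2)")
	fs.Uint64Var(&config.NumChunks, "num-chunks", 8192, "Minimum number of chunks (power of 2)")
	fs.Uint64Var(&config.NumRuns, "num-runs", 10, "Number of times to run the benchmark")
	fs.StringVar(&config.CPUProfile, "cpuprofile", "", "Write CPU profile to file")
	fs.StringVar(&config.MemProfile, "memprofile", "", "Write memory profile to file")
	fs.BoolVar(&config.EnableVerify, "enable-verify", false, "Verify blobs after encoding")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if !isPowerOfTwo(config.BlobLength) {
		return nil, fmt.Errorf("blob-length %d is not a power of 2", config.BlobLength)
	}
	if !isPowerOfTwo(config.NumChunks) {
		return nil, fmt.Errorf("num-chunks %d is not a power of 2", config.NumChunks)
	}
	return config, nil
}

func main() {
	config, err := parseConfig(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", *config)
}
```

Using a dedicated flag.FlagSet instead of the package-level flag functions keeps the parser easy to exercise from tests with an explicit argument slice.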

The PR also refactors the encoding module so that GPU-based components can be supported in the future. It is the first of several PRs that split up the ideas in https://github.com/Layr-Labs/eigenda/pull/642.

Results on a g6.4xlarge, using only the CPU code:


| Encoded Blob Size | Num Chunks | Chunk Len | Encoding time (avg 10 runs) | Dominant factor |
|-------------------|------------|-----------|---------------------------|-----------------|
| 32768             | 8192       | 1         | 12.774s                   | Multiproof fft1 |
| 65536             | 8192       | 2         | 12.853s                   | Multiproof fft1 |
| 131072            | 8192       | 4         | 12.969s                   | Multiproof fft1 |
| 262144            | 8192       | 8         | 13.099s                   | Multiproof fft1 |
| 524288            | 8192       | 16        | 13.360s                   | Multiproof fft1 |
| 1048576           | 8192       | 32        | 13.765s                   | Multiproof fft1 |
| 2097152           | 8192       | 64        | 14.496s                   | Multiproof fft1 |
| 4194304           | 8192       | 128       | 15.803s                   | Multiproof fft1 |
| 8388608           | 8192       | 256       | 18.043s                   | Multiproof fft1 |
| 16777216          | 8192       | 512       | 24.041s                   | Multiproof msm  |
| 33554432          | 8192       | 1024      | 29.168s                   | Multiproof msm  |

In addition, at the larger blob sizes the Reed-Solomon encoding also becomes a dominant factor, which suggests we should focus on accelerating ComputeMultiFrameProof and ExtendPolyEval.

Checks

- [x] I've made sure the lint is passing in this PR.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests; in that case, please comment that they are not relevant.

Testing Strategy
- [ ] Unit tests
- [ ] Integration tests
- [ ] This PR is not tested :(

[1/N][GPU encoder] Add benchmarking code and refactor encoding module #715