Benchmarking currently runs on real TOPMed 10b data, which is too large to share via GitHub. Code for simulating larger datasets would let users regenerate test data locally. The initial simulation code is available here, but it currently supports only a very small sample size.
In addition, the existing benchmarking code is not yet tracked by Git. It should be added to the repo together with scripts or instructions for setting up a benchmarking environment.
Tasks to be addressed:
[ ] Describe how to set up the benchmarking software environment.
[ ] Simulate a larger dataset for benchmarking purposes.
[ ] Formalize the benchmarking code from its current state (not included in the repo yet).
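As a starting point for the simulation task, here is a minimal sketch of a generator whose sample size is a parameter. It is not the existing simulation code: the function name `simulate_genotypes`, the TSV layout, the column names, and the allele-frequency range are all assumptions made for illustration, since the input format the benchmark actually expects is not described in this issue. Rows are written one sample at a time so memory use stays flat as `n_samples` grows.

```python
import csv
import random


def simulate_genotypes(path, n_samples, n_variants, seed=0):
    """Write a samples-by-variants table of 0/1/2 allele counts to a TSV file.

    Hypothetical layout: one header row, then one row per sample.
    Each variant gets an allele frequency drawn uniformly from [0.05, 0.5]
    (an arbitrary choice); each genotype is the sum of two independent
    Bernoulli draws at that frequency.
    """
    rng = random.Random(seed)  # fixed seed so the same data can be regenerated
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_variants)]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample_id"] + [f"var{j}" for j in range(n_variants)])
        for i in range(n_samples):
            # bool + bool gives an int in {0, 1, 2}
            row = [(rng.random() < p) + (rng.random() < p) for p in freqs]
            writer.writerow([f"sample{i}"] + row)
    return path
```

Calling `simulate_genotypes("sim.tsv", n_samples=100000, n_variants=1000, seed=42)` would produce a larger test file locally. Pure Python is slow at that scale, so a vectorized version is likely worth considering once the real input format is pinned down.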