Open shukwong opened 1 month ago
The benchmarking section of the nf-core/sarek documentation states:
On each release, the pipeline is run on 3 full size tests:
- test_full runs tumor-normal data for one patient from the SEQC2 consortium
- test_full_germline runs a WGS 30X Genome-in-a-Bottle (NA12878) dataset
- test_full_germline_ncbench_agilent runs two WES samples with 75M and 200M reads (data available here). The results are uploaded to Zenodo, evaluated against a truth dataset, and results are made available via the NCBench dashboard.
So we can likely utilize these datasets for testing.
- Import and run Sarek on the curated dataset on Biowulf and AWS HealthOmics
- Test for performance on both platforms
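
As a starting point, here is a minimal sketch of how the three full-size test profiles could be launched from a host with the Nextflow CLI available (for example, an interactive session on Biowulf). The profile names come from the Sarek text quoted above; the `singularity` container profile, the `results` output root, and the helper name `build_sarek_command` are assumptions for illustration only. This does not cover AWS HealthOmics, where the workflow would first need to be imported through that service rather than started with `nextflow run`.

```python
import shlex

# Test profile names as listed in the Sarek benchmarking section quoted above.
FULL_TEST_PROFILES = [
    "test_full",
    "test_full_germline",
    "test_full_germline_ncbench_agilent",
]


def build_sarek_command(test_profile, container_profile="singularity", outdir_root="results"):
    """Return the argument list for one full-size Sarek test run.

    container_profile and outdir_root are assumptions, not taken from the issue;
    adjust them to whatever the target platform actually requires.
    """
    return [
        "nextflow", "run", "nf-core/sarek",
        # Nextflow accepts several comma-separated profiles in one -profile option.
        "-profile", f"{test_profile},{container_profile}",
        # Keep each test's results in its own directory.
        "--outdir", f"{outdir_root}/{test_profile}",
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed
    # or pasted into a batch script first.
    for profile in FULL_TEST_PROFILES:
        print(shlex.join(build_sarek_command(profile)))
```

Printing rather than executing keeps the sketch safe to run anywhere; the same argument lists could be passed to `subprocess.run` once the resource settings for each platform are agreed on.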