HKU-BAL / ClairS

ClairS - a deep-learning method for long-read somatic small variant calling
BSD 3-Clause "New" or "Revised" License

Benchmarks for compute resources #41

Open awgymer opened 1 week ago

awgymer commented 1 week ago

Hi, I have looked through the documentation, but I can't find any speed benchmarks or recommended compute resources for achieving a given throughput.

Since we run jobs through a scheduler that requires resource requests to be set, I am wondering whether you can shed any light on what you would consider sensible defaults to provide a process for:

aquaskyline commented 1 week ago

Line 680 of the ClairS preprint gives some figures for running ClairS on a whole genome. If you distribute ClairS jobs across multiple nodes by setting intervals, you will need to adjust --chunk_size accordingly. For example, say you set --threads 32 for each 5 Mbp interval on a single computing node. The best chunk_size is then 5 Mbp / 32 * 4 = 625,000 bp. The constant 4 is there because ClairS uses 4 threads for each chunk, so 32 threads work on 8 chunks at a time.
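The arithmetic above can be sketched as a small helper for picking a value when interval size or thread count changes. This is only an illustration of the formula from the comment, not code from ClairS: the function name `best_chunk_size` and its parameters are hypothetical, and rounding up to a whole base pair is an assumption.

```python
def best_chunk_size(interval_bp: int, threads: int, threads_per_chunk: int = 4) -> int:
    """Chunk size in bp per the rule of thumb: interval / threads * threads_per_chunk.

    threads_per_chunk defaults to 4, the figure quoted in the comment above.
    """
    # Number of chunks that can be processed at the same time (at least one).
    concurrent_chunks = max(1, threads // threads_per_chunk)
    # Ceiling division so the chunks always cover the whole interval (assumption).
    return -(-interval_bp // concurrent_chunks)


# The example from the comment: a 5 Mbp interval with 32 threads.
print(best_chunk_size(5_000_000, 32))  # 625000
```

With fewer threads than one chunk needs, the helper falls back to a single chunk spanning the whole interval.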