Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
Discussion #1
bshifaw [3:59 PM]
Hi Chris,
The featured joint calling method is using NIO.
https://portal.firecloud.org/#methods/gatk/joint-discovery-gatk4/9/wdl
Is this the method you are referencing?
bshifaw [4:28 PM]
@vdauwera, just confirmed with @jsoto. The WDL isn’t using NIO when importing the GVCFs. This is due to a change in the WDL that we decided to implement to best leverage the FC data model (using an array of input files instead of a sample name map file).
cwhelan [9:48 PM]
right, that’s the method i was using.
vdauwera [11:22 PM]
oooh that’s an interesting case that would benefit from the flexible data models work — this would be great to show @andreah
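A minimal sketch of the two input styles discussed above, assuming the tab-separated sample name map format accepted by GATK's GenomicsDBImport (one sample name and one path per line). The idea, as we understand it: paths listed as plain strings in a map file can be opened directly by the tool via NIO, whereas an array of File inputs is localized to the task's disk first. The bucket paths and the name-from-filename convention below are hypothetical, not taken from the method.

```python
import os
import posixpath
import tempfile


def sample_name_from_path(gvcf_path):
    """Derive a sample name from a GVCF path.

    Assumed convention (hypothetical): the file name up to the first dot.
    """
    return posixpath.basename(gvcf_path).split(".")[0]


def write_sample_name_map(gvcf_paths, out_path):
    """Write one 'sample<TAB>path' line per GVCF and return the map path."""
    with open(out_path, "w") as handle:
        for path in gvcf_paths:
            handle.write(f"{sample_name_from_path(path)}\t{path}\n")
    return out_path


# Hypothetical bucket paths standing in for the array of input files.
gvcfs = [
    "gs://example-bucket/gvcfs/sampleA.g.vcf.gz",
    "gs://example-bucket/gvcfs/sampleB.g.vcf.gz",
]

map_path = write_sample_name_map(
    gvcfs, os.path.join(tempfile.mkdtemp(), "sample_name_map.tsv")
)
with open(map_path) as handle:
    map_lines = handle.read().splitlines()
```

Going the other way (array in the data model, map file built inside the workflow) is one possible way to keep the FC-friendly inputs while still handing the tool string paths; whether that fits this method is an open question.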
Discussion #2
cwhelan [11:17 AM]
ie it’s trying to localize each gvcf to each shard instance
tjeandet [11:17 AM]
do you have an idea of how many input files each shard has?
cwhelan [11:17 AM]
555 samples
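A rough back-of-the-envelope sketch of why this matters: if every shard localizes every GVCF, the number of file copies grows as samples times shards. If call caching also hashes each File input per call, hash lookups would grow the same way. The sample count comes from the thread; the shard count is a hypothetical placeholder, since the actual number of shards was not stated.

```python
def total_localizations(n_samples, n_shards):
    """Number of file copies when each shard localizes every input GVCF."""
    return n_samples * n_shards


N_SAMPLES = 555  # from the thread
N_SHARDS = 50    # hypothetical; the real shard count was not given

copies = total_localizations(N_SAMPLES, N_SHARDS)
print(f"{N_SAMPLES} samples x {N_SHARDS} shards = {copies} localizations")
```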
Takeaways
Run https://portal.firecloud.org/#methods/gatk/joint-discovery-gatk4/9/wdl in a non-production environment with 555 samples and try to reproduce the issue with hashing timeouts.
We predict the timeouts will not occur, as production Cromwell was seeing elevated CPU usage due to its /stats endpoint being hit repeatedly.