DataBiosphere / analysis_pipeline_WDL

Collection of WDL workflows based off the University of Washington TOPMed DCC Best Practices for GWAS. The WDL structure was based upon CWLs written by the Seven Bridges development team.

Weirdness in position output #63

Closed aofarrel closed 2 years ago

aofarrel commented 2 years ago

For the position configuration, the combine task's output differs noticeably from the CWL output. WDL chr2 is within tolerance of CWL chr2, but WDL chr1 is far off from CWL chr1. The differences start as far back as assoc-aggregate, specifically in segment 1.

aofarrel commented 2 years ago

Previous "analysis" that had a lot of errors: https://gist.github.com/aofarrel/52a687a527aab8eaff9b711dda7a0c05

aofarrel commented 2 years ago

Segment Differences

In this case the CWL output comes from an AWS backend. These files are taken directly from the output of assoc-aggregate.

> all.equal(CWLpositionchr1seg1, WDLpositionchr1seg1)
 [1] "Component “results”: Component “n.site”: Mean relative difference: 1"                                              
 [2] "Component “results”: Component “n.alt”: Mean relative difference: 0.08333333"                                      
 [3] "Component “results”: Component “n.sample.alt”: Mean relative difference: 0.08653846"                               
 [4] "Component “results”: Component “Score”: Mean relative difference: 0.007713071"                                     
 [5] "Component “results”: Component “Score.SE”: Mean relative difference: 0.01516738"                                   
 [6] "Component “results”: Component “Score.Stat”: Mean relative difference: 0.008776"                                   
 [7] "Component “results”: Component “Score.pval”: Mean relative difference: 0.006962627"                                
 [8] "Component “results”: Component “Est”: Mean relative difference: 0.01644271"                                        
 [9] "Component “results”: Component “Est.SE”: Mean relative difference: 0.02306819"                                     
[10] "Component “results”: Component “PVE”: Mean relative difference: 0.01827185"                                        
[11] "Component “results”: Component “MAC”: Mean relative difference: 0.08333333"                                        
[12] "Component “variantInfo”: Component “375790”: Attributes: < Component “row.names”: Numeric: lengths (1, 2) differ >"
[13] "Component “variantInfo”: Component “375790”: Component “variant.id”: Numeric: lengths (1, 2) differ"               
[14] "Component “variantInfo”: Component “375790”: Component “chr”: Lengths (1, 2) differ (string compare on first 1)"   
[15] "Component “variantInfo”: Component “375790”: Component “pos”: Numeric: lengths (1, 2) differ"                      
[16] "Component “variantInfo”: Component “375790”: Component “allele.index”: Numeric: lengths (1, 2) differ"             
[17] "Component “variantInfo”: Component “375790”: Component “n.obs”: Numeric: lengths (1, 2) differ"                    
[18] "Component “variantInfo”: Component “375790”: Component “freq”: Numeric: lengths (1, 2) differ"                     
[19] "Component “variantInfo”: Component “375790”: Component “MAC”: Numeric: lengths (1, 2) differ"                      
[20] "Component “variantInfo”: Component “375790”: Component “weight”: Numeric: lengths (1, 2) differ"  
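
For anyone reading the "Mean relative difference" lines above: as far as I understand R's default numeric `all.equal` behaviour, the figure is the summed absolute difference over the elements that differ, divided by the summed absolute value of the first argument (the target) over those same elements, and it is only reported when it exceeds a tolerance of about 1.5e-8. Below is a rough Python sketch of that calculation, not R's actual implementation. The function name `mean_relative_difference` and the input values are hypothetical and are not taken from the files in this issue.

```python
import math


def mean_relative_difference(target, current, tolerance=1.5e-8):
    """Rough approximation of R's numeric all.equal() for equal-length vectors.

    Returns None when the vectors agree within tolerance (where R would
    print TRUE), otherwise the mean relative difference. This is an
    assumption-laden sketch, not a port of R's source.
    """
    if len(target) != len(current):
        # R reports this case as "Numeric: lengths (a, b) differ" instead.
        raise ValueError(f"lengths ({len(target)}, {len(current)}) differ")

    # Only elements that are not exactly equal contribute to the figure.
    pairs = [(t, c) for t, c in zip(target, current) if t != c]
    if not pairs:
        return None

    abs_diff = sum(abs(t - c) for t, c in pairs)
    scale = sum(abs(t) for t, _ in pairs)

    if math.isfinite(scale) and scale > tolerance:
        diff = abs_diff / scale        # relative difference
    else:
        diff = abs_diff / len(pairs)   # fall back to mean absolute difference

    return diff if diff > tolerance else None


# Hypothetical values: a count of 1 in the target versus 2 in the other file.
print(mean_relative_difference([1.0], [2.0]))
# Hypothetical values: only the first element differs (12 versus 13).
print(mean_relative_difference([12.0, 5.0], [13.0, 5.0]))
```

Under this reading, a mean relative difference of 1 means the differing values in the second file are off by as much as the target values themselves, for example a count of 2 where the target has 1.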

> all.equal(WDLpositionchr1seg2, CWLpositionchr1seg2)
[1] TRUE

> all.equal(CWLpositionchr1seg20, WDLpositionchr1seg20)
[1] TRUE

aofarrel commented 2 years ago

A problem in the config file for assoc-aggregate caused variant_include not to be read. This no longer seems to be an issue.