TheJacksonLaboratory / SVE

GNU General Public License v3.0

Calling with Hydra does not work in provided Docker container #33

Open rick-heig opened 5 years ago

rick-heig commented 5 years ago

Hello.

Problem

I used the following command to call variants with Hydra through SVE:

root@e9e29a8f2ab3:/home/working# /tools/SVE/bin/sve call -r data/ref/ref.fasta -g hg19 -a hydra sandbox/mother.bam

First of all, this gives an error because the script /tools/SVE/src/hydra/scripts/combine-assembled-files.sh uses double brackets [[, which are a bash construct, but the script is interpreted by sh. This error probably doesn't come up on a machine where /bin/bash is the default shell. However, that is not the case in the Docker image, where the script fails. I'd recommend either setting the default shell to /bin/bash, calling the Hydra scripts with /bin/bash, or adding a shebang to the Hydra scripts.

/tools/SVE/src/hydra/scripts/combine-assembled-files.sh: 29: /tools/SVE/src/hydra/scripts/combine-assembled-files.sh: [[: not found
/tools/SVE/src/hydra/scripts/combine-assembled-files.sh: 38: /tools/SVE/src/hydra/scripts/combine-assembled-files.sh: [[: not found
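A minimal Python sketch of two of the workarounds suggested above: adding a #!/bin/bash line to a Hydra script that lacks one, or invoking the script through /bin/bash explicitly. `ensure_bash_shebang` and `bash_command` are hypothetical helper names, not part of SVE, and the sketch assumes the scripts are plain UTF-8 text and that bash is installed at /bin/bash in the container.

```python
import os
import stat
from typing import List

SHEBANG = "#!/bin/bash\n"


def ensure_bash_shebang(path: str) -> bool:
    """Prepend a bash shebang to a script that has no interpreter line.

    Returns True if the file was rewritten, False if it already started
    with '#!' (an existing interpreter line is left untouched).
    """
    with open(path, "r", encoding="utf-8") as fh:
        content = fh.read()
    if content.startswith("#!"):
        return False
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(SHEBANG + content)
    # Keep (or add) the owner's execute bit so the kernel can honour the shebang.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return True


def bash_command(script: str, *args: str) -> List[str]:
    """Build an argv that runs `script` under bash explicitly, regardless of
    whether it has a shebang or which shell /bin/sh points to."""
    return ["/bin/bash", script, *args]
```

For example, looping `ensure_bash_shebang` over `glob.glob("/tools/SVE/src/hydra/scripts/*.sh")` patches every script once, while passing `bash_command(...)` to `subprocess.run` avoids modifying the scripts at all. The second option is less invasive if the container image is rebuilt from upstream sources.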

By adding the shebang #!/bin/bash to the script, this error no longer appears (because the script is then executed by /bin/bash). However, the execution of Hydra still fails with the following error message:

root@e9e29a8f2ab3:/home/working# /tools/SVE/bin/sve call -r data/ref/ref.fasta -g hg19 -a hydra sandbox/mother.bam
loaded param_map from: hydra.json
using wrapper: hydra
<<<<<<<<<<<<<SVE command>>>>>>>>>>>>>>>

making the hydra configuration
/usr/local/bin/python /tools/SVE/src/hydra/scripts/make_hydra_config.py -i /home/working/output/mother_S17/bam.stub -s 100000 -n 16 > /home/working/output/mother_S17/bam.stub.config
extracting discordants for sample0
/usr/local/bin/python /tools/SVE/src/hydra/scripts/extract_discordants.py -c /home/working/output/mother_S17/bam.stub.config -d sample0
<<<<<<<<<<<<<SVE command>>>>>>>>>>>>>>>

routing all samples into hydra router
/tools/SVE/src/hydra/bin/hydra-router -config /home/working/output/mother_S17/bam.stub.config -routedList /home/working/output/mother_S17/bam.routed
<<<<<<<<<<<<<SVE command>>>>>>>>>>>>>>>

combining hydra assembly files
/tools/SVE/src/hydra/scripts/assemble-routed-files.sh /home/working/output/mother_S17/bam.stub.config /home/working/output/mother_S17/bam.routed 1 60
<<<<<<<<<<<<<SVE command>>>>>>>>>>>>>>>

merging results
/tools/SVE/src/hydra/scripts/combine-assembled-files.sh . /home/working/output/mother_S17/all.assembled
<<<<<<<<<<<<<SVE command>>>>>>>>>>>>>>>

starting hydra clustering
/usr/local/bin/python /tools/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py -i /home/working/output/mother_S17/all.assembled -o /home/working/output/mother_S17/all-sv.calls
call error: Traceback (most recent call last):
  File "/tools/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 498, in <module>
    main()
  File "/tools/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 485, in main
    updatedFile       = chooseBestClusterForReads(readSortedFile, clusterSupport)
  File "/tools/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 217, in chooseBestClusterForReads
    updateMappings(clusters, mappings, clusterSupport, out)
  File "/tools/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 180, in updateMappings
    bestCluster = chooseBestClusterForRead(support)
  File "/tools/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 168, in chooseBestClusterForRead
    return distinct_support[0][0]
IndexError: list index out of range

message: 
code: 1
output:

Parameters:
  Configuration file (-config): /home/working/output/mother_S17/bam.stub.config
  Routed file list (-routedList): /home/working/output/mother_S17/bam.routed

Processing: 
  Routing discordant mappings to master chrom/chrom/strand/strand files.
Found sandbox/mother.bam.bedpe
Routing mappings from: sandbox/mother.bam.bedpe...Time elapsed: 0 sec

Parameters:
  Configuration file (-config): /home/working/output/mother_S17/bam.stub.config
  Using routed file as input: 20.20.+.-
  Maximum mappings allowed before "punting": 60

Processing: 
Sorting groups by position.
    Sorting 20.20.+.- by position...Time elapsed: 0 sec
Finding possible breakpoint clusters by position.
    Finding potential clusters in  20.20.+.-.posSorted...
Time elapsed: 0 sec
Assembling raw breakpoint clusters.
FINISHED assembling clusters from 20.20.+.-.posSorted.posClusters.
  Cleaning up old files.
  Cleaning up old files.
  Cleaning up old files.

Parameters:
  Configuration file (-config): /home/working/output/mother_S17/bam.stub.config
  Using routed file as input: 20.20.+.+
  Maximum mappings allowed before "punting": 60

Processing: 
Sorting groups by position.
    Sorting 20.20.+.+ by position...Time elapsed: 0 sec
Finding possible breakpoint clusters by position.
    Finding potential clusters in  20.20.+.+.posSorted...
Time elapsed: 0 sec
Assembling raw breakpoint clusters.
FINISHED assembling clusters from 20.20.+.+.posSorted.posClusters.
  Cleaning up old files.
  Cleaning up old files.
  Cleaning up old files.

Parameters:
  Configuration file (-config): /home/working/output/mother_S17/bam.stub.config
  Using routed file as input: 20.20.-.+
  Maximum mappings allowed before "punting": 60

Processing: 
Sorting groups by position.
    Sorting 20.20.-.+ by position...Time elapsed: 0 sec
Finding possible breakpoint clusters by position.
    Finding potential clusters in  20.20.-.+.posSorted...
Time elapsed: 0 sec
Assembling raw breakpoint clusters.
FINISHED assembling clusters from 20.20.-.+.posSorted.posClusters.
  Cleaning up old files.
  Cleaning up old files.
  Cleaning up old files.

Parameters:
  Configuration file (-config): /home/working/output/mother_S17/bam.stub.config
  Using routed file as input: 20.20.-.-
  Maximum mappings allowed before "punting": 60

Processing: 
Sorting groups by position.
    Sorting 20.20.-.- by position...Time elapsed: 0 sec
Finding possible breakpoint clusters by position.
    Finding potential clusters in  20.20.-.-.posSorted...
Time elapsed: 0 sec
Assembling raw breakpoint clusters.
FINISHED assembling clusters from 20.20.-.-.posSorted.posClusters.
  Cleaning up old files.
  Cleaning up old files.
  Cleaning up old files.

adding ./20.20.+.+.posSorted.posClusters.assembled to master SV assembly file (/home/working/output/mother_S17/all.assembled)
adding ./20.20.+.-.posSorted.posClusters.assembled to master SV assembly file (/home/working/output/mother_S17/all.assembled)
adding ./20.20.-.+.posSorted.posClusters.assembled to master SV assembly file (/home/working/output/mother_S17/all.assembled)
adding ./20.20.-.-.posSorted.posClusters.assembled to master SV assembly file (/home/working/output/mother_S17/all.assembled)
    Cleaning up old files...
    Cleaning up old files...

vcf file /home/working/output/mother_S17.vcf exists=False
computing hydra breakpoints
grep -v "#" /home/working/output/mother_S17/all-sv.calls.freq | /usr/local/bin/python /tools/SVE/src/hydra/scripts/hydraToBreakpoint.py -i stdin > /home/working/output/mother_S17/all-sv.calls.bkpts
all hydra stages completed
{'output': 'Traceback (most recent call last):\n  File "/tools/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 498, in <module>\n    main()\n  File "/tools/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 485, in main\n    updatedFile       = chooseBestClusterForReads(readSortedFile, clusterSupport)\n  File "/tools/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 217, in chooseBestClusterForReads\n    updateMappings(clusters, mappings, clusterSupport, out)\n  File "/tools/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 180, in updateMappings\n    bestCluster = chooseBestClusterForRead(support)\n  File "/tools/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 168, in chooseBestClusterForRead\n    return distinct_support[0][0]\nIndexError: list index out of range\n', 'message': '', 'code': 1}
<<<<<<<<<<<<<hydra failure>>>>>>>>>>>>>>>

The error is raised from the /tools/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py script.

Do you have any insights as to why this fails?
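The traceback ends at `return distinct_support[0][0]`, which raises IndexError when `distinct_support` is empty, presumably because a read reaches `chooseBestClusterForRead` with no cluster support recorded for it. The sketch below only illustrates a guard for that case; it does not reproduce the real function, whose body is not shown here. The name `choose_best_cluster` and the (cluster_id, count) layout of `support` are assumptions.

```python
from typing import Hashable, Optional, Sequence, Tuple


def choose_best_cluster(
    support: Sequence[Tuple[Hashable, int]]
) -> Optional[Hashable]:
    """Hypothetical, defensive stand-in for chooseBestClusterForRead.

    Assumes `support` holds (cluster_id, support_count) pairs for one read.
    Returns the cluster id with the highest count, or None when the read has
    no supported cluster (the situation where `distinct_support[0][0]`
    raises IndexError).
    """
    if not support:
        return None
    # max() returns the first pair with the highest count, so ties go to
    # the earliest entry in `support`.
    best_id, _ = max(support, key=lambda pair: pair[1])
    return best_id
```

Returning None lets the caller skip such reads instead of crashing, but it may also hide whatever upstream step left the support list empty, so logging the skipped reads would help track down the root cause.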

Dataset used for testing:

I used the GATK HaplotypeCaller workshop dataset since it is small enough for quick testing. It is available at https://drive.google.com/drive/folders/0BzI1CyccGsZicXNqZWplU0d6Ync under data/GATK_Germline.zip.

Prior to calling, the reads were realigned with:

root@d6c4c258dbce:/home/working# /tools/SVE/bin/sve realign -r data/ref/ref.fasta data/bams/mother.bam -o ./sandbox

From the output I can see that the $SHELL environment variable is not set, which may be the cause of the first problem. It could be a good idea to set $SHELL to /bin/bash in the Dockerfile.

parallel: Warning: $SHELL not set. Using /bin/sh.
Done

<<<<<<<<<<<<<speedseq realign sucessfull>>>>>>>>>>>>>>>
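In the Dockerfile this would be a single ENV SHELL=/bin/bash instruction. For a Python wrapper that launches tools through `subprocess`, a minimal sketch of the equivalent is to pass an environment in which SHELL is set. `env_with_shell` and `run_with_shell` are hypothetical helpers, not SVE functions, and whether SVE launches speedseq/parallel this way is an assumption.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def env_with_shell(base: Optional[Mapping[str, str]] = None,
                   shell: str = "/bin/bash") -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with SHELL set.

    An existing non-empty SHELL value is kept; only a missing or empty one
    is filled in, which is the case GNU parallel warns about.
    """
    env = dict(os.environ if base is None else base)
    if not env.get("SHELL"):
        env["SHELL"] = shell
    return env


def run_with_shell(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run `cmd` with SHELL guaranteed to be set in its environment."""
    return subprocess.run(list(cmd), env=env_with_shell(), check=True)
```

Note that SHELL only affects programs that read it, such as GNU parallel; it does not change which interpreter /bin/sh is, so scripts without a shebang may still need one of the fixes for the [[ error.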

Thank you for your help in getting Hydra running through SVE using the provided Docker image.

(Edit: fixed typo, brackers -> brackets)