Open MaestSi opened 6 years ago
From your description, I am not sure which files are the first to come out empty or unwritten. It looks like the router step, but I suspect the failure is actually at the discordant-extraction step. Files won't be routed or assembled properly if the preceding files are missing, empty, truncated, or otherwise corrupted. Do all of the discordant BAMs get written? Does routed-files.txt get written? Do any of the discordant cluster files get written?
Changes to samtools over the past few years could be causing this, depending on which version you have, but that would show up as a problem at discordant extraction. If you can tell me exactly which step is failing and which files fail to write, I may be able to give you a better answer.
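One way to answer the "which files get written?" questions is to scan the output directory and separate empty files from non-empty ones. This is only a minimal sketch: `summarize_outputs` is a hypothetical helper, not part of Hydra or SVE, and the directory you pass in is whatever your run used.

```python
import os


def summarize_outputs(directory):
    """Return (empty, non_empty) lists of regular file names in `directory`.

    Hypothetical helper for triage; it only looks at file sizes and does not
    know anything about Hydra's file formats.
    """
    empty, non_empty = [], []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # skip subdirectories and other non-regular entries
            if entry.stat().st_size > 0:
                non_empty.append(entry.name)
            else:
                empty.append(entry.name)
    return sorted(empty), sorted(non_empty)
```

Running it on the Hydra output directory after each step should show the first step at which empty files appear.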
I have the following non-empty files:
Okay thanks. That is helpful. Did you set the ulimit to over 16000 or 16384? This controls the number of file handles that can be open simultaneously. How many of those chr chr +- files are written?
Have you tried running the routing step independently? What happens?
1020 chr chr +- files are written in total; some of them are empty, but most are not. Yes, I already set ulimit -f to 16384 and tried running the routing step independently. Here are the command and the messages printed to the screen:
/mnt/cifs01/simone/software/SVE/src/hydra/bin/hydra-router -config /mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/bam.stub.config -routedList /mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/bam.routed
Parameters:
Configuration file (-config): /mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/bam.stub.config
Routed file list (-routedList): /mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/bam.routed
Processing:
Routing discordant mappings to master chrom/chrom/strand/strand files.
Found /mnt/cifs01/simone/NA12878/start_sorted.bam.bedpe
Routing mappings from: /mnt/cifs01/simone/NA12878/start_sorted.bam.bedpe...Time elapsed: 54 sec
Are you running this on a single dataset? I think there should be many bedpes.
Yes, I'm running Hydra on a single sample. I had only one ~1GB bedpe file.
I don't remember whether hydra-multi works on a single sample. If you are using a single sample, I would recommend Lumpy (https://github.com/arq5x/lumpy-sv) instead.
I am running them both, together with five other tools (potentially), in the framework of SVE (https://github.com/TheJacksonLaboratory/SVE). If you tell me that it doesn't work with a single sample, that's fine. Thanks.
My intuition is that it should still work on a single sample, although that is not the intended use, so I am really not sure. Lumpy should really be used instead on a single sample. It might still be a ulimit problem: try ulimit -n (maximum number of open file descriptors) rather than -f (maximum file size). Sorry I can't be of more help.
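To check which limit is actually in effect, the two settings can be read from Python's standard `resource` module (Unix only). This is a rough sketch: the 16384 threshold is the figure mentioned earlier in this thread, and `has_enough_handles` is a hypothetical helper, not something Hydra provides.

```python
import resource


def open_file_limit():
    """Return the (soft, hard) limit on open file descriptors (ulimit -n)."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def has_enough_handles(required, soft_limit):
    """True if `soft_limit` allows at least `required` open files."""
    # RLIM_INFINITY means "unlimited", so it always satisfies the requirement.
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= required


if __name__ == "__main__":
    nofile_soft, nofile_hard = open_file_limit()
    # ulimit -f corresponds to RLIMIT_FSIZE (file size), a different limit.
    fsize_soft, _ = resource.getrlimit(resource.RLIMIT_FSIZE)
    print("open files (ulimit -n): soft=%s hard=%s" % (nofile_soft, nofile_hard))
    print("file size  (ulimit -f): soft=%s" % fsize_soft)
    print("enough handles for 16384 files:", has_enough_handles(16384, nofile_soft))
```

The limit is per process and inherited from the shell, so the check is only meaningful when run from the same shell session that launches hydra-router.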
I think it fails silently when a chr/chr/strand/strand file listed in the routed list was never written, which is why I suspect a ulimit problem, but I am not sure.
Just one more question. Does Hydra support the hg38 reference? The BAM file I'm using was obtained by mapping FASTQ reads to hg38. Could that be the reason for the issue I am facing? Thank you
It should; there isn't any reason why it wouldn't, so long as all of the samples are aligned to the same genome.
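If the failure really is silent, one way to surface it is to check every entry of the routed list before assembly. The sketch below ASSUMES the routed list holds one file path per line, which I have not verified against Hydra's actual format; `check_routed_list` is a hypothetical helper.

```python
import os


def check_routed_list(list_path):
    """Return (line_number, path, problem) tuples for bad routed-list entries.

    ASSUMPTION: each non-blank line of the list is a single file path.
    `problem` is "missing" if the file does not exist, "empty" if it has size 0.
    """
    problems = []
    with open(list_path) as handle:
        for line_no, line in enumerate(handle, start=1):
            path = line.strip()
            if not path:
                continue  # ignore blank lines
            if not os.path.exists(path):
                problems.append((line_no, path, "missing"))
            elif os.path.getsize(path) == 0:
                problems.append((line_no, path, "empty"))
    return problems
```

An empty result on a non-empty list would point away from unwritten routed files; an empty list file is itself worth reporting.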
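To confirm that samples share a reference, the `@SQ` lines of their SAM headers can be compared (the header text is what `samtools view -H` prints). This is a minimal sketch with hypothetical helper names; it compares only sequence names (`SN`) and lengths (`LN`), so identical contig names with identical lengths are treated as the same genome.

```python
def parse_sq_lines(header_text):
    """Return {sequence_name: length} from the @SQ lines of a SAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def header_differences(header_a, header_b):
    """Return (only_in_a, only_in_b, length_mismatch) lists of contig names."""
    a, b = parse_sq_lines(header_a), parse_sq_lines(header_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    length_mismatch = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_a, only_b, length_mismatch
```

Three empty lists suggest the two BAMs were aligned to references with the same sequence dictionary.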
Dear Hydra developers, after creating the Hydra configuration (make_hydra_config.py) and extracting discordants for sample0 (extract_discordants.py), I ran the command that routes all samples (hydra-router). This command doesn't give any error; however, I noticed that the output file bam.routed is empty. After that, I ran the commands for combining the Hydra assembly files (assemble-routed-files.sh) and for merging the results (combine-assembled-files.sh). When the forceOneClusterPerPairMem.py script that starts Hydra clustering is invoked, however, it gives the following error:
call error:
Traceback (most recent call last):
File "/mnt/cifs01/simone/software/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 498, in
main()
File "/mnt/cifs01/simone/software/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 483, in main
clusterSupport = computeSupportForEachCluster(opts.master, opts.maxDist)
File "/mnt/cifs01/simone/software/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 120, in computeSupportForEachCluster
for line in open(clusterFile, 'r'):
IOError: [Errno 2] No such file or directory: '/mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/all.assembled'
I can confirm that file all.assembled has not been created. How could I solve this issue? Thanks in advance.