arq5x / Hydra

bam.routed file is empty #17

Open MaestSi opened 6 years ago

MaestSi commented 6 years ago

Dear Hydra developers,

After creating the Hydra configuration (make_hydra_config.py) and extracting discordants for sample0 (extract_discordants.py), I ran the command that routes all samples through the Hydra router (hydra-router). This command does not report any error, but I noticed that the output file bam.routed is empty. I then ran the commands for combining the Hydra assembly files (assemble-routed-files.sh) and for merging the results (combine-assembled-files.sh). However, when the forceOneClusterPerPairMem.py script is invoked to start Hydra clustering, it fails with the following error:

call error: Traceback (most recent call last):
  File "/mnt/cifs01/simone/software/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 498, in <module>
    main()
  File "/mnt/cifs01/simone/software/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 483, in main
    clusterSupport = computeSupportForEachCluster(opts.master, opts.maxDist)
  File "/mnt/cifs01/simone/software/SVE/src/hydra/scripts/forceOneClusterPerPairMem.py", line 120, in computeSupportForEachCluster
    for line in open(clusterFile, 'r'):
IOError: [Errno 2] No such file or directory: '/mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/all.assembled'

I can confirm that the file all.assembled has not been created. How can I solve this issue? Thanks in advance.

ml4wc commented 6 years ago

From your description, I am not sure which files are the first to come out empty or unwritten. It looks like the router step, but I suspect the failure is actually at the discordant extraction step. Files won't get routed or assembled properly if the preceding files are missing, empty, truncated, or otherwise badly corrupted. Do all of the discordant BAMs get written? Does routed-files.txt get written? Do any of the discordant cluster files get written?

I think there have been changes to samtools in the past few years that could be causing this, depending on which version of samtools you have, but that would be a problem at discordant extraction. If you could tell me exactly which step is failing and which files fail to write, I might be able to give you a better answer.
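The questions above come down to checking, for each intermediate file, whether it is missing, empty, or has content. A minimal sketch of that check in Python follows. The helper name `audit_outputs` is hypothetical and not part of Hydra, and the paths to check are whatever you pass on the command line. It only looks at file size, so it will not detect a truncated BAM.

```python
import os
import sys


def audit_outputs(paths):
    """Sort paths into missing, empty, and non-empty ("ok") files.

    Hypothetical helper, not part of Hydra. Only existence and size
    are checked; a truncated file with some content counts as "ok".
    """
    report = {"missing": [], "empty": [], "ok": []}
    for path in paths:
        if not os.path.isfile(path):
            report["missing"].append(path)
        elif os.path.getsize(path) == 0:
            report["empty"].append(path)
        else:
            report["ok"].append(path)
    return report


if __name__ == "__main__":
    # Paths are taken from the command line, one status line per file.
    result = audit_outputs(sys.argv[1:])
    for status in ("missing", "empty", "ok"):
        for path in result[status]:
            print(f"{status}\t{path}")
```

Saved under any name and run with the discordant BAMs, routed-files.txt, bam.routed, and the cluster files as arguments, it prints the status of each, which should show the first step whose output is missing or empty.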

MaestSi commented 6 years ago

I have the following non-empty files:

ml4wc commented 6 years ago

Okay thanks. That is helpful. Did you set the ulimit to over 16000 or 16384? This controls the number of file handles that can be open simultaneously. How many of those chr chr +- files are written?

Have you tried running the routing step independently? What happens?

MaestSi commented 6 years ago

1020 chr chr +- files are written in total; some of them are empty, but most are not. Yes, I already set ulimit -f to 16384 and tried running the routing step independently. These are the command and the messages printed to the screen:

/mnt/cifs01/simone/software/SVE/src/hydra/bin/hydra-router -config /mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/bam.stub.config -routedList /mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/bam.routed

Parameters:
  Configuration file (-config): /mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/bam.stub.config
  Routed file list (-routedList): /mnt/cifs01/simone/NA12878/Hydra_output/start_sorted_S17/bam.routed

Processing:
  Routing discordant mappings to master chrom/chrom/strand/strand files.
  Found /mnt/cifs01/simone/NA12878/start_sorted.bam.bedpe
  Routing mappings from: /mnt/cifs01/simone/NA12878/start_sorted.bam.bedpe...Time elapsed: 54 sec

ml4wc commented 6 years ago

Are you running this on a single dataset? I think there should be many bedpes.

MaestSi commented 6 years ago

Yes, I'm running Hydra on a single sample. I had only one ~1GB bedpe file.

ml4wc commented 6 years ago

I don't remember whether hydra-multi works on a single sample. If you are using a single sample, I would recommend Lumpy instead: https://github.com/arq5x/lumpy-sv

MaestSi commented 6 years ago

I am running them both, (potentially) together with 5 other tools, within the SVE framework (https://github.com/TheJacksonLaboratory/SVE). If you tell me that it doesn't work with a single sample, that's fine. Thanks.

ml4wc commented 6 years ago

My intuition is that it should still work on a single sample, even though that is not the intended use, so I am really not sure. Lumpy really should be used instead for a single sample. It might still be a ulimit problem: try -n rather than -f. Sorry I can't be of more help.
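To make the -n versus -f distinction concrete: `ulimit -n` is the maximum number of open file descriptors (RLIMIT_NOFILE), while `ulimit -f` is the maximum size of a file the process may write (RLIMIT_FSIZE). A minimal sketch for checking the limit from Python on Linux or macOS follows. The helper name `enough_open_files` is hypothetical, and the 16384 threshold is simply the figure mentioned earlier in this thread, not a value confirmed from the Hydra source.

```python
import resource


def enough_open_files(needed, soft_limit=None):
    """Return True if the soft open-file limit allows `needed` handles.

    Hypothetical helper. If soft_limit is not given, the current
    process limit (what `ulimit -n` reports) is used.
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= needed


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)  # ulimit -n
    fsize, _ = resource.getrlimit(resource.RLIMIT_FSIZE)     # ulimit -f (file size)
    print(f"open files: soft={soft} hard={hard}")
    print(f"max file size: {'unlimited' if fsize == resource.RLIM_INFINITY else fsize}")
    # 16384 is the figure from this thread, used here as an assumption.
    print("enough handles:", enough_open_files(16384))
```

The limit is per process and inherited from the shell, so the check is only meaningful if it is run from the same shell session that launches hydra-router.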

ml4wc commented 6 years ago

I think it will fail silently if a chr/chr/strand/strand file listed in the routed list didn't get written, which is why I think it is a ulimit problem... not sure.

MaestSi commented 6 years ago

Just one more question. Does Hydra support the hg38 reference? The BAM file I'm using was obtained by mapping fastq reads to hg38. Could that be the reason for the issue I am facing? Thank you

ml4wc commented 6 years ago

It should; there isn't any reason why it wouldn't, so long as all of the samples are aligned to the same genome.