WGLab / LinkedSV

MIT License

node33.candidates file not generated #14

Open ricchasethi opened 4 years ago

ricchasethi commented 4 years ago

Hello,

I am running LinkedSV on a BAM file generated from 10X Genomics linked-read sequencing, with the following command:

python linkedsv.py -i phased_possorted_bam.bam -d . -r hg38.fa -v hg38 -t 20 --germline_mode --save_temp_files --black_region_bed /black_lists/hg38_black_list.bed --gap_region_bed /black_lists/hg38_gap.bed

No phased_possorted_bam.bam.node33.candidates file is generated in the folder. The exact error is:

[04/23/2020 11:05:10 (119.964 MB)] searching for extremely high coverage region
[04/23/2020 11:15:31 (133.612 MB)] calculating distribution parameters
[04/23/2020 11:15:31 (133.612 MB)] total number of reads in the genome is: 322232329
[04/23/2020 11:15:31 (133.616 MB)] calculating fragment parameters from file: phased_possorted_bam.bam.bcd22
[04/23/2020 11:16:25 (306.758 MB)] N95_fragment_length is: 32524
[04/23/2020 11:17:37 (453.153 MB)] finished getting fragment parameters
[04/23/2020 11:17:38 (452.964 MB)] searching for paired breakpoints
[04/23/2020 11:17:38 (452.964 MB)] searching paired breakpoints
[04/23/2020 11:17:38 (452.964 MB)] building nodes from fragments
[04/23/2020 11:17:38 (452.964 MB)] reading bcd22 file: phased_possorted_bam.bam.bcd22
[04/23/2020 11:18:24 (6.147 GB)] total number of fragments: 4547447
[04/23/2020 11:18:26 (6.166 GB)] writing to node file
[04/23/2020 11:21:15 (4.208 GB)] removing sparse nodes, min_support_fragments is 10
[04/23/2020 11:21:16 (4.208 GB)] clustering nodes, max distance for connecting two nodes is: 17684
[04/23/2020 11:21:16 (4.208 GB)] min support fragment pairs is: 10
[04/23/2020 11:21:16 (4.208 GB)] reading black region bed file
[04/23/2020 11:21:16 (4.208 GB)] reading node candidate file: phased_possorted_bam.bam.node33.candidates
Traceback (most recent call last):
  File "linkedsv.py", line 313, in <module>
    main()
  File "linkedsv.py", line 47, in main
    detect_increased_fragment_ends(args, dbo_args, endpoint_args)
  File "linkedsv.py", line 221, in detect_increased_fragment_ends
    find_paired_bk.find_paired_bk(args, dbo_args, endpoint_args)
  File "/scripts/find_paired_bk.py", line 723, in find_paired_bk
    build_graph_from_fragments(args, dbo_args, endpoint_args)
  File "/scripts/find_paired_bk.py", line 196, in build_graph_from_fragments
    clustering_nodes(args, dbo_args, endpoint_args, args.node33_candidate_file, args.node_cluster33_file, max_gap_distance, 'R_end', 'R_end')
  File "/scripts/find_paired_bk.py", line 333, in clustering_nodes
    node_list = read_node_list_file(node_list_file, black_region_key_set, args.alt_tid_set)
  File "/scripts/find_paired_bk.py", line 231, in read_node_list_file
    node_list_fp = open(node_list_file, 'r')
IOError: [Errno 2] No such file or directory: 'phased_possorted_bam.bam.node33.candidates'

Could you help me find the cause of the problem?

Thanks, Riccha

fangli80 commented 4 years ago

Hi Riccha, Thank you for using LinkedSV. Could you please show the intermediate output files and their sizes using the ls -l command?

Li

ricchasethi commented 4 years ago

Thanks for your reply @fangli80. I got the following intermediate files:

-rw-r--r-- 1 10529 Apr 23 11:17 phased_possorted_bam.bam.arguments
-rw-r--r-- 1 741287270 Apr 23 11:15 phased_possorted_bam.bam.barcode_cov.bed
-rw-r--r-- 1 620 Apr 23 10:21 phased_possorted_bam.bam.barcode_statistics
-rw-r--r-- 1 10225421667 Apr 23 10:21 phased_possorted_bam.bam.bcd21.gz
-rw-r--r-- 1 4833788233 Apr 23 11:05 phased_possorted_bam.bam.bcd22
-rw-r--r-- 1 4529246833 Apr 23 10:55 phased_possorted_bam.bam.bcd22.tmp
-rw-r--r-- 1 742 Apr 23 11:17 phased_possorted_bam.bam.fragment_statistics
-rw-r--r-- 1 3498 Apr 23 11:15 phased_possorted_bam.bam.high_cov.bed
-rw-r--r-- 1 657796053 Apr 23 10:45 phased_possorted_bam.bam.low_mapq.bcd21.gz
-rw-r--r-- 1 414293838 Apr 23 11:21 phased_possorted_bam.bam.node33
-rw-r--r-- 1 414293833 Apr 23 11:21 phased_possorted_bam.bam.node35
-rw-r--r-- 1 414292748 Apr 23 11:21 phased_possorted_bam.bam.node53
-rw-r--r-- 1 414292743 Apr 23 11:21 phased_possorted_bam.bam.node55
-rw-r--r-- 1 18752877965 Apr 23 10:06 phased_possorted_bam.bam.sortbx.bam
-rw-r--r-- 1 161807952 Apr 23 11:05 phased_possorted_bam.bam.weird_reads.txt
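
For triage, the listing above can be compared against the files the traceback expects. The following is a minimal sketch, not part of LinkedSV: the helper name check_intermediate_files is hypothetical, and the suffix list is only what appears in this thread (the four node files, bcd22, and the node33.candidates file named in the IOError). Whether LinkedSV always writes exactly this set is an assumption.

```python
import os

# Suffixes taken from the listing and traceback in this thread. That
# LinkedSV always produces exactly this set is an assumption.
EXPECTED_SUFFIXES = [
    ".bcd22",
    ".node33",
    ".node35",
    ".node53",
    ".node55",
    ".node33.candidates",
]


def check_intermediate_files(prefix, suffixes=EXPECTED_SUFFIXES):
    """Return (missing, empty) lists of paths built as prefix + suffix."""
    missing, empty = [], []
    for suffix in suffixes:
        path = prefix + suffix
        if not os.path.exists(path):
            missing.append(path)
        elif os.path.getsize(path) == 0:
            empty.append(path)
    return missing, empty


if __name__ == "__main__":
    # Prefix as used in the report above (output directory was ".").
    missing, empty = check_intermediate_files("phased_possorted_bam.bam")
    for path in missing:
        print("missing:", path)
    for path in empty:
        print("empty:  ", path)
```

Run in the output directory from the report, this would be expected to flag only the node33.candidates file, which matches the IOError: the node files are written, but the step that should derive the candidates file from them leaves nothing behind.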

morispi commented 4 years ago

Hi,

I am getting the same error on a small test dataset. Here is the list of intermediate output files:

-rw-rw-r-- 1 morispi morispi 11193 sept. 23 11:40 HCC1143_chr1_209344657_chr13_108809430.bam.arguments
-rw-rw-r-- 1 morispi morispi 57637995 sept. 23 11:40 HCC1143_chr1_209344657_chr13_108809430.bam.barcode_cov.bed
-rw-rw-r-- 1 morispi morispi 601 sept. 23 11:39 HCC1143_chr1_209344657_chr13_108809430.bam.barcode_statistics
-rw-rw-r-- 1 morispi morispi 81088467 sept. 23 11:39 HCC1143_chr1_209344657_chr13_108809430.bam.bcd21.gz
-rw-rw-r-- 1 morispi morispi 52901807 sept. 23 11:40 HCC1143_chr1_209344657_chr13_108809430.bam.bcd22
-rw-rw-r-- 1 morispi morispi 49633556 sept. 23 11:40 HCC1143_chr1_209344657_chr13_108809430.bam.bcd22.tmp
-rw-rw-r-- 1 morispi morispi 729 sept. 23 11:40 HCC1143_chr1_209344657_chr13_108809430.bam.fragment_statistics
-rw-rw-r-- 1 morispi morispi 0 sept. 23 11:40 HCC1143_chr1_209344657_chr13_108809430.bam.high_cov.bed
-rw-rw-r-- 1 morispi morispi 3510376 sept. 23 11:39 HCC1143_chr1_209344657_chr13_108809430.bam.low_mapq.bcd21.gz
-rw-rw-r-- 1 morispi morispi 120523 sept. 23 11:40 HCC1143_chr1_209344657_chr13_108809430.bam.node33
-rw-rw-r-- 1 morispi morispi 2304810 sept. 23 11:40 HCC1143_chr1_209344657_chr13_108809430.bam.weird_reads.txt

Did you manage to find what was causing the problem?

Best, Pierre

fangli80 commented 4 years ago

Sorry for the late reply. Do you have the error information (e.g. stdout and stderr )?

morispi commented 4 years ago

Hi,

Thanks for your reply!

LinkedSV was run with the following command:

python example.bam -d ResultsExample -r example_hg19.fasta -t 8 -v hg19 > STDOUT 2> STDERR

stdout was empty. The stderr output is attached here: stderr.log

Best, Pierre

fangli80 commented 4 years ago

@morispi Please clone the latest version. I'm not sure whether the problem is solved, but running the latest version will help me identify it. It might be due to an out-of-memory error. How much memory does your machine have?

By the way, from the stderr information, I guess you were probably running on a bam file only including a very small region of interest. LinkedSV needs to do some parameter estimation based on the global distribution of reads and barcodes, so please extract a larger region (> 10 Mb).

Best, Li
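
The "> 10 Mb" guidance in the comment above can be checked before a test BAM is cut. The sketch below is a standalone Python 3 helper, not part of LinkedSV: the function names parse_region and build_extract_command are hypothetical, and it assumes samtools is installed and the input BAM is coordinate-sorted and indexed. It only builds the samtools view command and refuses regions that are not larger than 10 Mb.

```python
import re

MIN_REGION_BP = 10_000_000  # the "> 10 Mb" guidance from the comment above


def parse_region(region):
    """Parse 'chrom:start-end' (1-based, inclusive) into (chrom, start, end)."""
    match = re.fullmatch(r"([^:]+):([\d,]+)-([\d,]+)", region)
    if match is None:
        raise ValueError("expected 'chrom:start-end', got %r" % region)
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if end < start:
        raise ValueError("region end is before start: %r" % region)
    return chrom, start, end


def build_extract_command(bam, region, out_bam, min_bp=MIN_REGION_BP):
    """Return a samtools argument list, or raise if the region is too small."""
    chrom, start, end = parse_region(region)
    length = end - start + 1
    if length <= min_bp:
        raise ValueError(
            "region spans %d bp; need more than %d bp" % (length, min_bp)
        )
    # samtools view -b writes BAM; -o names the output file.
    return ["samtools", "view", "-b", "-o", out_bam,
            bam, "%s:%d-%d" % (chrom, start, end)]


if __name__ == "__main__":
    # File names and coordinates here are illustrative only.
    cmd = build_extract_command(
        "example.bam", "chr1:200,000,000-220,000,000", "chr1_20Mb.bam"
    )
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True). Contig names that themselves contain a colon are not handled by this simple parser.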

morispi commented 4 years ago

Hi,

I just cloned the latest version, and the error message related to node33.candidates has indeed disappeared. Please find attached the new stderr.log. I don't believe it's a memory issue, since LinkedSV consumes at most 900 MB and my machine has 32 GB.

I am indeed running on a bam file including a tiny region of interest. I'm actually installing multiple tools on a cluster, and running them on a small example dataset to make sure they run properly. I will attempt to run LinkedSV on a larger example, if you believe this might be the issue.

Thank you very much for the quick answers and for the quick fix!

Best, Pierre
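
A peak-memory figure like the 900 MB quoted above can be measured rather than estimated. This is a minimal sketch for Unix-like systems using Python 3's standard resource module. The function name run_and_report_peak_rss is hypothetical, and the value it returns is the largest resident set size among child processes this Python process has waited for, which is only an approximation if the command spawns several processes of its own.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run cmd and return (returncode, peak_rss_mb) for waited-for children."""
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return completed.returncode, usage.ru_maxrss / divisor


if __name__ == "__main__":
    # Trivial child process for demonstration; substitute the real
    # command line as a list of arguments.
    returncode, peak_mb = run_and_report_peak_rss([sys.executable, "-c", "pass"])
    print("exit status %d, peak RSS about %.1f MB" % (returncode, peak_mb))
```

If the reported peak stays far below the machine's physical memory, an out-of-memory kill becomes an unlikely explanation, which is consistent with the reasoning in the comment above.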

fangli80 commented 4 years ago

Hello @morispi, would you mind sharing two files with me?

/home/morispi/StructuralVariants/LinkedSV/ResultsExample/HCC1143_chr1_209344657_chr13_108809430.bam.node33
/home/morispi/StructuralVariants/grocsvs/grocsvs_example/short_hg19.fa.fai

If you don't want to post them on this public site, you can email them to me at fangli2718@gmail.com

Best, Li

milesandersonmn commented 2 years ago

Was there a solution found for this? I'm receiving the same error message seen in the last stderr.log except with a return value of 139.
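
On the return value of 139: under the common shell convention, a status above 128 means the process was terminated by signal number (status - 128), so 139 corresponds to signal 11, which is SIGSEGV (segmentation fault) on Linux. The thread does not say which component crashed. The sketch below, with the hypothetical helper name describe_exit_status, decodes such a status using Python 3's standard signal module, and also handles the negative values that subprocess reports.

```python
import signal


def describe_exit_status(status):
    """Describe an exit status as reported by a shell or by subprocess."""
    if status < 0:
        signum = -status        # subprocess convention: -N means signal N
    elif status > 128:
        signum = status - 128   # common shell convention: 128 + N
    else:
        return "exited with code %d" % status
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Not a known signal number on this platform.
        return "exited with code %d (no matching signal %d)" % (status, signum)
    return "terminated by %s (signal %d)" % (name, signum)


if __name__ == "__main__":
    print(describe_exit_status(139))
```

A segmentation fault points to a crash in native code rather than a Python exception, so a core dump or a debugger backtrace is usually more informative than the Python-level stderr output in that case.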