WGLab / LinkedSV

MIT License

Traceback error in remove_redundantsv #8

Closed. FrickTobias closed this issue 5 years ago.

FrickTobias commented 5 years ago

Problem

I am getting "ValueError: data must be 2 dimensions" when running LinkedSV (see below for the full error message and context).

Possible solution

I am running this on data that does not come from a Longranger output but from a custom pipeline, so it does not have all the SAM tags one might expect in a Longranger output. Searching the GitHub repository, I found multiple mentions of what looks like an HP tag. Is this something LinkedSV requires?

If so, are there any other tags I should be aware of that LinkedSV requires?

10x haplotyping tags

Taken from the 10x Genomics homepage.

Tag   Type   Description
PC    i      Phred-scaled confidence that this read was phased correctly.
PS    i      Phase set containing this read. This corresponds to the phase set (PS) field in the VCF file. The value is the position of the first SNP in the phase block.
HP    i      Haplotype of the molecule that generated the read.
MI    i      Global molecule identifier for the molecule that generated this read.
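To see which of these tags a BAM file from a custom pipeline actually carries, one option is to scan a sample of records. The sketch below is not part of LinkedSV; it assumes plain SAM text (for example the output of `samtools view`), and the helper names `tags_of` and `missing_tags` are hypothetical. It only checks BX and HP, the two tags discussed in this thread.

```python
# Hypothetical helper, not LinkedSV code: count records lacking required tags.
# Assumes SAM text records, e.g. piped from `samtools view input.bam`.

REQUIRED_TAGS = ("BX", "HP")  # the barcode and haplotype tags discussed in this thread


def tags_of(sam_line):
    """Return the set of optional-field tag names (TAG:TYPE:VALUE) on one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are the mandatory SAM fields; optional tags follow.
    return {f.split(":", 2)[0] for f in fields[11:] if f.count(":") >= 2}


def missing_tags(sam_lines, required=REQUIRED_TAGS):
    """For each required tag, count the records that lack it; header lines are skipped."""
    counts = {tag: 0 for tag in required}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and @HD/@SQ/... header lines
        present = tags_of(line)
        for tag in required:
            if tag not in present:
                counts[tag] += 1
    return counts
```

For example, feeding it `sys.stdin` from `samtools view input.bam | head -n 100000` gives a quick per-tag count of records missing BX or HP on the first 100,000 alignments.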

Explicit error

From line 343283 of the stderr output:

[10/07/2019 22:38:50 (80.835 MB)] clustering discordant reads
[10/07/2019 22:38:50 (80.835 MB)] 19x-BLR-bulk-O1_2-filt.bam.bcd21.gz 19x-BLR-bulk-O1_2-filt.bam.hap_depth.txt bowtie2/genome.fa.fai 100 20
[10/07/2019 22:39:39 (80.835 MB)] LinkedSV/scripts/../bin/call_small_deletions LinkedSV/19x-BLR-bulk-O1_2-filt.bam.hap_depth.txt LinkedSV/19x-BLR-bulk-O1_2-filt.bam.weird_reads.clusters.txt LinkedSV/19x-BLR-bulk-O1_2-filt.bam.bcd22 bowtie2/genome.fa.fai LinkedSV/discordant_read_pairs.del.bedpe
Traceback (most recent call last):
  File "LinkedSV/linkedsv.py", line 316, in <module>
    main()
  File "LinkedSV/linkedsv.py", line 47, in main
    detect_increased_fragment_ends(args, dbo_args, endpoint_args)
  File "LinkedSV/linkedsv.py", line 194, in detect_increased_fragment_ends
    detect_small_deletions.detect_small_deletions(args.input_bam, args.out_dir, args.small_del_call_file, args.n_thread, args.ref_fa, args.fermikit_dir, args.samtools, args.bedtools, args.weird_reads_file, args.weird_reads_cluster_file, args.call_small_deletions_binary, args.cal_hap_read_depth_from_bcd21, endpoint_args.bcd21_file, endpoint_args.bcd22_file, args.hap_type_read_depth_file, args.gap_region_bed_file, rm_temp_files) 
  File "LinkedSV/scripts/detect_small_deletions.py", line 95, in detect_small_deletions
    merge_sv_calls(local_assembly_out_file, short_reads_del_call_file, out_del_call_file, tid2chrname_list, chrname2tid_dict)
  File "LinkedSV/scripts/detect_small_deletions.py", line 107, in merge_sv_calls
    local_assembly_del_call_list = svtk.remove_redundantsv(local_assembly_del_call_list)
  File "LinkedSV/scripts/svtk.py", line 136, in remove_redundantsv
    tree = cKDTree(coord_list, leafsize = 10000)
  File "ckdtree.pyx", line 525, in scipy.spatial.ckdtree.cKDTree.__init__
ValueError: data must be 2 dimensions
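The thread does not pin down the cause, but one plausible reading of the last frame (an assumption, not confirmed by the maintainer) is that `coord_list` was empty, for example because no deletion calls came out of local assembly. SciPy's `cKDTree` converts its input to a float array and raises this exact error whenever that array is not 2-D, and an empty Python list becomes a 1-D array of shape `(0,)`. The sketch below, which is not LinkedSV's code and uses the hypothetical name `build_call_tree`, shows the failure and one way to guard against it; it requires NumPy and SciPy.

```python
import numpy as np
from scipy.spatial import cKDTree


def build_call_tree(coord_list, leafsize=10000):
    """Hypothetical guard: build a cKDTree over SV-call coordinates,
    or return None when there are no calls at all."""
    data = np.asarray(coord_list, dtype=np.float64)
    if data.size == 0:
        # An empty list becomes shape (0,), which cKDTree rejects as "not 2 dimensions".
        return None
    if data.ndim != 2:
        raise ValueError(f"expected an (n, k) coordinate array, got shape {data.shape}")
    return cKDTree(data, leafsize=leafsize)
```

A caller would then treat `None` as "nothing to deduplicate" and return the (empty) call list unchanged instead of querying the tree.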
FrickTobias commented 5 years ago

Command

python linkedsv.py -r genome.fa -d LinkedSV/ -i 19x-BLR-bulk-O1_2-filt.bam -t 20
fangli80 commented 5 years ago

Hi, LinkedSV uses the HP tag to read haplotype information and the BX tag to read the barcode of each read/alignment. What kind of data do you have, and is haplotype information included in the BAM file? If so, I can change the code so that users can specify the tag names for barcode and haplotype information.

Best, Li

FrickTobias commented 5 years ago

I can quite easily change a tag, so just having the required tags clearly listed is enough for me. Thanks for the offer, though. Maybe the required input format could be documented somewhere, e.g. "sorted BAM file with BX and HP tags, as output by Longranger", in case someone else also wants to extend the intended use.
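Renaming a tag can be done on SAM text in a pipe such as `samtools view -h in.bam | python rename_tag.py | samtools view -b -o out.bam -`. The sketch below is a hypothetical helper, not part of LinkedSV; the source tag name `XB` is an assumption standing in for whatever a custom pipeline emits. It keeps the original type code, so a haplotype tag should already be an integer (`i`) before it is renamed to HP.

```python
def rename_tag(sam_line, old="XB", new="BX"):
    """Hypothetical helper: rename optional field `old` to `new` on one SAM record.
    Header lines pass through unchanged, and records that already carry `new`
    are left untouched to avoid duplicate tags."""
    if sam_line.startswith("@"):
        return sam_line
    end = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    present = {f.split(":", 1)[0] for f in fields[11:]}
    if new in present:
        return sam_line
    # Optional tags start after the 11 mandatory SAM columns.
    for i in range(11, len(fields)):
        if fields[i].startswith(old + ":"):
            fields[i] = new + fields[i][len(old):]  # keep ":TYPE:VALUE" as-is
    return "\t".join(fields) + end
```

A driver script would loop `for line in sys.stdin: sys.stdout.write(rename_tag(line))`, calling `rename_tag` a second time with a different `old`/`new` pair if the haplotype tag also needs renaming.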

fangli80 commented 5 years ago

OK. By the way, please make sure you use the lariat aligner (https://github.com/10XGenomics/lariat) to generate the BAM file. Lariat takes barcodes into account and produces better mappings in regions where traditional short-read aligners perform poorly. If you use bwa-mem or other aligners, there may be mis-mapping issues, which cause false-positive SV calls.

FrickTobias commented 5 years ago

Thank you for the advice.