BGI-Qingdao / TGS-GapCloser

A gap-closing software tool that uses long reads to enhance genome assembly.
GNU General Public License v3.0

Assertion `items.size() == 3' #66

Closed: Johnsonzcode closed this issue 1 year ago

Johnsonzcode commented 1 year ago

Dear @cchd0001

TGS-GapCloser is a powerful gap-filling tool, and I have used it successfully many times, but not this time. I am using HiFi reads to fill the gaps with this command:

~/software/TGS-GapCloser/TGS-GapCloser.sh  \
        --scaff  $1 \
        --reads  $2 \
        --output $4 \
        --ne \
        --tgstype pb \
        --thread $t \
        --minmap_arg '-x asm20'

The minimap2 version is 2.17-r941.

prefix.fill.log:

/storage-01/software/TGS-GapCloser/bin/TGSGapCloser --prefix hifi_reads_fill_jiaji --contig2ont_paf hifi_reads_fill_jiaji.fill.paf --ont_reads_a fastp.out.jiaji_2cell.fastq.gz.ec.fa --min_match 200 --min_idy 0.200000
TGSGapCloser    INFO    CST     2023/7/5        8:54:0  :       TGSGapCloser start now ...
TGSGapCloser    INFO    CST     2023/7/5        8:54:9  :       LoadONTReads start now ...
TGSGapCloser    INFO    CST     2023/7/5        8:59:43 :       LoadONTReads finish. used wall clock : 334 seconds, cpu time : 258.390015 seconds
TGSGapCloser    INFO    CST     2023/7/5        8:59:43 :       LoadPAF start now ...
TGSGapCloser: ../biocommon/align_common/align_result.cpp:134: void BGIQD::ALIGN_COMMON::ExtraInfo::InitFromStr(const string&): Assertion `items.size() == 3' failed.

Excerpt from the PAF file:

1       4698546 1336080 1373945 -       m64064_220905_081023/2753440/ccs        37876   4       37869   37853   37865   60      tp:A:P  cm:i:6864       s1:i:37853      s2:i:30597      dv:f:0  rl:i:151589
1       4698546 2077769 2114468 +       m64064_220905_081023/15926004/ccs       36709   6       36705   36439   36699   49      tp:A:P  cm:i:6554       s1:i:36439      s2:i:32011      dv:f:0  rl:i:151589
1       4698546 4527667 4564642 +       m64064_220905_081023/4719817/ccs        36980   0       36975   36358   36975   60      tp:A:P  cm:i:6484       s1:i:36358      s2:i:29078      dv:f:0  rl:i:151589

I suspect a PAF format problem (for example, in the last four columns or in the total number of columns).
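One way to test the column-count hypothesis on a large file is to count how many tab-separated columns each line has. PAF defines 12 mandatory columns, and any optional TAG:TYPE:VALUE fields follow them. The function name paf_column_histogram is hypothetical, and the sketch assumes the PAF is plain (uncompressed) tab-separated text. awk streams the file line by line, so memory use stays small even for a very large PAF.

```shell
#!/bin/sh
# Hypothetical helper: print "number_of_columns<TAB>number_of_lines" for a PAF.
# Reads the files given as arguments, or stdin if none are given.
paf_column_histogram() {
  awk -F'\t' '{ count[NF]++ } END { for (n in count) print n "\t" count[n] }' "$@" |
    sort -n   # "for (n in count)" has no defined order, so sort numerically
}
```

For example, paf_column_histogram prefix.fill.paf should report only counts of 12 or more if every record has all mandatory columns.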

Best Johnsonz

cchd0001 commented 1 year ago

Hi Johnsonz,

ExtraInfo parses the optional fields, such as tp:A:P cm:i:6864 s1:i:37853 s2:i:30597, from the 13th column to the end of the line.
It splits each field on : and checks that it has exactly three parts.
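A rough shell approximation of the check described above (not TGS-GapCloser's actual C++ code) is sketched below; the function name paf_bad_tags is hypothetical. It flags any field from column 13 onward that does not split on : into exactly three parts. As an illustration of a value that would fail, a minimap2 cs:Z: tag in its short form uses : inside the value (for example cs:Z::100), which gives four parts.

```shell
#!/bin/sh
# Hypothetical helper mirroring the "items.size() == 3" rule:
# for every field from column 13 to the end of the line, split on ':'
# and print "line_number<TAB>field" when the number of parts is not 3.
paf_bad_tags() {
  awk -F'\t' '{
    for (i = 13; i <= NF; i++)
      if (split($i, parts, ":") != 3)
        print NR "\t" $i
  }' "$@"
}
```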

I checked your example PAF, and it contains no invalid fields. Could you please search your whole PAF file for the offending lines?

Best wishes Lidong Guo

Johnsonzcode commented 1 year ago

How can I check? My PAF file is huge (156 GB).

cchd0001 commented 1 year ago

How about this (prints each offending line once, prefixed with its line number): awk '{ for (i = 13; i <= NF; i++) if (split($i, a, ":") != 3) { print NR ": " $0; next } }' xxx.paf

PS: a 156 GB PAF may need a huge amount of memory. Good luck!

Johnsonzcode commented 1 year ago

What about filtering the PAF first with this? awk '$4-$3>1000{print $0}' paf > newpaf
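The proposed filter keeps records whose aligned query span (column 4, query end, minus column 3, query start) exceeds a threshold. A parameterised version is sketched below; the function name filter_paf_by_query_span is hypothetical. Note that this only shrinks the file: a line with a malformed tag field passes through unchanged if its span is long enough.

```shell
#!/bin/sh
# Hypothetical helper: keep PAF records whose aligned query span ($4 - $3)
# is strictly greater than MIN_SPAN. Reads FILE... or stdin.
# Usage: filter_paf_by_query_span MIN_SPAN [FILE...] > filtered.paf
filter_paf_by_query_span() {
  min_span=${1:?usage: filter_paf_by_query_span MIN_SPAN [FILE...]}
  shift
  awk -F'\t' -v minspan="$min_span" '($4 - $3) > minspan' "$@"
}
```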

Johnsonzcode commented 1 year ago

Solved: I switched to the latest minimap2 and used -x map-hifi.
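For reference, the fix amounts to re-running the original command with --minmap_arg '-x map-hifi' instead of '-x asm20', using a minimap2 build new enough to provide that preset (the thread implies 2.17 was not). The wrapper below is a hypothetical sketch: the function name, its argument order and the DRY_RUN switch are illustrative, and the install path is copied from the original command.

```shell
#!/bin/sh
# Hypothetical wrapper around the command from the original report,
# with the minimap2 preset changed to map-hifi.
# Usage: run_gapcloser_hifi SCAFFOLDS READS OUTPUT_PREFIX [THREADS]
# Set DRY_RUN=1 to print the command instead of running it.
run_gapcloser_hifi() {
  scaff=$1 reads=$2 out=$3 threads=${4:-8}
  set -- "$HOME/software/TGS-GapCloser/TGS-GapCloser.sh" \
    --scaff "$scaff" --reads "$reads" --output "$out" \
    --ne --tgstype pb --thread "$threads" \
    --minmap_arg '-x map-hifi'
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"   # show the assembled command line
  else
    "$@"
  fi
}
```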