asylvz / SVarp

Phased structural variant discovery in pangenomes
MIT License

Assembly of `svtigs` output nothing #5

Closed by wjwei-handsome 1 month ago

wjwei-handsome commented 2 months ago

Hi @asylvz! Thank you for creating such a valuable tool!

However, I ran into some problems while using it. These are the steps I followed:

  1. Generated a graph with Minigraph-Cactus to integrate hg38, chm13, and a new assembly. We used the intermediate graph, which is the output of minigraph.
  2. Aligned PacBio HiFi long reads with `minigraph -cx lr --vc ref.gfa reads.fa.gz > align.gaf`
  3. Ran svarp: `svarp -a align.gaf -g ref.gfa -f COLO829.T1.fa.gz -o svarp_out`

But svarp did not finish successfully. Here is the log file:

Using wtdbg2 for assembly...
No phase information provided (--phase). SVs will not be phased...

...hallo, merhaba, ola, salaam, hello!!! SVarp is running...

Parameters:
    Minimum read support: 5
    Minimum distance threshold: 100

    Minimum map ratio: 0.9
    Precise clipping (Graphaligner): 0.97
    Alignment score (Graphaligner): 5000

Input files:
    COLO829T1_grch38.t2t.COLO829N1.sv.gaf
    grch38.t2t.COLO829N1.sv.gfa
    COLO829.T1.fa.gz

Log folder:
    svarp_out/

Reading the GFA file
Reading the GAF file
--->execution time: 224.996sec.
--->5676320 primary mappings and 32517 insertion, 38672 deletion loci in the cigar
--->there are 487014 SV signal (85039 inter alignment and 401975 intra alignment)
--->there are 0 unmapped alignments

Merging nearby SV signals
--->51498 read clusters (putative svtigs) after merging and
--->5370 read clusters after filtering based on minimum read support

Assembly...
--->assembling reads using wtdbg2
--->1235 filtered (674 high, 561 low coverage read clusters and 0 low read support)
--->0 clusters cannot be assembled
--->there are 0 svtigs before final filtering
--->assembly execution time: 1563.82 sec.

Filtering svtigs
--->remapping svtigs onto the graph using Graphaligner
Error: Graphaligner did not run successfully...
--->Command: GraphAligner -g grch38.t2t.COLO829N1.sv.gfa -f svarp_out/sample_svtigs_tmp.fa -a svarp_out/sample_remap.gaf -t 32 -x vg --precise-clipping 0.970000 --min-alignment-score 5000 --multimap-score-fraction 0.9 > /dev/null 2>&1

I noticed that the assembly of svtigs seems to produce no output:

❯ tree -s
.
├── [      90284]  sample.log
├── [          0]  sample_remap.gaf
└── [          0]  sample_svtigs_tmp.fa

0 directories, 3 files

So I would like to know whether I made a mistake in any of these steps. If so, please let me know what I should change :)

Best wishes, Wei
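A quick way to spot the symptom shown in the listing above (zero-byte `sample_svtigs_tmp.fa` and `sample_remap.gaf`) is to list the empty files in the output directory. This is only a minimal sketch: the helper name `empty_outputs` is made up for illustration, and `svarp_out` is simply the `-o` value from the command in this post.

```shell
#!/bin/sh
# Hypothetical helper: print every zero-byte regular file under a directory.
# "-size 0" matches files that occupy zero blocks, i.e. empty files.
empty_outputs() {
    find "$1" -type f -size 0
}

# "svarp_out" is the -o value used in this thread; change it if yours differs.
out_dir=svarp_out
if [ -d "$out_dir" ]; then
    empty_outputs "$out_dir"
fi
```

An empty svtigs FASTA at this stage suggests the assembly step produced nothing, so the later remapping step has no input to work with.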

asylvz commented 2 months ago

Hi, is wtdbg2 in your PATH? I mean, can you run wtdbg2.pl? It seems that the assembly step is not working.

wjwei-handsome commented 2 months ago

I think I may have found the problem:

wtdbg2.pl is actually in my PATH, but line 29 of wtdbg2.pl is:

$opts{mm2} = gwhich("minimap2") || die;

I will try again after installing minimap2. Thank you for your quick reply :)
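Since the failure here came from a tool missing from PATH, a small pre-flight check before launching svarp may save a long run. The sketch below is an assumption-laden example, not part of SVarp: `missing_tools` is a hypothetical helper, and the tool list contains only the programs named in this thread (there may be other dependencies).

```shell
#!/bin/sh
# Hypothetical helper: print each named tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools mentioned in this thread; this list is an assumption and may be incomplete.
missing=$(missing_tools svarp wtdbg2.pl minimap2 GraphAligner)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" >&2
    printf '%s\n' "$missing" >&2
else
    echo "All listed tools were found on PATH."
fi
```

On a cluster, it is worth running this inside the job script itself, because the PATH of a batch job can differ from that of an interactive shell.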

wjwei-handsome commented 2 months ago

Sorry, I encountered another problem. The log shows the following:

Assembly...
--->assembling reads using wtdbg2
--->1235 filtered (674 high, 561 low coverage read clusters and 0 low read support)
--->360 clusters cannot be assembled
--->there are 3847 svtigs before final filtering
--->assembly execution time: 10244.9 sec.

Filtering svtigs
--->remapping svtigs onto the graph using Graphaligner
--->reading remappings from svarp_out/sample_remap.gaf
/var/spool/slurmd/job4869760/slurm_script: line 4: 3068679 Illegal instruction     (core dumped) svarp -a COLO829T1_grch38.t2t.COLO829N1.sv.gaf -g grch38.t2t.COLO829N1.sv.gfa -f COLO829.T1.fa.gz -o svarp_out

The output directory looks like this:

❯ ll -t
total 82M
-rw-r--r-- 1 wjwei9908_gmail yangjian 133K Jul 21 22:16 sample_svtigs_tmp.fa.fai
-rw-r--r-- 1 wjwei9908_gmail yangjian 1.4M Jul 21 22:16 sample_remap.gaf
-rw-r--r-- 1 wjwei9908_gmail yangjian  81M Jul 21 22:06 sample_svtigs_tmp.fa
-rw-r--r-- 1 wjwei9908_gmail yangjian  89K Jul 21 19:15 sample.log

The file sample_remap.gaf does not seem to be truncated:

 ❯ tail -n2 sample_remap.gaf
s9596_18171274  44139   0   44139   +   >s9596  18196943    18143748    18187887    44122   44141   60  NM:i:19 AS:f:43505.7    dv:f:0.000430439    id:f:0.99957    cg:Z:28233=1X3794=1I811=1X543=1X345=1X1607=1X689=1X599=1X108=1X520=1X318=1D2043=1X2136=1D233=1X159=1X268=1X160=1X265=1I47=1X1244=
s8177_49375 41542   18739   24229   +   <s23881 2169418 2154756 2160246 5489    5490    60  NM:i:1  AS:f:5456.67    dv:f:0.000182149    id:f:0.999818   cg:Z:5489=1X

This confuses me. Looking forward to your reply :)

Best wishes, Wei
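Looking at the last two lines with `tail` covers only the end of the file. A slightly broader check is to count records that have fewer than the 12 mandatory tab-separated GAF columns. The helper name `gaf_short_lines` is hypothetical, and this is only a rough sanity check: it would not notice a record that was cut off inside its optional tags.

```shell
#!/bin/sh
# Hypothetical helper: count lines with fewer than the 12 mandatory
# tab-separated GAF columns (query name through mapping quality).
gaf_short_lines() {
    awk -F '\t' 'NF < 12 { bad++ } END { print bad + 0 }' "$1"
}

# Path taken from the log in this thread; adjust to your output directory.
gaf=svarp_out/sample_remap.gaf
if [ -f "$gaf" ]; then
    echo "records with fewer than 12 columns: $(gaf_short_lines "$gaf")"
fi
```

A count of zero is consistent with an intact file, which would point away from the GAF itself and toward the step that reads it.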

asylvz commented 2 months ago

You need WFA2-lib. I guess that's the problem:

    wget https://github.com/smarco/WFA2-lib/archive/refs/tags/v2.3.4.tar.gz
    mkdir wfa && tar -xzf v2.3.4.tar.gz -C wfa --strip-components=1
    cd wfa && make clean all

(The wfa folder needs to reside inside SVarp's main folder.)

wjwei-handsome commented 2 months ago

Ah-oh...

I only kept the executable file after compilation. Will that cause a problem?

asylvz commented 2 months ago

I use it as a library in SVarp, so please keep a wfa folder inside the SVarp folder, with all the files of WFA2-lib inside it.
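The layout described above can be checked with a short script. This is a sketch under stated assumptions: `check_wfa_dir` is a hypothetical helper, and the path `wfa/lib/libwfa.a` assumes that the WFA2-lib build places its static library under `lib/`; adjust it if your version builds elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: verify that an SVarp source folder contains a built wfa folder.
# Returns 0 if found, 1 if the wfa folder is missing, 2 if the library file is absent.
check_wfa_dir() {
    svarp_dir=$1
    if [ ! -d "$svarp_dir/wfa" ]; then
        echo "missing folder: $svarp_dir/wfa" >&2
        return 1
    fi
    # Assumption: the WFA2-lib build writes the static library to lib/libwfa.a.
    if [ ! -f "$svarp_dir/wfa/lib/libwfa.a" ]; then
        echo "wfa folder found, but lib/libwfa.a is absent (not built, or files removed?)" >&2
        return 2
    fi
    echo "ok: $svarp_dir/wfa"
}

# Example call (the path is a placeholder for your SVarp checkout):
#   check_wfa_dir /path/to/SVarp
```

If the check fails, re-running the download and build steps inside SVarp's main folder and then rebuilding SVarp is the route suggested in this thread.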