anialisiecka / ALIBI

MIT License

Applying ALIBI produces the exact same graph #1

Closed: subwaystation closed 2 years ago

subwaystation commented 2 years ago

Hi there @anialisiecka :)

I am applying ALIBI to a DRB1-3123 pangenome graph that was built with PGGB. I am using the seqwish output of PGGB because it represents the raw, unlinearized graph. It looks like this: [image: DRB1-3123.fa.15a1009.2ff309f.seqwish.gfa]

odgi viz -i DRB1-3123.fa.15a1009.2ff309f.seqwish.gfa -o DRB1-3123.fa.15a1009.2ff309f.seqwish.gfa.png

Then I apply ALIBI:

bash alibi.sh -i ~/Downloads/TEST_ALIBI/DRB1-3123.fa.15a1009.2ff309f.seqwish.gfa

This yields the exact same graph: [image: DRB1-3123.fa.15a1009.2ff309f.seqwish_sorted.gfa]

odgi viz -i DRB1-3123.fa.15a1009.2ff309f.seqwish_sorted.gfa -o DRB1-3123.fa.15a1009.2ff309f.seqwish_sorted.gfa.png

Am I doing something wrong? Here is the graph: DRB1-3123.fa.15a1009.2ff309f.seqwish.gfa.zip

How to read the visualization is explained in https://odgi.readthedocs.io/en/latest/rst/tutorials/exploratory_analysis.html#visualize-the-drb1-3123-graph.

Thanks for any feedback!

AndreaGuarracino commented 2 years ago

It seems ALIBI only changes the node order in the GFA, without updating the graph itself:

Before:

H   VN:Z:1.0
S   1   AT
S   2   TTAACTCCATC
S   3   TTTGAGAAACATTTAATAATGTAATGTGTTTGT
S   4   CATACAGGGTGAATACAGATGCACGGGAGGCCATAC
S   5   GGTTTAGGCAAAGGGGAGCACAAAAGTTGAAGATGAGGC
S   6   GCTGCC
S   7   AT
S   8   CAATGCTGGGACTTCAGGCCAA
S   9   GGG
S   10  CAGGAGCTGAGGAAGCCACAAGGGAGGACATTTTCTGCAGTTGC
...
P   gi|568815592:32578768-32589835  1+,2+,3+,4+,5+,6+,7+,8+,9+,10+,11+,12+,13+,14+,15+,16+,17+,18+,19+,20+,21+,22+,23+,24+,25+,26+,27+,28+,29+,30+,31+,32+,33+,34+,35+,36+,37+,38+,39+,40+,41+,42+,43+,44+,45+,46+,47+,48+,49+,50+,51+,52+,53+,54+,55+,56+,57+,58+,59+,60+,61+,62+,63+,64+,65+,66+,67+,68+,69+,70+,71+,72+,73+,74+,75+,76+,77+,78+,79+,80+,81+,82+,83+,84+,85+,86+,87+,88+,89+,90+,91+,92+,93+,94+,95+,96+,97+,98+,99+,100+,101+,102+,103+,104+,105+,106+,107+,108+,109+,110+,111+,112+,113+,114+,115+,116+,117+,118+,119+,120+,121+,122+,123+,124+,125+,126+,127+,128+,129+,130+,131+,132+,133+,134+,135+,136+,137+,138+,139+,140+,141+,142+,143+,144+,145+,146+,147+,148+,149+,150+,151+,152+,153+,154+,155+,156+,157+,158+,159+,160+,161+,162+,163+,164+,165+,166+,167+,168+,169+,170+,171+,172+,173+,174+,175+,176+,177+,178+,179+,180+,181+,182+,183+,184+,185+,186+,187+,188+,189+,190+,191+,192+,193+,194+,195+,196+,197+,198+,199+,200+,201+,202+,203+,204+,205+,206+,207+,208+,209+,210+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,213+,214+,215+,216+,217+,218+,219+,220+,221+,222+,223+,224+,225+,226+,227+,228+,229+,230+,231+,232+,233+,234+,235+,236+,237+,238+,239+,240+,241+,242+,243+,244+,245+,246+,247+,248+,249+,250+,251+,252+,253+,254+,255+,256+,257+,258+,259+,260+,261+,262+,263+,264+,265+,266+,267+,268+,269+,270+,271+,272+,273+,274+,275+,276+,277+,278+,279+,280+,281+,282+,283+,284+,285+,286+,287+,288+,289+,290+,291+,292+,293+,294+,295+,296+,297+,298+,299+,300+,301+,302+,303+,304+,305+,306+,307+,308+,309+,310+,311+,312+,313+,314+,315+,316+,317+,318+,319+,320+,321+,322+,323+,324+,325+,326+,327+,328+,329+,330+,331+,332+,333+,334+,335+,336+,337+,338+,339+,340+,341+,342+,343+,344+,345+,346+,347+,348+,349+,350+,351+,352+,353+,354+,355+,356+,357+,358+,359+,360+,361+,362+,363+,364+,365+,366+,367+,368+,369+,370+,371+,372+,373+,374+,375+,37
6+,377+,378+,379+,380+,381+,382+,383+,384+,385+,386+,387+,388+,389+,390+,391+,392+,393+,394+,395+,396+,397+,398+,399+,400+,401+,402+,403+,404+,405+,406+,407+,408+,409+,410+,411+,412+,413+,414+,415+,416+,417+,418+,419+,420+,421+,422+,423+,424+,425+,426+,427+,428+,429+,430+,431+,432+,433+,434+,435+,436+,437+,438+,439+,440+,441+,442+,443+,444+,445+,446+,447+,448+,449+,450+,451+,452+,453+,454+,455+,456+,457+,458+,459+,460+,461+,462+,463+,464+,465+,466+,467+,468+,469+,470+,471+,472+,473+,474+,475+,476+,477+,478+,479+,480+,481+,482+,483+,484+,485+,486+,487+,488+,489+,490+    *
P   gi|568815529:3998044-4011446    1+,2+,3+,491+,5+,492+,10+,493+,12+,494+,14+,495+,496+,16+,497+,498+,499+,18+,19+,20+,21+,500+,23+,501+,26+,502+,28+,29+,503+,31+,32+,504+,35+,36+,37+,38+,505+,506+,41+,507+,44+,45+,508+,47+,509+,49+,50+,51+,52+,53+,54+,510+,56+,57+,58+,59+,60+,61+,511+,512+,64+,65+,66+,67+,513+,69+,70+,71+,72+,514+,515+,516+,517+,74+,75+,518+,78+,79+,519+,520+,521+,82+,83+,84+,522+,523+,524+,525+,86+,87+,88+,89+,90+,91+,526+,93+,527+,95+,528+,529+,530+,531+,532+,97+,98+,99+,533+,101+,102+,103+,534+,107+,535+,536+,537+,538+,539+,112+,113+,114+,540+,541+,542+,543+,116+,117+,118+,119+,544+,121+,545+,125+,546+,127+,547+,133+,134+,135+,136+,548+,549+,138+,139+,140+,141+,550+,551+,552+,143+,553+,554+,145+,146+,147+,148+,555+,150+,556+,152+,557+,155+,156+,157+,558+,159+,160+,559+,162+,560+,164+,561+,562+,563+,166+,167+,564+,565+,566+,171+,172+,567+,568+,174+,569+,176+,570+,178+,179+,180+,571+,572+,184+,185+,186+,187+,573+,574+,189+,190+,191+,192+,575+,576+,194+,195+,577+,197+,198+,199+,578+,579+,202+,203+,580+,581+,205+,582+,583+,584+,585+,586+,587+,588+,207+,589+,590+,210+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,213+,214+,215+,591+,592+,217+,218+,219+,220+,593+,223+,594+,595+,225+,226+,227+,228+,596+,597+,231+,232+,233+,598+,236+,237+,238+,239+,599+,600+,601+,602+,243+,244+,245+,603+,604+,605+,247+,606+,249+,250+,251+,607+,608+,609+,610+,611+,612+,613+,614+,615+,616+,617+,618+,619+,620+,621+,622+,623+,624+,625+,626+,627+,628+,629+,630+,631+,632+,633+,634+,635+,636+,637+,638+,639+,640+,641+,642+,643+,644+,645+,646+,647+,648+,649+,650+,651+,652+,653+,654+,655+,656+,657+,658+,659+,660+,661+,662+,663+,664+,253+,665+,255+,256+,666+,258+,259+,667+,668+,261+,669+,670+,263+,2S   1039    C
...

After:

H   VN:Z:1.0
S   781 ATTTTTAACTCCATG
S   1   AT
S   2   TTAACTCCATC
S   3   TTTGAGAAACATTTAATAATGTAATGTGTTTGT
S   491 GGTACAGGGTGAGTACAGATGCACAGGAGGCCATAG
S   4   CATACAGGGTGAATACAGATGCACGGGAGGCCATAC
S   5   GGTTTAGGCAAAGGGGAGCACAAAAGTTGAAGATGAGGC
S   492 ACTGCCATCAAAGCTGTGGGGCTTCAGGCCAAGAA
S   782 GGCACAG
S   783 G
...
P   gi|568815592:32578768-32589835  1+,2+,3+,4+,5+,6+,7+,8+,9+,10+,11+,12+,13+,14+,15+,16+,17+,18+,19+,20+,21+,22+,23+,24+,25+,26+,27+,28+,29+,30+,31+,32+,33+,34+,35+,36+,37+,38+,39+,40+,41+,42+,43+,44+,45+,46+,47+,48+,49+,50+,51+,52+,53+,54+,55+,56+,57+,58+,59+,60+,61+,62+,63+,64+,65+,66+,67+,68+,69+,70+,71+,72+,73+,74+,75+,76+,77+,78+,79+,80+,81+,82+,83+,84+,85+,86+,87+,88+,89+,90+,91+,92+,93+,94+,95+,96+,97+,98+,99+,100+,101+,102+,103+,104+,105+,106+,107+,108+,109+,110+,111+,112+,113+,114+,115+,116+,117+,118+,119+,120+,121+,122+,123+,124+,125+,126+,127+,128+,129+,130+,131+,132+,133+,134+,135+,136+,137+,138+,139+,140+,141+,142+,143+,144+,145+,146+,147+,148+,149+,150+,151+,152+,153+,154+,155+,156+,157+,158+,159+,160+,161+,162+,163+,164+,165+,166+,167+,168+,169+,170+,171+,172+,173+,174+,175+,176+,177+,178+,179+,180+,181+,182+,183+,184+,185+,186+,187+,188+,189+,190+,191+,192+,193+,194+,195+,196+,197+,198+,199+,200+,201+,202+,203+,204+,205+,206+,207+,208+,209+,210+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,213+,214+,215+,216+,217+,218+,219+,220+,221+,222+,223+,224+,225+,226+,227+,228+,229+,230+,231+,232+,233+,234+,235+,236+,237+,238+,239+,240+,241+,242+,243+,244+,245+,246+,247+,248+,249+,250+,251+,252+,253+,254+,255+,256+,257+,258+,259+,260+,261+,262+,263+,264+,265+,266+,267+,268+,269+,270+,271+,272+,273+,274+,275+,276+,277+,278+,279+,280+,281+,282+,283+,284+,285+,286+,287+,288+,289+,290+,291+,292+,293+,294+,295+,296+,297+,298+,299+,300+,301+,302+,303+,304+,305+,306+,307+,308+,309+,310+,311+,312+,313+,314+,315+,316+,317+,318+,319+,320+,321+,322+,323+,324+,325+,326+,327+,328+,329+,330+,331+,332+,333+,334+,335+,336+,337+,338+,339+,340+,341+,342+,343+,344+,345+,346+,347+,348+,349+,350+,351+,352+,353+,354+,355+,356+,357+,358+,359+,360+,361+,362+,363+,364+,365+,366+,367+,368+,369+,370+,371+,372+,373+,374+,375+,37
6+,377+,378+,379+,380+,381+,382+,383+,384+,385+,386+,387+,388+,389+,390+,391+,392+,393+,394+,395+,396+,397+,398+,399+,400+,401+,402+,403+,404+,405+,406+,407+,408+,409+,410+,411+,412+,413+,414+,415+,416+,417+,418+,419+,420+,421+,422+,423+,424+,425+,426+,427+,428+,429+,430+,431+,432+,433+,434+,435+,436+,437+,438+,439+,440+,441+,442+,443+,444+,445+,446+,447+,448+,449+,450+,451+,452+,453+,454+,455+,456+,457+,458+,459+,460+,461+,462+,463+,464+,465+,466+,467+,468+,469+,470+,471+,472+,473+,474+,475+,476+,477+,478+,479+,480+,481+,482+,483+,484+,485+,486+,487+,488+,489+,490+    *
P   gi|568815529:3998044-4011446    1+,2+,3+,491+,5+,492+,10+,493+,12+,494+,14+,495+,496+,16+,497+,498+,499+,18+,19+,20+,21+,500+,23+,501+,26+,502+,28+,29+,503+,31+,32+,504+,35+,36+,37+,38+,505+,506+,41+,507+,44+,45+,508+,47+,509+,49+,50+,51+,52+,53+,54+,510+,56+,57+,58+,59+,60+,61+,511+,512+,64+,65+,66+,67+,513+,69+,70+,71+,72+,514+,515+,516+,517+,74+,75+,518+,78+,79+,519+,520+,521+,82+,83+,84+,522+,523+,524+,525+,86+,87+,88+,89+,90+,91+,526+,93+,527+,95+,528+,529+,530+,531+,532+,97+,98+,99+,533+,101+,102+,103+,534+,107+,535+,536+,537+,538+,539+,112+,113+,114+,540+,541+,542+,543+,116+,117+,118+,119+,544+,121+,545+,125+,546+,127+,547+,133+,134+,135+,136+,548+,549+,138+,139+,140+,141+,550+,551+,552+,143+,553+,554+,145+,146+,147+,148+,555+,150+,556+,152+,557+,155+,156+,157+,558+,159+,160+,559+,162+,560+,164+,561+,562+,563+,166+,167+,564+,565+,566+,171+,172+,567+,568+,174+,569+,176+,570+,178+,179+,180+,571+,572+,184+,185+,186+,187+,573+,574+,189+,190+,191+,192+,575+,576+,194+,195+,577+,197+,198+,199+,578+,579+,202+,203+,580+,581+,205+,582+,583+,584+,585+,586+,587+,588+,207+,589+,590+,210+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,212+,211+,213+,214+,215+,591+,592+,217+,218+,219+,220+,593+,223+,594+,595+,225+,226+,227+,228+,596+,597+,231+,232+,233+,598+,236+,237+,238+,239+,599+,600+,601+,602+,243+,244+,245+,603+,604+,605+,247+,606+,249+,250+,251+,607+,608+,609+,610+,611+,612+,613+,614+,615+,616+,617+,618+,619+,620+,621+,622+,623+,624+,625+,626+,627+,628+,629+,630+,631+,632+,633+,634+,635+,636+,637+,638+,639+,640+,641+,642+,643+,644+,645+,646+,647+,648+,649+,650+,651+,652+,653+,654+,655+,656+,657+,658+,659+,660+,661+,662+,663+,664+,253+,665+,255+,256+,666+,258+,259+,667+,668+,261+,669+,670+,263+,264+,671+,266+,267+,268+,269+,270+,672+,673+,674+,272+,675+,276+,676+,677+,678+,280+,679+,282+,283+,284+,680+,286+,287+,288+,681+,682+,683+,684+,291+,292+,29
3+,685+,686+,295+,687+,297+,298+,299+,688+,301+,689+,690+,691+,303+,304+,692+,693+,694+,695+,696+,307+,308+,697+,698+,699+,310+,700+,312+,313+,314+,701+,702+,317+,318+,319+,703+,321+,704+,323+,324+,325+,705+,328+,706+,330+,331+,707+,333+,708+,709+,710+,711+,335+,336+,712+,338+,339+,340+,341+,713+,343+,344+,345+,346+,714+,348+,349+,715+,351+,716+,353+,354+,717+,718+,357+,358+,719+,360+,720+,363+,364+,721+,374+,722+,723+,724+,725+,726+,727+,728+,729+,730+,376+,377+,731+,732+,733+,379+,734+,382+,383+,384+,735+,736+,386+,387+,388+,737+,390+,391+,392+,393+,738+,396+,397+,739+,399+,740+,741+,402+,403+,742+,743+,744+,406+,407+,408+,409+,410+,411+,745+,414+,746+,416+,417+,747+,419+,420+,748+,422+,423+,749+,750+,751+,752+,753+,425+,426+,427+,428+,429+,754+,431+,432+,433+,434+,435+,755+,437+,756+,439+,757+,441+,442+,443+,758+,445+,446+,759+,760+,761+,448+,449+,450+,762+,452+,453+,763+,456+,764+,765+,458+,459+,460+,766+,463+,464+,465+,767+,768+,769+,469+,470+,471+,770+,771+,772+,773+,774+,474+,475+,476+,477+,478+,479+,775+,776+,483+,777+,485+,778+,779+,488+,780+,490+    *

This would explain why the two plots are identical.
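The observation above can be checked mechanically. The following is a minimal sketch, assuming tab-separated GFA 1.0 input; the function names and the path name `p1` are made up for illustration, and the three segments are a toy subset of the dump above. It reports whether two GFA files describe the same segments and paths while listing the `S` lines in a different order.

```python
def parse_gfa(lines):
    """Return (segment IDs in file order, {id: sequence}, sorted path records)."""
    order, seqs, paths = [], {}, []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            order.append(fields[1])
            seqs[fields[1]] = fields[2]
        elif fields[0] == "P":
            # Keep the path name and its list of oriented steps.
            paths.append((fields[1], fields[2]))
    return order, seqs, sorted(paths)


def only_order_differs(before_lines, after_lines):
    """True if segments and paths are identical but the S-line order changed."""
    b_order, b_seqs, b_paths = parse_gfa(before_lines)
    a_order, a_seqs, a_paths = parse_gfa(after_lines)
    same_graph = b_seqs == a_seqs and b_paths == a_paths
    return same_graph and b_order != a_order


# Toy data: same node IDs and path steps, only the S lines are reordered.
before = ["H\tVN:Z:1.0", "S\t1\tAT", "S\t2\tTTAACTCCATC",
          "S\t781\tATTTTTAACTCCATG", "P\tp1\t1+,2+\t*"]
after = ["H\tVN:Z:1.0", "S\t781\tATTTTTAACTCCATG", "S\t1\tAT",
         "S\t2\tTTAACTCCATC", "P\tp1\t1+,2+\t*"]

print(only_order_differs(before, after))  # prints True
```

For real files, the same functions can be fed `open("graph.gfa")` directly, since they only iterate over lines.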

subwaystation commented 2 years ago

Ah, so maybe ALIBI does not update the node identifiers and the steps in the paths.

anialisiecka commented 2 years ago

Hi,

ALIBI does not modify node identifiers. It changes the order of the nodes in the GFA file, but the node identifiers remain unchanged. The new order is given by the order of the 'S' lines in the sorted GFA file.

subwaystation commented 2 years ago

Hi @anialisiecka, thanks for the clarification. When ODGI sorts a graph, it updates the node identifiers and then the edges and paths of the graph accordingly. That's why it didn't work here. However, odgi sort has the -s,--sort-order option, with which we were able to apply the order from ALIBI's GFA to our graph. @AndreaGuarracino
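A minimal sketch of that workaround: collect the node identifiers from the `S` lines of ALIBI's sorted GFA, in file order, and write them to a file for `odgi sort -s`. This assumes the sort-order file is expected to hold one node identifier per line, which should be checked against `odgi sort --help` for the installed version. The helper name `write_sort_order` and the toy input are illustrative, not part of ALIBI or ODGI.

```python
import io


def write_sort_order(gfa_lines, out):
    """Write the segment IDs of a GFA to `out`, one per line, in file order."""
    count = 0
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            out.write(fields[1] + "\n")
            count += 1
    return count


# Toy stand-in for the first lines of the sorted GFA shown earlier.
sorted_gfa = ["H\tVN:Z:1.0", "S\t781\tATTTTTAACTCCATG",
              "S\t1\tAT", "S\t2\tTTAACTCCATC"]

buf = io.StringIO()  # with real data, use open("order.txt", "w") instead
count = write_sort_order(sorted_gfa, buf)
print(buf.getvalue(), end="")
```

The resulting file would then be passed to `odgi sort` through `-s`, together with the unsorted graph as input, so that ODGI renumbers the nodes and rewrites edges and paths itself.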

I think your way of handling the new node order can be confusing for other programs, at least for the ones I have worked with so far. They do not care about how the nodes are ordered in the file, only about the node identifiers.
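To illustrate the difference, here is a rough sketch of what "updating the node identifiers, then the edges and paths" could look like for a GFA whose `S` lines are already in the desired order. It is not ODGI's implementation. It assumes tab-separated GFA 1.0 with only `H`, `S`, `L`, and `P` records, and the link and path in the toy input are invented for the example.

```python
def renumber_by_file_order(gfa_lines):
    """Give each segment the 1-based rank of its S line and update L and P lines."""
    lines = [line.rstrip("\n") for line in gfa_lines]

    # First pass: map each old segment ID to its new ID (rank in the file).
    new_id = {}
    for line in lines:
        fields = line.split("\t")
        if fields[0] == "S":
            new_id[fields[1]] = str(len(new_id) + 1)

    # Second pass: rewrite every record that refers to a segment ID.
    out = []
    for line in lines:
        fields = line.split("\t")
        if fields[0] == "S":
            fields[1] = new_id[fields[1]]
        elif fields[0] == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap>
            fields[1], fields[3] = new_id[fields[1]], new_id[fields[3]]
        elif fields[0] == "P":
            # Each step is a segment ID followed by an orientation character.
            steps = fields[2].split(",")
            fields[2] = ",".join(new_id[s[:-1]] + s[-1] for s in steps)
        out.append("\t".join(fields))
    return out


sorted_gfa = ["H\tVN:Z:1.0", "S\t781\tATTTTTAACTCCATG", "S\t1\tAT",
              "S\t2\tTTAACTCCATC", "L\t1\t+\t2\t+\t0M", "P\tp1\t1+,2+\t*"]
renumbered = renumber_by_file_order(sorted_gfa)
print("\n".join(renumbered))
```

After renumbering, the identifiers themselves carry the sort order, which is what identifier-based tools pick up.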