ekg / seqwish

alignment to variation graph inducer
MIT License

Graph to Graph Alignment #108

Open 8banzhuan opened 1 year ago

8banzhuan commented 1 year ago

Hi, I saw in your other project (https://github.com/ekg/gimbricate) that you mention graph-to-graph comparison can be achieved by decomposing GFA graphs into FASTA and PAF files and feeding them into seqwish. I would like to know how this should be done specifically, and what the result of the comparison represents: is it a comparison between the sequences in the nodes, or between the topologies of the two graphs? I am very curious about this. I am a student in this field, and I would be very grateful for an answer.

ekg commented 1 year ago

The starting idea is that any existing variation graph can be represented as a collection of pairwise alignments and sequences. Feeding that collection back into seqwish should reconstruct the variation graph.

Now suppose we have two or more graphs. We convert each into pairwise alignments and sequences. We then use alignments between the sequences of each graph to execute the graph-to-graph alignment. To construct the graph that results from this alignment, we combine all sequences and alignments from the decomposed graphs with the new graph-graph alignments and run seqwish on the combined set.
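The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the file names are hypothetical, the per-graph decompositions (`a.fa`/`a.paf`, `b.fa`/`b.paf`) are assumed to exist already, sequence names are assumed to be unique across both graphs, and minimap2 with the `asm20` preset is just one possible aligner. The seqwish options used are `-s` (sequences), `-p` (PAF alignments), and `-g` (output GFA).

```python
import shutil
import subprocess


def concatenate(inputs, output):
    """Concatenate files byte for byte (inputs should end with a newline)."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_commands(a_fa, b_fa, ab_paf, all_fa, all_paf, out_gfa):
    """Build the two external commands for a graph-to-graph merge.

    align:  sequences of graph B against sequences of graph A, writing a
            PAF with CIGAR strings (-c) to ab_paf.
    induce: seqwish over the union of all sequences and all alignments.
    """
    align = ["minimap2", "-c", "-x", "asm20", "-o", ab_paf, a_fa, b_fa]
    induce = ["seqwish", "-s", all_fa, "-p", all_paf, "-g", out_gfa]
    return align, induce


def run_merge(a_fa, a_paf, b_fa, b_paf, out_gfa):
    """Run the whole sketch; requires minimap2 and seqwish on PATH."""
    ab_paf, all_fa, all_paf = "ab.paf", "all.fa", "all.paf"  # hypothetical names
    align, induce = merge_commands(a_fa, b_fa, ab_paf, all_fa, all_paf, out_gfa)
    subprocess.run(align, check=True)                  # graph-graph alignment
    concatenate([a_fa, b_fa], all_fa)                  # all sequences
    concatenate([a_paf, b_paf, ab_paf], all_paf)       # all alignments
    subprocess.run(induce, check=True)                 # induce merged graph
```

The within-graph alignments (`a.paf`, `b.paf`) carry each original graph's structure, while `ab.paf` adds the matches that join the two.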

8banzhuan commented 1 year ago

Thank you very much for your timely reply, you are my teacher! Can I understand it this way: I have two GFA files, a.GFA and b.GFA. I decompose them into a.fasta and b.fasta, then align a.fasta against b.fasta to get ab.paf. I think it should be possible to restore one FASTA file from ab.paf and the other FASTA file (similar to a comparison-based compression algorithm). My doubt is whether the original GFA files can be restored through this method, and whether the topology of the graphs can be recovered. Maybe I'm naive, but thanks anyway for the great software and the prompt replies, and good luck with your work!

ekg commented 1 year ago

By the way, I'll be very happy to discuss this project idea. Please email if you'd like to set up a time.

8banzhuan commented 1 year ago

Thank you! I am a freshman who has just entered university, and there is still a lot I don't know in this field. I will continue to study. If I have new ideas or make progress, I will email you to discuss them. Thank you for your patience with my naive ideas. Best wishes!

ekg commented 1 year ago

The topologies of both of the original graphs will be maintained in the approach that I outlined.

The topologies come from the decomposition of the original A and B graphs into sequence/alignment sets. There is no tool to implement this decomposition, and it's not clear what the ideal approach would be.
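Purely as an illustration of what such a decomposition could look like, here is one naive, path-based sketch. It is not an existing tool or an endorsed method. It assumes a GFA 1 text with `S` and `P` lines, handles only forward-strand (`+`) path steps, ignores any node not covered by a path, and writes exact matches with a `cg:Z:<n>=` CIGAR tag; the function name `decompose` is hypothetical.

```python
def decompose(gfa_text):
    """Turn a path-covered GFA into path sequences plus PAF match records."""
    segments, paths = {}, {}
    for line in gfa_text.splitlines():
        f = line.split("\t")
        if f[0] == "S":
            segments[f[1]] = f[2]
        elif f[0] == "P":
            steps = f[2].split(",")
            if any(s.endswith("-") for s in steps):
                raise ValueError("reverse-strand steps are not handled here")
            paths[f[1]] = [s[:-1] for s in steps]

    seqs, occurrences = {}, {}
    for name, steps in paths.items():
        offset, parts = 0, []
        for node in steps:
            # Remember where each node lands in each spelled-out path.
            occurrences.setdefault(node, []).append((name, offset))
            parts.append(segments[node])
            offset += len(segments[node])
        seqs[name] = "".join(parts)

    paf = []
    for node, occ in occurrences.items():
        n = len(segments[node])
        # Link consecutive occurrences only; the rest follows transitively.
        for (q, qs), (t, ts) in zip(occ, occ[1:]):
            fields = [q, len(seqs[q]), qs, qs + n, "+",
                      t, len(seqs[t]), ts, ts + n, n, n, 60, f"cg:Z:{n}="]
            paf.append("\t".join(map(str, fields)))
    return seqs, paf


example_gfa = "\n".join([
    "S\t1\tAC", "S\t2\tG", "S\t3\tT", "S\t4\tCA",
    "P\ta\t1+,2+,4+\t*", "P\tb\t1+,3+,4+\t*",
])
seqs, paf = decompose(example_gfa)
```

In this toy graph, paths `a` and `b` share nodes 1 and 4, so the sketch emits two match records; the bubble (nodes 2 and 3) is represented simply by the absence of a match.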

Note that you can make a decomposition that's much smaller than the full all pairs comparison implied by the graph, but it will still reproduce the graph through transitive match relationships.
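The transitive effect can be illustrated with a small union-find sketch. This is a toy model of transitive closure over matched characters, not seqwish's actual implementation; the match tuple layout `(name_a, start_a, name_b, start_b, length)` is made up for the example and covers exact forward-strand matches only.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def induce_classes(seqs, matches):
    """Group (sequence, position) pairs that are linked by any chain of matches."""
    uf = UnionFind()
    for a, sa, b, sb, n in matches:
        for k in range(n):
            uf.union((a, sa + k), (b, sb + k))
    classes = {}
    for name, s in seqs.items():
        for i in range(len(s)):
            classes.setdefault(uf.find((name, i)), set()).add((name, i))
    return list(classes.values())


def normalize(classes):
    return sorted(sorted(c) for c in classes)


seqs = {"x": "ACGT", "y": "ACGT", "z": "ACGT"}
chain = [("x", 0, "y", 0, 4), ("y", 0, "z", 0, 4)]   # x~y and y~z only
all_pairs = chain + [("x", 0, "z", 0, 4)]            # adds the explicit x~z
same = normalize(induce_classes(seqs, chain)) == normalize(induce_classes(seqs, all_pairs))
```

Here `same` is true: the x~z match is implied by x~y and y~z, so leaving it out does not change which characters end up merged.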