baoxingsong / AnchorWave

Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism and whole-genome duplication variation
MIT License

Converting maf to sam #31

Closed Yutang-ETH closed 2 years ago

Yutang-ETH commented 2 years ago

Hi Baoxing,

First of all, congratulations on your new position at Peking University.

My question is about this line: python2 maf-convert sam anchorwave.maf | sed 's/[0-9]\+H//g' > anchorwave.sam

Why do you remove every H from the sam output? Does this mean you strip all H (hard clip) operations from the cigar strings in sam? Could you please shed some light on this? Thank you very much.

Best wishes, Yutang

baoxingsong commented 2 years ago

The number before H can be very large. It is OK to have such numbers in a sam file, but converting those sam files to bam or cram format is problematic, because the binary formats limit how large these numbers can be.
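To make the limitation concrete, here is a minimal Python sketch. It assumes the limit in question is the one in the BAM record layout, where each CIGAR operation is packed into a 32-bit integer with 28 bits for the length, so a single operation cannot be longer than 2^28 - 1 = 268,435,455. The function names and the example CIGAR string are made up for illustration; the second function mirrors what the sed command does, but on a CIGAR string only.

```python
import re

# Assumption: the relevant limit is BAM's packed CIGAR encoding
# (length in the upper 28 bits, operation code in the lower 4 bits).
BAM_MAX_CIGAR_OP_LEN = (1 << 28) - 1  # 268435455

# One CIGAR operation: a run length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def oversized_ops(cigar):
    """Return the (length, op) pairs that would not fit into a BAM CIGAR field."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)
            if int(n) > BAM_MAX_CIGAR_OP_LEN]


def strip_hard_clips(cigar):
    """Drop every H operation and keep all other operations unchanged."""
    return "".join(n + op for n, op in CIGAR_OP.findall(cigar) if op != "H")


# Hypothetical CIGAR with a very large leading hard clip.
cigar = "300000000H50M2D30M1000H"
print(oversized_ops(cigar))     # [(300000000, 'H')]
print(strip_hard_clips(cigar))  # 50M2D30M
```

Unlike sed applied to whole lines, this only touches the string it is given, so it cannot accidentally match a digit-plus-H pattern in another sam column.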

Yutang-ETH commented 2 years ago

Thank you very much, Baoxing, for your explanation. Yesterday I tried to convert sam to bam without removing the H and, as you said, it didn't work.

By the way, I think converting maf to sam and then to paf is not a good idea, because the query coordinates (the global chain coordinates of the query) are lost after converting sam to paf. I think this is because sam doesn't store the global chain coordinates for the query. However, I found a workaround: swap query and reference, then convert the swapped maf to sam and then to paf. The reference coordinates in the swapped paf are the query coordinates in the non-swapped paf.
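One possible reading of why the query coordinates go missing (an assumption, not something confirmed in this thread): a sam record stores an explicit position only for the reference (the POS column), while the position on the query has to be inferred from the cigar, where leading clip operations give the offset into the query. If the H operations are stripped, that offset is gone. The sketch below illustrates this for a forward-strand alignment; the function name and the cigar strings are hypothetical.

```python
import re

# One CIGAR operation: a run length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_interval(cigar):
    """Return the 0-based, half-open (start, end) of the aligned query part.

    Assumes a forward-strand alignment and that leading H/S operations
    encode the offset of the alignment within the full query sequence.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    start = 0
    for n, op in ops:
        if op in "HS":
            start += n  # leading clips shift the start on the query
        else:
            break
    # M, I, = and X are the operations that consume aligned query bases.
    aligned = sum(n for n, op in ops if op in "MI=X")
    return start, start + aligned


print(query_interval("1000H50M2I30M500H"))  # (1000, 1082)
print(query_interval("50M2I30M"))           # (0, 82)
```

With the hard clips present the alignment starts at query position 1000; once they are removed, the same alignment appears to start at position 0.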

Best wishes, Yutang

slbai01 commented 1 year ago

Hi Yutang,

Did you use paftools.js to convert sam to paf? Did it give the correct result?

Shenglong

Yutang-ETH commented 1 year ago

Hi Shenglong,

Thank you very much for asking. Yes, I used paftools.js to convert sam to paf. It returned some warnings, but the resulting paf seems correct. I actually don't understand the warning message, but paftools.js finished anyway.

Best wishes, Yutang

slbai01 commented 1 year ago

I think I get a similar message; the warning seems to be about the large "H". What do you mean by the swap mentioned above? If I use paftools, do I still need to consider this?

Yutang-ETH commented 1 year ago

I also guess the warning message is related to "H".

Let's say you align A to B, where A is the ref and B is the query. When you convert maf to sam, only the coordinates of the ref are retained in sam; the coordinates of B are lost. In my case I also need the coordinates of B, so I swapped ref and query in the maf using the python script provided by anchorwave. In the swapped maf, B is the ref and A is the query, so when I convert this swapped maf to sam, the coordinates in sam are B's. I hope this is clear to you.

If you only need the ref coordinates, you don't need to do what I did.
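The idea of the swap can be sketched in a few lines of Python. This is only a simplified illustration, not the script provided by anchorwave: it assumes every alignment block holds exactly two "s" lines (ref first, query second), it just exchanges their order, and it ignores strand handling, which a real swap has to take care of. The function name and the sequence names A.chr1 and B.chr1 are made up.

```python
def swap_maf_pairs(lines):
    """Yield MAF lines with the two 's' lines of each block exchanged.

    Simplified: assumes pairwise blocks and does nothing about strand.
    """
    pending = []  # 's' lines collected for the current block
    for line in lines:
        if line.split()[:1] == ["s"]:
            pending.append(line)
            if len(pending) == 2:
                # Emit the former query first, then the former ref.
                yield pending[1]
                yield pending[0]
                pending = []
        else:
            # Any other line ends the block; pass leftovers through unchanged.
            yield from pending
            pending = []
            yield line
    yield from pending


# Hypothetical one-block MAF: A is the ref, B is the query.
maf = [
    "a score=100\n",
    "s A.chr1 0 4 + 100 ACGT\n",
    "s B.chr1 10 4 + 200 ACGT\n",
    "\n",
]
swapped = list(swap_maf_pairs(maf))
print("".join(swapped), end="")
```

After the swap, B.chr1 comes first in the block, so a maf to sam conversion would treat B as the ref and report B's coordinates.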

Best wishes, Yutang

slbai01 commented 1 year ago

Clear explanation! Thanks for your kind answer.

Yutang-ETH commented 1 year ago

No problem, have fun.

Best wishes, Yutang