Closed yezhenqing closed 2 years ago
Sorry. need to correct something about my previous post.
To my understanding, the unitig with head-ATAA should have three colors, and the unitig with head-ACAA should have two colors, and the unitig with head-ACAC should have one color only...
I noticed the Bifrost 1.0.6 and pyfrost 0.1.0b1 might be the old version and out of date.
So I git clone the newest version and compiled it locally. Now it works well.
But I noticed a minor issue as below:
>>> for n, data in g.nodes(data=True):
... strand = str(data['strand'])
... if(strand == 'Strand.FORWARD'):
... print('=======================')
... for k,v in data.items():
... if(k=='colors'):
... print(k, set(v))
... else:
... print(k, v)
...
=======================
colors {0, 1, 2}
unitig_sequence ATAACA
tail AACA
length 3
strand Strand.FORWARD
mapped_sequence ATAACA
pos 0
is_full_mapping True
unitig_length 3
head ATAA
=======================
colors {1, 2}
unitig_sequence AATTGT <----here I expected it would be "ACAATTG"
tail TTGT
length 3
strand Strand.FORWARD
mapped_sequence AATTGT
pos 0
is_full_mapping True
unitig_length 3
head AATT
=======================
colors {2}
unitig_sequence ACACC
tail CACC
length 2
strand Strand.FORWARD
mapped_sequence ACACC
pos 0
is_full_mapping True
unitig_length 2
head ACAC
And another small issue I also would like to point out for reference. I generated the ccDBG graph in two steps:
Bifrost build -k 4 -c -r blue.fa -r red.fa -o B_R.step1
By loading B_R.step1 into python, and it output two unitags:
>>> for n, data in g1.nodes(data=True):
... strand = str(data['strand'])
... if(strand == 'Strand.FORWARD'):
... print('=======================')
... for k,v in data.items():
... if(k=='colors'):
... print(k, set(v))
... else:
... print(k, v)
...
=======================
colors {0, 1}
unitig_sequence ATAACAATT
tail AATT
length 6
strand Strand.FORWARD
mapped_sequence ATAACAATT
pos 0
is_full_mapping True
unitig_length 6
head ATAA
=======================
colors {0, 1}
unitig_sequence ATAACAATT
tail AATT
length 6
strand Strand.FORWARD
mapped_sequence ATAACAATT
pos 0
is_full_mapping True
unitig_length 6
head ATAA
But these two unitags actually are same, probably it can be merged.
And then updated the B_R.step1 further:
Bifrost update -k 4 -c -r green.fa -g B_R.step1.gfa -f B_R.step1.bfg_colors -o B_R_G.step2
"update" works fine. Very nice! Thanks for the cool tools!
Greetings, I have a question about colours on unitig.
Here, I would like to use the toy case from the link (https://kamimrcht.github.io/webpage/tigs.html) to illustrate the issues as below:
Firstly, I created three fasta files:
Then, I generated the GFA file and the color file via the command:
I didn't put other parameters like (-i -d) to avoid graph simplification.
By loading the GFA and color files into the python environment:
To my understanding, we have three colors here:
And we also can see there are six nodes in the graph g:
Just ignore those nodes from Strand.REVERSE now, we should have three unitigs. And I tried to get the colors for each unitig:
What I'm quite confused about is those colors assigned to these unitigs. For example, I would expect the unitig with head-ATAA should have two colors (green and red) only. And furthermore, we only have three colors in total, why the unitig with head-ACAC has four colors {2,3,4,5}.
Maybe I misunderstood something about colors. Can you kindly explain a little bit how these colors been calculated?
Bifrost 1.0.6 pyfrost 0.1.0b1
Thank you for you work on this and any input would be appreciated.