Closed davmeleuterio closed 4 years ago
My gut feeling is that there might be an overflow for the last number, but I cannot find it in the code right now. Would you mind sharing that sequence here?
Thanks!
Sure, here is the sequence:
0b034307-8d13-47d7-8ee1-c21310a38963_runid=1
GAACTCTCTCTCTCTCTCTCGTCTCTCTCTCTCTCTCTCTCTCTCTCTACTCTCTCTCTC
TCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTTTCTCACTCTTTCTCGCTCTCTCAAAA
CTCGCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTTTCTCTCTCTCTCTTTCTCTCTCTCT
CTTTCTCTCTCTTCTCTGCTCTTTCTCTCTTTCTCTCTTAACTCTCTCTCTCTCTCGCTC
TCTCTCTCTCTCTCTCTCTCTTT
I also noticed, now looking at the sequence length (267), that it doesn't match the output readLen (263). Is there any reason why this is happening?
The sequence you pasted here is 263 bases long, not 267. Did you count the newlines?
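A quick way to check this kind of mismatch is to count bases per record while skipping line breaks. The following is only a minimal sketch: the function name `fasta_lengths` and the toy record `read1` are made up for illustration and are not part of TideHunter. The toy record is laid out like the one in this thread (four 60-base lines plus one 23-base line), so a naive character count comes out four higher than the real length.

```python
def fasta_lengths(text):
    """Return {record_name: sequence_length}, ignoring line breaks."""
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the record name is the first word after '>'
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            # Sequence line: count only the bases on this line
            lengths[name] += len(line)
    return lengths


# Hypothetical toy record: 4 lines of 60 bases + 1 line of 23 bases = 263 bases
lines = ["ACGT" * 15] * 4 + ["ACGT" * 5 + "ACG"]
toy = ">read1\n" + "\n".join(lines) + "\n"

# Naive count of the sequence body, newlines between lines included
body = toy.split("\n", 1)[1].rstrip("\n")
naive_count = len(body)

print(fasta_lengths(toy))  # {'read1': 263}
print(naive_count)         # 267 (263 bases + 4 newline characters)
```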
I ran TideHunter v1.2.1 with default parameters; the output is:
$ TideHunter test.fa
>1_cons0_263_4_103_30_3.3_0_12,43,73
CTCTCTCTCTCTCTCTCTCTCTCTCTCTCT
>1_cons1_263_101_230_30_4.0_0_134,164,196
TCTCTCTCTCTCTCTCTCTCTTTCTCTCTC
Could you also paste your version and the running command here?
Sorry for the late response. I checked, and the counter I was using was counting newlines, so it reported a different length; sorry about that. I ran TideHunter with the following parameters:
$ TideHunter -f 2 -t 3 test.fa > test.out
The output looks like this:
0b034307-8d13-47d7-8ee1-c21310a38963_runid=1 cons0 263 4 103 30 3.3 0 12,43,74,32575 CTCTCTCTCTCTCTCTCTCTCTCTCTCTCT
0b034307-8d13-47d7-8ee1-c21310a38963_runid=1 cons1 263 101 230 30 4.0 0 134,164,197,33 TCTCTCTCTCTCTCTCTCTCTTTCTCTCTC
That 32575 coordinate keeps appearing. Also, each time I run the code, it changes to a similar value, such as 32731, 32723, ...
When I ran the code with FASTA output, the result looked just like yours, so maybe it is something related to the tabular output?
Thank you for your attention.
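For anyone who wants to flag such values automatically, here is a minimal sketch. It assumes whitespace-separated columns in the order readName, consN, readLen, start, end, consLen, copyNum, fullLen, subPos, consensus; the helper names `parse_row` and `out_of_range` are hypothetical and not part of TideHunter. It only checks whether each subPos value lies within the read (1-based), nothing more.

```python
COLUMNS = ["readName", "consN", "readLen", "start", "end",
           "consLen", "copyNum", "fullLen", "subPos", "consensus"]


def parse_row(line):
    """Split one tabular row into a dict with typed fields (assumed layout)."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    for key in ("readLen", "start", "end", "consLen", "fullLen"):
        row[key] = int(row[key])
    row["copyNum"] = float(row["copyNum"])
    row["subPos"] = [int(x) for x in row["subPos"].split(",")]
    return row


def out_of_range(row):
    """Return the subPos values that fall outside 1..readLen."""
    return [p for p in row["subPos"] if not 1 <= p <= row["readLen"]]


# First row of the tabular output reported above
line = ("0b034307-8d13-47d7-8ee1-c21310a38963_runid=1 cons0 263 4 103 30 "
        "3.3 0 12,43,74,32575 CTCTCTCTCTCTCTCTCTCTCTCTCTCTCT")
row = parse_row(line)
print(out_of_range(row))  # [32575]
```

A value such as 32575 cannot be a position in a 263-base read, so this check catches it. It would not catch the trailing 33 in the second row, which is inside the read but smaller than the start coordinates before it.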
Hi Daniel,
Thanks! This is a bug in the tabular output. Sorry about the inconvenience. It is fixed in the latest release, v1.2.2. Please try again.
Yan
Thank you, that solved the problem.
Hello, there is a column in the tabular format whose explanation I can't quite understand: subPos. It says: "Start coordinate of each tandem repeat unit sequence, followed by one end coordinate of the last tandem repeat unit sequence, separated by ",", all coordinates are 1-based." I don't understand the part I've put in bold. Here is an example from my data that also confuses me because of the number in bold:
readName: 0b034307-8d13-47d7-8ee1-c21310a38963_runid=1
consN: cons1
readLen: 263
start: 101
end: 230
consLen: 30
copyNum: 4.0
fullLen: 0
subPos: 134,164,197,32622
consensus: TCTCTCTCTCTCTCTCTCTCTTTCTCTCTC
Thank you for your attention, Daniel
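Read literally, the quoted description means that every value in subPos except the last is the start of one repeat unit, and the last value is the end of the final unit. The sketch below turns a subPos string into (start, end) pairs under that reading. It additionally assumes that each unit ends one base before the next one starts, which the description does not state. The function names are hypothetical, and the value 230 in the second example is a made-up stand-in, not real TideHunter output.

```python
def unit_intervals(sub_pos):
    """Turn 'start1,start2,...,last_end' into a list of (start, end) pairs."""
    coords = [int(x) for x in sub_pos.split(",")]
    if len(coords) < 2:
        raise ValueError("need at least one start and one end coordinate")
    starts, last_end = coords[:-1], coords[-1]
    # ASSUMPTION: unit i ends one base before unit i+1 starts
    ends = [s - 1 for s in starts[1:]] + [last_end]
    return list(zip(starts, ends))


def is_plausible(intervals, read_len):
    """True if every unit lies inside the read and start <= end (1-based)."""
    return all(1 <= s <= e <= read_len for s, e in intervals)


# subPos as reported in this thread, with readLen 263
reported = unit_intervals("134,164,197,32622")
print(reported)                     # [(134, 163), (164, 196), (197, 32622)]
print(is_plausible(reported, 263))  # False

# Hypothetical subPos whose last value lies inside the read
hypothetical = unit_intervals("134,164,197,230")
print(is_plausible(hypothetical, 263))  # True
```

Under this reading, a last value of 32622 cannot be a valid end coordinate for a 263-base read, which is consistent with the tabular-output bug that was fixed in v1.2.2.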