Closed lch14forever closed 5 years ago
Hi,
The number after 2D is the index of the template in the original read. For PBDAGCON, it needs the reads aligned to a template to call consensus, so we choose one segment in the tandem repeats as the template.
The template is chosen according to the following criteria:
Hi,
Thank you for your quick response. In this case how can I find quantity of segments used for each corrected read? pipeline.log file does not contain this information while logs/myeasylog.log file has got description of single read only.
Unfortunately, that info was not kept. I believe you can modify the following block in buildConsensus.py (from line 265) to print that information:
# blast alignment
for subRead in subReads:
q_handle = open(tmpQ, 'w')
SeqIO.write(subRead, q_handle, "fasta")
q_handle.close()
stdout = blastn(tmpQ, tmpRef, None, blastOutFMT, seg_num, seg_cov, True)
num = stdout.count('\n') - 1
errors = get_errors(stdout)
if num > alignments["num"]:
# prefer more alignments
alignments["num"] = num
alignments["error"] = errors
alignments["alignments"] = stdout
elif num == alignments["num"]:
# for the same number of alignments, prefer the one with lower error rates
if errors < alignments["errors"]:
alignments["num"] = num
alignments["error"] = errors
alignments["alignments"] = stdout
Sorry, I think I understood your question wrongly. If you are looking for the number of segments used to construct consensus, you have to look for lines in the messages like follows:
Consensus called 2d46a7a6-0bb5-4994-b1bb-bc8494695cbb_Basecall_2D_2d Number of segments 7
and you can grep for the pattern 'Consensus called' with some commands as below:
paste <(grep 'Consensus called' YourLogFile) <(grep '^>' YourConsensus.fa )
Reported by @Szymonome,
Found that BLASTN switch generates some information at the end of the read name e.g. Basecall_2D_7/0_730 while POA does not. Why is it like that? Moreover, what exactly does the number "7" stands for? Initially I thought it may be number or molecules used for sequence correction however, fasta file from GRAPHMAP switch have reads with numbers 2, 3, 4 or 5 that shouldn't happen as minimum amount should be 6, f I'm right.