Closed PanZiwei closed 3 years ago
Hi Ziwei,
(1) The fastq sequence in Basecall_1D_00x is the sequence of the read as called by a basecaller.
(2) After tombo re-squiggle, the raw signals of the read are aligned to the genome reference. So the sequence from the tombo events is the region of the genome reference that the read aligns to, not the basecalled read itself.
(3) I didn't find a Basecall_1D_001 group in my example fast5 files, so I can't say which basecall result this is. I can check if you provide a fast5 file that contains a Basecall_1D_001 group.
Best, Peng
Hi Peng,
Thanks for the response. In your readme file you gave the tombo resquiggle usage example:
tombo resquiggle fast5s.al GCF_000146045.2_R64_genomic.fna --processes 10 --corrected-group RawGenomeCorrected_001 --basecall-group Basecall_1D_000 --overwrite
So, to my understanding, the re-squiggle uses the sequence information from Guppy (saved in the Basecall_1D_000 group), so for the same read I thought the sequence from Guppy and the sequence in the events would map to the same region.
Or should they not map to the same region of the genome, since re-squiggle corrects the signal and may influence the mapping?
Hi Ziwei,
The sequence from the tombo event table is actually a region of the genome reference.
Resquiggle generally has two steps. 1. Use minimap2 to map the read to the genome reference. 2. Map the raw signals to the region of the genome that the read is aligned to.
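To make the two steps concrete, here is a minimal toy sketch in Python. It is not tombo's implementation: the reference, the read, and the mapping coordinates are all made up, and the `mapping` dict simply stands in for whatever step 1 (the minimap2 alignment) would report. It only illustrates why the bases stored with the events come from the reference and can therefore differ from the basecalled fastq sequence.

```python
# Toy illustration only. Real resquiggle maps the read with minimap2 and
# then aligns raw signal to the reference; here the mapping is assumed.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def reference_region(reference, start, end, strand):
    """Return the reference bases a read maps to, in read orientation."""
    region = reference[start:end]
    return region if strand == "+" else reverse_complement(region)


# Hypothetical genome reference and basecalled read (one miscalled base).
reference = "ACGTTGCAAGGCTTAACCGGTTAGC"
basecalled_read = "GCAAGTCTTAAC"

# Step 1 (assumed result): where the read maps on the reference.
mapping = {"start": 5, "end": 17, "strand": "+"}

# Step 2 assigns signal to these reference bases, so the event sequence
# is the reference region, not the basecalled read.
event_sequence = reference_region(
    reference, mapping["start"], mapping["end"], mapping["strand"]
)

# Positions where the basecall disagrees with the reference.
mismatches = [
    i for i, (a, b) in enumerate(zip(basecalled_read, event_sequence)) if a != b
]

print(event_sequence)
print(mismatches)
```

Any basecalling error, or any real difference between the sample and the reference, shows up as a mismatch like this between the two sequences.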
Hi Peng, Thank you so much for the explanation! It definitely answered my question.
Thanks again for your help!
Hi, I had a question on the genome sequence you are using for feature extraction and would really appreciate it if you can provide more information.
I noticed that in line 221 of your extract_features.py you concatenate the bases in the events after Tombo re-squiggle and use the result as the genome sequence. However, when I checked the nucleotide sequence in the /Analyses/Basecall_1D_000/BaseCalled_template/fastq group of a single-read fast5 file from your example fast5s.sample.tar.gz, I found that the fastq sequence differs from the bases in the Tombo events. Could you explain the relationship between the bases in the events after Tombo re-squiggle and the fastq sequence basecalled by Albacore, as you mentioned?
Also, there are two sets of basecall information saved in your example fast5 files. You mentioned that the Basecall_1D_000 group is the Albacore result; what about the Basecall_1D_001 one?
Thank you so much for your help!
Best, Ziwei
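For anyone wanting to reproduce the comparison described in this question, the following is a rough Python sketch. It assumes the fastq record and the per-event bases have already been read out of the fast5 file. The helper names `fastq_sequence` and `event_bases` are hypothetical, the values are made up, and storing the event bases as single-byte strings is an assumption about the event table rather than something confirmed in this thread.

```python
def fastq_sequence(fastq_record):
    """Return the sequence line (second line) of a single FASTQ record."""
    if isinstance(fastq_record, bytes):
        fastq_record = fastq_record.decode()
    lines = fastq_record.strip().split("\n")
    return lines[1]


def event_bases(bases):
    """Concatenate per-event bases, accepting either bytes or str items."""
    return "".join(b.decode() if isinstance(b, bytes) else b for b in bases)


# Made-up stand-ins for the values stored in a fast5 file.
fastq_record = "@read_1\nGCAAGTCTTAAC\n+\n" + "#" * 12 + "\n"
bases_from_events = [c.encode() for c in "GCAAGGCTTAAC"]

read_seq = fastq_sequence(fastq_record)
event_seq = event_bases(bases_from_events)

# The two sequences describe the same read but need not be identical.
print(read_seq == event_seq)
```

A direct string comparison like this is only meaningful as a quick check. In general the two sequences can also differ in length and strand, so a proper comparison would need an alignment.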