Albacore / albacore

Albacore is a professional quality suite of Rake tasks for building .NET or Mono based systems.
www.albacorebuild.net

How is the event stdv in the basecalled fast5 file calculated? [Edited: wrong albacore that I was looking for] #243

Closed EPKok closed 5 years ago

EPKok commented 5 years ago

Hi, I converted a fast5 file and extracted the events and time based on the recommendation from the post below: https://community.nanoporetech.com/posts/squiggle-plot-for-raw-data

The top 30 rows are as below:

No.     Time     Event
1       0       104.002
2       0.000332005     73.698
3       0.000664011     74.1248
4       0.000996016     74.8362
5       0.00132802      73.5557
6       0.00166003      71.9907
7       0.00199203      72.5598
8       0.00232404      72.4176
9       0.00265604      71.8485
10      0.00298805      71.1371
11      0.00332005      70.8525
12      0.00365206      73.8403
13      0.00398406      71.5639
14      0.00431607      73.8403
15      0.00464807      72.8444
16      0.00498008      72.8444
17      0.00531208      72.8444
18      0.00564409      72.7021
19      0.0059761       73.1289
20      0.0063081       71.9907
21      0.00664011      71.7062
22      0.00697211      71.8485
23      0.00730412      73.4135
24      0.00763612      71.7062
25      0.00796813      72.2753
26      0.00830013      71.4216
27      0.00863214      70.8525
28      0.00896414      72.9866
29      0.00929615      72.4176
30      0.00962815      72.9866

The means of the two consecutive blocks of 15 time points are 74.8742 and 72.3417.
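For reference, the block means can be reproduced with a short Python sketch. The values are copied from the 30 rows listed above; the block size of 15 and the names `events`, `BLOCK_SIZE`, and `block_means` are my own choices for illustration, not anything taken from Albacore or Poretools.

```python
# Event values copied from the 30 rows listed in this post.
events = [
    104.002, 73.698, 74.1248, 74.8362, 73.5557,
    71.9907, 72.5598, 72.4176, 71.8485, 71.1371,
    70.8525, 73.8403, 71.5639, 73.8403, 72.8444,
    72.8444, 72.8444, 72.7021, 73.1289, 71.9907,
    71.7062, 71.8485, 73.4135, 71.7062, 72.2753,
    71.4216, 70.8525, 72.9866, 72.4176, 72.9866,
]

BLOCK_SIZE = 15  # assumption: each event spans 15 consecutive time points


def block_means(values, size):
    """Return the arithmetic mean of each consecutive block of `size` values."""
    means = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        means.append(sum(chunk) / len(chunk))
    return means


means = block_means(events, BLOCK_SIZE)
for index, value in enumerate(means):
    print(f"block {index}: mean = {value:.4f}")
```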

After basecalling with Albacore, I extracted the event information from the same basecalled fast5 file by running poretools events. The first two lines of the output are below:

file    strand  mean    start   stdv    length  model_state     model_level     move    p_model_state   mp_model_state  p_mp_model_state        p_A     p_C     p_G     p_T     raw_index
GISPC943_20180505_FAH50484_MN23396_sequencing_run_180505_gblock_GMod3_I_with_ACN_20294_read_100_ch_102_strand.fast5     template        74.87416        0       30.468916       15      GTGCC           1       0.090935506
GISPC943_20180505_FAH50484_MN23396_sequencing_run_180505_gblock_GMod3_I_with_ACN_20294_read_100_ch_102_strand.fast5     template        72.34167        15      2.7357106       15      TGCCT           1       0.08715538

Am I correct that each k-mer is assigned based on the mean and stdv of a block of 15 time points? I ask because I can reproduce the mean of each block, but I cannot reproduce the reported stdv.

How is the stdv calculated in this case?
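To make the comparison concrete, here is a minimal sketch that computes the standard deviation of each 15-point block under the two usual conventions (population, dividing by n, and sample, dividing by n - 1) and prints them next to the stdv values reported by Poretools. Whether Albacore uses either convention, or computes stdv from these values at all, is exactly what I do not know; the block size and all names below are my own assumptions.

```python
import statistics

# Event values copied from the 30 rows listed in this post.
events = [
    104.002, 73.698, 74.1248, 74.8362, 73.5557,
    71.9907, 72.5598, 72.4176, 71.8485, 71.1371,
    70.8525, 73.8403, 71.5639, 73.8403, 72.8444,
    72.8444, 72.8444, 72.7021, 73.1289, 71.9907,
    71.7062, 71.8485, 73.4135, 71.7062, 72.2753,
    71.4216, 70.8525, 72.9866, 72.4176, 72.9866,
]

BLOCK_SIZE = 15  # assumption: matches the "length" column in the Poretools output

# stdv values from the two Poretools output rows quoted above.
reported_stdv = [30.468916, 2.7357106]


def block_stdvs(values, size):
    """Return (population stdv, sample stdv) for each consecutive block."""
    out = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        out.append((statistics.pstdev(chunk), statistics.stdev(chunk)))
    return out


results = block_stdvs(events, BLOCK_SIZE)
for (population, sample), reported in zip(results, reported_stdv):
    print(f"population={population:.4f}  sample={sample:.4f}  reported={reported}")
```

If neither column agrees with the reported value, the stdv is presumably derived from something other than these 15 values per block, for example a different signal or scaling.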

Thanks.