epifluidlab / FinaleMe

MIT License

Question about features #10

Closed liujilei156231 closed 3 months ago

liujilei156231 commented 4 months ago

Dear Prof Liu,

Here are the features generated for training by the first-step script. I noticed that one feature, Offset_frag, is not mentioned in the original article, so it may not have been used. I have a few questions: what does Offset_frag refer to, how does it differ from Dist_frag_end, and does it help model training? Thank you for your reply.

Original article: At each CpG in each fragment in the bam file (CpG point), we can obtain three features: the fragment’s length, the CpG’s distance to the center of that fragment, and the fragment coverage at that particular CpG position in the reference genome.

chr     start   end     readName        FragLen Frag_strand     methy_stat      Norm_Frag_cov   baseQ   Offset_frag     Dist_frag_end   methyPrior
chr22   10527860        10527862        SNL144:297:HYWH3BCXY:1:1111:15154:68116 154     -       m       1.122016        35      1       1       NaN
chr22   10527872        10527874        SNL144:297:HYWH3BCXY:1:1111:15154:68116 154     -       m       1.122016        40      13      13      NaN
chr22   10527878        10527880        SNL144:297:HYWH3BCXY:1:1111:15154:68116 154     -       m       1.122016        40      19      19      NaN
chr22   10527939        10527941        SNL144:297:HYWH3BCXY:1:1111:15154:68116 154     -       m       1.122016        39      80      74      NaN
chr22   10538977        10538979        SNL144:297:HYWH3BCXY:1:2115:3237:76894  166     -       m       1.122016        39      5       5       NaN
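
For reference, here is a quick check of how the two columns relate in the rows above. It is only a sketch: the relation Dist_frag_end = min(Offset_frag, FragLen - Offset_frag) is a guess based on these five rows, not something confirmed by the article or the FinaleMe code, and the function name `guessed_dist_frag_end` is made up for illustration.

```python
# (FragLen, Offset_frag, Dist_frag_end) copied from the five rows pasted above.
rows = [
    (154, 1, 1),
    (154, 13, 13),
    (154, 19, 19),
    (154, 80, 74),
    (166, 5, 5),
]


def guessed_dist_frag_end(frag_len, offset_frag):
    """Hypothetical relation (an assumption, not FinaleMe's documented
    definition): distance from the CpG to the nearer end of the fragment."""
    return min(offset_frag, frag_len - offset_frag)


# Collect every row where the guessed relation does not reproduce the column.
mismatches = [
    row for row in rows
    if guessed_dist_frag_end(row[0], row[1]) != row[2]
]
print("rows that break the guessed relation:", mismatches)
```

If the guess holds on the full feature file as well, Offset_frag would be the position of the CpG within the fragment and Dist_frag_end the distance to the closer fragment end, but that still needs confirmation from the authors.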
elicj01 commented 1 month ago

Hi, could I ask which file you used for the CpG coordinates, or where it can be found?

Thank you in advance!!