PengNi / ccsmeth

Detecting DNA methylation from PacBio CCS reads
BSD 3-Clause Clear License

About vocab_size #12

Closed: XiaoyuShi97 closed this issue 2 years ago

XiaoyuShi97 commented 2 years ago

https://github.com/PengNi/ccsmeth/blob/dbc0cba01b5481eafa4de6285d0df15d17a0978b/ccsmeth/models.py#L31

Hi, nice project! I am confused about this parameter, though. As I understand it, there are only four types of bases: A, T, C, and G. Why is the vocabulary size set to 16? Thanks!

PengNi commented 2 years ago

In most cases, 4 is enough. Using 16 covers the case where other IUPAC DNA/RNA base codes (ambiguity codes such as N, R, or Y) appear in the sequence. See IUPAC Codes.
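
To illustrate the point, here is a minimal sketch of why a vocabulary of 4 breaks down once ambiguity codes show up. The mapping `BASE_TO_INDEX` and the helpers `encode` and `fits_vocab` are hypothetical names made up for this example and are not taken from the ccsmeth source. The index order is also an assumption. The IUPAC DNA alphabet has 4 bases plus 11 ambiguity codes (15 symbols), so a vocabulary of 16 has room for all of them.

```python
# Hypothetical encoding for illustration only; NOT the mapping used by ccsmeth.
# 4 standard bases followed by the 11 IUPAC ambiguity codes (15 symbols total).
IUPAC_DNA_CODES = "ACGTRYSWKMBDHVN"
VOCAB_SIZE = 16  # the value discussed in this issue

BASE_TO_INDEX = {base: i for i, base in enumerate(IUPAC_DNA_CODES)}


def encode(seq):
    """Map each base to an integer index; raises KeyError on unknown symbols."""
    return [BASE_TO_INDEX[b] for b in seq.upper()]


def fits_vocab(indices, vocab_size):
    """Return True if every index is a valid row of an embedding table
    with `vocab_size` rows."""
    return all(0 <= i < vocab_size for i in indices)


plain = encode("ACGT")         # only the four standard bases
ambiguous = encode("ACGTNRY")  # contains N, R and Y

print(fits_vocab(plain, 4))               # True
print(fits_vocab(ambiguous, 4))           # False: N, R, Y need indices >= 4
print(fits_vocab(ambiguous, VOCAB_SIZE))  # True
```

With an embedding layer, an index outside the table raises an error at lookup time, so sizing the table for the full alphabet is the safer default even if most reads only contain A, C, G, and T.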