cschin / Peregrine

Peregrine: Fast Genome Assembler Using SHIMMER Index

Question on SHIMMER and MC #17

Open · kfletcher88 opened this issue 4 years ago

kfletcher88 commented 4 years ago

Hi,

I am exploring using Peregrine with some Illumina-corrected single-molecule reads (>99% identity to an Illumina reference), sequenced to ~250x. I was wondering whether shimmer-r and mc are correlated, and if so, how. Specifically, does the SHIMMER count increase as the reduction factor is increased? Or am I misinterpreting the documentation?

I am trying to assemble a heterozygous (~1%), highly repetitive (~70%) diploid genome, and I am getting an over-inflated (3-4x the expected size), highly fragmented assembly. At the moment I would be happy to obtain a consensus assembly. Any advice on which parameters to tweak would be appreciated. Would increasing the reduction factor help remove the redundancy?

Thanks, Kyle

cschin commented 4 years ago

"mc" stands for "mmer count". The higher the count, the more likely the k-mer comes from a repeat. shimmer-r controls the reduction level: a smaller shimmer-r gives a denser set of SHIMMERs in the index (a larger index file, but more sensitive overlap detection).
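To make the density trade-off concrete, here is a toy Python model of two-level minimizer sampling. It is a simplified sketch under stated assumptions, not Peregrine's implementation: the hash (`blake2b`), the forward-strand-only k-mers, the exact window and reduction rules, and the helper names `kmer_hash`, `minimizers` and `reduce_level` are all hypothetical. Level 1 keeps the smallest-hash k-mer in every window of `w` consecutive k-mers; the reduction step keeps the smallest entry in every window of `r` consecutive level-1 minimizers, so a larger `r` leaves fewer, sparser index entries.

```python
import hashlib
import random


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (a stand-in, not Peregrine's hash)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def minimizers(seq: str, k: int, w: int) -> list[tuple[int, int]]:
    """Level 1: smallest-hash k-mer in every window of w consecutive k-mers.

    Returns unique (position, hash) pairs sorted by position.
    """
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(hashes) - w + 1):
        # Ties on the hash are broken by the leftmost position.
        pos = min(range(start, start + w), key=lambda i: (hashes[i], i))
        picked.add((pos, hashes[pos]))
    return sorted(picked)


def reduce_level(mers: list[tuple[int, int]], r: int) -> list[tuple[int, int]]:
    """Reduction (hypothetical rule): smallest-hash entry in every window of r consecutive entries."""
    picked = set()
    for start in range(len(mers) - r + 1):
        picked.add(min(mers[start:start + r], key=lambda m: (m[1], m[0])))
    return sorted(picked)


if __name__ == "__main__":
    random.seed(0)
    genome = "".join(random.choice("ACGT") for _ in range(20_000))
    level1 = minimizers(genome, k=16, w=80)
    for r in (1, 2, 4, 6, 12):
        level2 = reduce_level(level1, r)
        print(f"r={r:2d}: {len(level2):5d} index entries (level 1 had {len(level1)})")
```

In this model every entry kept at a larger `r` is also kept at any smaller `r` (the minimum of a window is also the minimum of every sub-window containing it), so the index can only shrink as `r` grows. The parameter values are illustrative, not Peregrine defaults.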

For the "unique" parts of the genome, mc should be more or less independent of shimmer-r. However, increasing the SHIMMER density would increase mc. This is my current guess.
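The repeat signal carried by mc can be illustrated with a similar toy model. The sketch below is hypothetical throughout (the two-level sampling rule, the `blake2b` hash, forward-strand-only k-mers, the synthetic genome, and the names `sparse_mers`, `mer_counts` and `random_seq` are assumptions, not Peregrine code). It counts how often each sampled hash occurs in a genome that contains five copies of one 600 bp repeat: mers sampled inside the repeat reach counts of about 5 for every `r` tried, while mers from unique sequence stay at 1. With reads at coverage c instead of a single genome copy, each count would scale roughly by c.

```python
import hashlib
import random
from collections import Counter


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit k-mer hash (an illustrative stand-in)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sparse_mers(seq: str, k: int, w: int, r: int) -> list[tuple[int, int]]:
    """Toy two-level sampling; returns (hash, position) pairs sorted by position."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    # Level 1: smallest (hash, position) in every window of w consecutive k-mers.
    level1 = sorted(
        {min((hashes[i], i) for i in range(s, s + w)) for s in range(len(hashes) - w + 1)},
        key=lambda m: m[1],
    )
    # Level 2 (hypothetical rule): smallest entry in every window of r consecutive minimizers.
    level2 = {min(level1[s:s + r]) for s in range(len(level1) - r + 1)}
    return sorted(level2, key=lambda m: m[1])


def mer_counts(seq: str, k: int, w: int, r: int) -> Counter:
    """Toy 'mc': number of sampled positions that carry each hash."""
    return Counter(h for h, _ in sparse_mers(seq, k, w, r))


def random_seq(n: int, rng: random.Random) -> str:
    """Uniform random DNA string of length n."""
    return "".join(rng.choice("ACGT") for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    repeat = random_seq(600, rng)
    # Hypothetical genome: five copies of one 600 bp repeat between unique 3 kb spacers.
    genome = "".join(random_seq(3000, rng) + repeat for _ in range(5)) + random_seq(3000, rng)
    for r in (1, 2, 4, 6):
        mc = mer_counts(genome, k=15, w=20, r=r)
        unique = sum(1 for c in mc.values() if c == 1)
        repeat_like = sum(1 for c in mc.values() if c >= 5)
        print(f"r={r}: {len(mc)} distinct mers, {unique} with mc=1, {repeat_like} with mc>=5")
```

Changing `r` here changes how many mers are sampled, but not the count a given mer receives, which matches the intuition that mc in unique regions should not depend much on shimmer-r.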