DEploid-dev / DEploid

dEploid is designed for deconvoluting mixed genomes with unknown proportions. Traditional ‘phasing’ programs are limited to diploid organisms. Our method modifies Li and Stephen’s algorithm with Markov chain Monte Carlo (MCMC) approaches, and builds a generic framework that allows haloptype searches in a multiple infection setting.
http://deploid.readthedocs.io/en/latest/index.html
GNU General Public License v3.0
20 stars 10 forks source link

extractCoverageFromVcf fixed version #155

Closed shajoezhu closed 3 years ago

shajoezhu commented 8 years ago
extractCoverageFromVcf<-function (vcfName, ADFieldIndex = 2) 
{
    h <- function(w) {
        if (any(grepl("gzfile connection", w))) 
            invokeRestart("muffleWarning")
    }
    gzf = gzfile(vcfFile, open = "rb")
    skipNum = 0
    line = withCallingHandlers(readLines(gzf, n = 1), warning = h)
    while (length(line) > 0) {
        if (grepl("##", line)) {
            skipNum = skipNum + 1
        }
        else {
            break
        }
        line = withCallingHandlers(readLines(gzf, n = 1), warning = h)
    }
    close(gzf)
    vcf = read.table(gzfile(vcfName), skip = skipNum, header = T, 
        comment.char = "", stringsAsFactors = FALSE, check.names = FALSE)
    sampleName = names(vcf)[10]
    tmp = vcf[[sampleName]]
    field = strsplit(as.character(tmp), ":")
    tmpCovStr = unlist(lapply(field, `[[`, ADFieldIndex))
    tmpCov = strsplit(as.character(tmpCovStr), ",")
    indexOfCheckingCov.logic = (lapply(tmpCov,'length') != 2)
    chrom = vcf[,1]
    pos = vcf[,2]
    if ( sum(indexOfCheckingCov.logic) > 0 ){
        cat("Drop markers at", which(indexOfCheckingCov.logic),"\n")
        tmpCov = tmpCov[!indexOfCheckingCov.logic]
        chrom = chrom[!indexOfCheckingCov.logic]
        pos = pos[!indexOfCheckingCov.logic]
    }
    refCount = as.numeric(unlist(lapply(tmpCov, `[[`, 1)))
    altCount = as.numeric(unlist(lapply(tmpCov, `[[`, 2)))
    return(data.frame(CHROM = chrom, POS = pos, refCount = refCount, 
        altCount = altCount))
}
shajoezhu commented 8 years ago

This is a hack to filter out the site that vcf sample field doesn't have the second field AD