DevonDeRaad / SNPfiltR

Support for phased genotypes? #4

Open davidecarlson opened 1 year ago

davidecarlson commented 1 year ago

Hi Devon,

Thanks for this really helpful tool!

I noticed that the "assess_missing_data" functions will throw an error if the VCF file has phased genotypes (i.e., the delimiter is a "|" instead of a "/").

If I create a pull request with changes to allow genotypes where the delimiter is a "|", would you be open to reviewing it?

Thanks! Dave

DevonDeRaad commented 1 year ago

Hi Dave, yes absolutely! It's not something I have time to get to right away, but if you would be willing to create the pull request I would happily review it, and either way, it's something I'd like to address long term.

Thanks! Devon

ShaolinXU commented 7 months ago

Here is an Rcpp function that replaces "|" with "/" in the gt slot of a vcfR object:

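// The Boost headers come from the BH package, so declare the dependency for sourceCpp()
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/algorithm/string/replace.hpp> // provides boost::algorithm::replace_all
using namespace Rcpp;
using namespace boost::algorithm;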

// [[Rcpp::export]]
CharacterMatrix replacePipeWithSlash(CharacterMatrix df) {
    int nrow = df.nrow();
    int ncol = df.ncol();
    CharacterMatrix new_df(nrow, ncol);

    // Copy column names
    colnames(new_df) = colnames(df);

    for (int j = 0; j < ncol; ++j) {
        for (int i = 0; i < nrow; ++i) {
            if (df(i, j) != NA_STRING) {
                std::string s = Rcpp::as<std::string>(df(i, j));
                replace_all(s, "|", "/"); // Replace all "|" with "/"
                new_df(i, j) = s;
            } else {
                new_df(i, j) = df(i, j);
            }
        }
    }

    return new_df;
}

Just load this function in R (for example, with Rcpp::sourceCpp()) and run vcfR@gt <- replacePipeWithSlash(vcfR@gt), where vcfR is your vcfR object. After that, the assess_missing_data functions should run without error.
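
If you would rather leave the other FORMAT subfields untouched (some callers write phased values into fields other than GT), a narrower variant is possible. The sketch below is only an illustration in plain C++, not part of SNPfiltR or vcfR: the helper name unphaseGenotype is made up, and it relies on the VCF convention that GT, when present, is the first colon-separated subfield. It uses std::replace from the standard library, so it needs no Boost headers.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not from SNPfiltR or vcfR): convert a phased genotype
// entry such as "0|1:35" to its unphased form "0/1:35".
// Assumption: GT is the first colon-separated subfield of the entry.
std::string unphaseGenotype(std::string entry) {
    // Find where the GT subfield ends (first ':'), or use the whole string
    // if the entry has no further subfields.
    std::string::size_type gt_end = entry.find(':');
    if (gt_end == std::string::npos) {
        gt_end = entry.size();
    }
    // Replace '|' with '/' only inside the GT subfield.
    std::replace(entry.begin(), entry.begin() + gt_end, '|', '/');
    return entry;
}
```

Inside the loop of the Rcpp function above, this could stand in for the replace_all call, e.g. new_df(i, j) = unphaseGenotype(s). I have not tested it against assess_missing_data, so treat it as a starting point.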