cpockrandt / genmap

GenMap - Fast and Exact Computation of Genome Mappability

How to choose bases to mask #21

Closed NicMAlexandre closed 3 years ago

NicMAlexandre commented 3 years ago

I ran genmap on my genome of interest, which output a bigwig, a bedgraph, and a text file. Because these files just report the mappability at every site, I assume I need to pick a mappability cutoff based on the number of bases I can recover at that value. I haven't done this before, so any info on how to choose the sites to mask, and how to mask them, would be super helpful!

cpockrandt commented 3 years ago

Hi @nicolasalexandre21

you can transform the bedgraph file to a bed file (adding a 4th column "name", here I just added a dot "."), filter unmappable positions that you would like to mask and use bedtools to mask your genome.

Transform .bedgraph to .bed (here: only keep entries with a mappability of < 0.1):

awk -F$'\t' 'BEGIN { OFS="\t" } { if ($4 + 0.0 < 0.1) print $1, $2, $3, ".", $4 }' your_genome.genmap.bedgraph > your_genome.genmap.bed

Then you can use bedtools maskfasta to soft- or hard-mask your fasta file using the bed file.

https://bedtools.readthedocs.io/en/latest/content/tools/maskfasta.html
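
A minimal sketch of the masking step, assuming bedtools is installed and on your PATH. The function name `mask_genome` and the file names are hypothetical; `-fi`, `-bed`, `-fo`, and `-soft` are the bedtools maskfasta options for the input FASTA, the BED file of regions, the output FASTA, and soft-masking.

```shell
# Hypothetical helper: mask a FASTA using the regions in a BED file.
# Usage: mask_genome <in.fa> <regions.bed> <out.fa> [soft|hard]
mask_genome() {
    fasta=$1
    bed=$2
    out=$3
    mode=${4:-soft}   # soft-mask unless "hard" is requested

    if [ "$mode" = soft ]; then
        # -soft writes masked bases in lower case
        bedtools maskfasta -fi "$fasta" -bed "$bed" -fo "$out" -soft
    else
        # without -soft, masked bases are replaced with N (hard-masking)
        bedtools maskfasta -fi "$fasta" -bed "$bed" -fo "$out"
    fi
}
```

For example, `mask_genome your_genome.fa your_genome.genmap.bed your_genome.softmasked.fa soft` would lower-case every position listed in the BED file. Whether soft- or hard-masking is appropriate depends on how the downstream aligner or variant caller treats lower-case bases and Ns.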

Note: I chose the mappability threshold of 0.1 arbitrarily. I think the value that you want to choose here depends on your application.

NicMAlexandre commented 3 years ago

Thank you, this is excellent! I’m not seeing much information online about choosing mappability values. I am performing a GWAS on resequenced genomes that are being aligned to the reference.

What value would you suggest/ have you seen others use?


-- Best,

Nicolas Alexandre, PhD Candidate, Integrative Biology, Whiteman Lab, University of California - Berkeley

cpockrandt commented 3 years ago

I'm afraid there is no general answer, and it is difficult to say without knowing exactly what you are doing and trying to achieve. You could look at the distribution of mappability values, inspect a few example loci to decide whether you want them masked, and go from there.
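
One way to look at that distribution is to count how many bases would be masked at a few candidate cutoffs. This is only a sketch: the function name `masked_bases_by_threshold` and the list of thresholds are hypothetical, and it assumes a four-column, tab-separated bedGraph (chrom, start, end, mappability) like the one used in the awk command earlier in the thread. The fraction is relative to the bases covered by the bedGraph, not the full genome length.

```shell
# Hypothetical helper: for each candidate threshold, print
#   threshold <TAB> bases with mappability below it <TAB> fraction of covered bases
# Usage: masked_bases_by_threshold your_genome.genmap.bedgraph
masked_bases_by_threshold() {
    LC_ALL=C awk -F'\t' '
        BEGIN { n = split("0.1 0.25 0.5 0.75 1.0", t, " ") }
        {
            len = $3 - $2              # bedGraph intervals are half-open
            total += len
            for (i = 1; i <= n; i++)
                if ($4 + 0 < t[i] + 0) masked[i] += len
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\t%d\t%.4f\n", t[i], masked[i], (total ? masked[i] / total : 0)
        }
    ' "$1"
}
```

The comparison is strict (`<`), matching the filter used to build the BED file, so a cutoff of 1.0 counts every position that is not uniquely mappable. Large jumps between neighbouring rows show where a small change in the cutoff costs many bases.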

NicMAlexandre commented 3 years ago

Thank you for the clarification, much appreciated.

