Closed NicMAlexandre closed 3 years ago
Hi @nicolasalexandre21
You can transform the bedgraph file into a BED file (adding a 4th "name" column; here I just used a dot "."), keep only the low-mappability positions that you would like to mask, and then use bedtools to mask your genome.
Transform .bedgraph to .bed (here: only keep entries with a mappability of < 0.1):
awk -F$'\t' 'BEGIN { OFS="\t" } { if ($4 + 0.0 < 0.1) print $1, $2, $3, ".", $4 }' your_genome.genmap.bedgraph > your_genome.genmap.bed
Then you can use bedtools maskfasta to soft- or hard-mask your FASTA file using the BED file:
https://bedtools.readthedocs.io/en/latest/content/tools/maskfasta.html
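As a rough sketch of that masking step (assuming bedtools is installed; the toy file names below are placeholders I made up, not anything GenMap produces): by default maskfasta replaces the bases inside the BED intervals with N (hard-masking), while -soft converts them to lowercase instead.

```shell
# Toy inputs so the sketch is self-contained; substitute your own FASTA and BED.
printf '>chr1\nACGTACGTACGT\n' > toy_genome.fa
printf 'chr1\t2\t6\t.\t0.05\n' > toy_genome.genmap.bed

if command -v bedtools >/dev/null 2>&1; then
    # Hard-mask: bases inside the BED intervals are replaced with N.
    bedtools maskfasta -fi toy_genome.fa -bed toy_genome.genmap.bed \
        -fo toy_genome.hardmasked.fa

    # Soft-mask: bases inside the BED intervals are converted to lowercase.
    bedtools maskfasta -fi toy_genome.fa -bed toy_genome.genmap.bed \
        -fo toy_genome.softmasked.fa -soft
else
    echo "bedtools not found on PATH; skipping the masking step" >&2
fi
```

Which variant is appropriate depends on the downstream tools: some aligners and variant callers ignore soft-masking entirely, so it is worth checking how yours treats lowercase bases.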
Note: I chose the mappability threshold of 0.1 arbitrarily. I think the value that you want to choose here depends on your application.
Thank you, this is excellent! I'm not seeing much information online about choosing mappability values. I am performing a GWAS on resequenced genomes that are aligned to the reference.
What value would you suggest, or have you seen others use?
Best,
Nicolas Alexandre
PhD Candidate, Integrative Biology, Whiteman Lab, University of California - Berkeley
I'm afraid there is no general answer, and it is difficult to say without knowing exactly what you are doing and trying to achieve. It may help to look at the distribution of mappability values, inspect a few example loci to decide whether you want them masked or not, and go from there.
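One possible way to look at that distribution, as a minimal sketch: bin the bedgraph values in steps of 0.1 and sum the interval lengths per bin. This assumes the usual four bedgraph columns (chrom, start, end, value); the toy file names and the bin width are my own choices for illustration.

```shell
# Toy bedGraph (chrom, start, end, mappability); substitute your GenMap output.
printf 'chr1\t0\t10\t1\nchr1\t10\t15\t0.5\nchr1\t15\t18\t0.05\nchr1\t18\t20\t0.333333\n' \
    > toy_hist.genmap.bedgraph

# Sum interval lengths (end - start) per mappability bin of width 0.1.
awk -F'\t' '
    $4 != "" {                        # skip header or malformed lines
        bin = int($4 * 10)
        if (bin > 9) bin = 9          # a value of exactly 1 goes into the top bin
        bases[bin] += $3 - $2
        total += $3 - $2
    }
    END {
        if (total == 0) exit          # nothing to report for an empty file
        for (b = 0; b <= 9; b++)
            printf "%.1f-%.1f\t%d\t%.1f%%\n", b / 10, (b + 1) / 10, bases[b], 100 * bases[b] / total
    }
' toy_hist.genmap.bedgraph > toy_hist.mappability_hist.tsv

cat toy_hist.mappability_hist.tsv
```

Each output row gives the bin range, the number of bases in it, and its share of all bases covered by the file. A strongly bimodal result (most bases near 0 or near 1) would suggest the exact threshold matters less than it might seem.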
Thank you for the clarification, much appreciated.
I ran genmap on my genome of interest, which output a bigwig, a bedgraph, and a text file. Because these files just report the mappability at every site, I assume I would need to make a decision based on the number of bases I can recover at a particular mappability value. I haven't done this before, so any info on how to choose which sites to mask, and how to mask them, would be super helpful!
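To make the "number of bases I can recover" idea concrete, here is a rough sketch that reports how many bases would be kept (mappability at or above a cutoff) for a few candidate cutoffs. The cutoffs and toy file names are arbitrary placeholders, and the four-column bedgraph layout (chrom, start, end, value) is assumed.

```shell
# Toy bedGraph (chrom, start, end, mappability); substitute your GenMap output.
printf 'chr1\t0\t10\t1\nchr1\t10\t15\t0.5\nchr1\t15\t18\t0.05\nchr1\t18\t20\t0.333333\n' \
    > toy_cutoff.genmap.bedgraph

# For each candidate cutoff, count the bases whose mappability is >= cutoff.
for t in 0.1 0.25 0.5 1; do
    awk -F'\t' -v t="$t" '
        $4 != "" {                                # skip header or malformed lines
            len = $3 - $2
            total += len
            if ($4 + 0 >= t + 0) kept += len      # force numeric comparison
        }
        END {
            if (total > 0)
                printf "%s\t%d\t%.1f%%\n", t, kept, 100 * kept / total
        }
    ' toy_cutoff.genmap.bedgraph
done > toy_cutoff.summary.tsv

cat toy_cutoff.summary.tsv
```

The columns are cutoff, bases kept, and percentage kept. Everything below a given cutoff is what would end up masked at that setting.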