ANGSD / angsd

Program for analysing NGS data.

Selecting single SNP per contig #307

Open DiedeMaas90 opened 4 years ago

DiedeMaas90 commented 4 years ago

Hi,

I have a potentially simple question: is it possible to select for a single SNP per contig within ANGSD? For example the first one or a random one? I cannot seem to find the answer online.

Cheers, Diede

ANGSD commented 4 years ago

It is not something I have considered. May I ask in what situation this could be useful?

DiedeMaas90 commented 4 years ago

I'm using RADseq data, and I was thinking that using multiple SNPs per RAD-tag would perhaps over-emphasize effects. I saw that in STACKS there is also an option of using only one SNP per tag. You also then avoid analysing SNPs that are probably linked, right?

OliverPStuart commented 4 years ago

Hi @DiedeMaas90

I'm also using ANGSD with RADseq and other data types with many small contigs. A fairly simple workaround is to run the analysis in a first pass, thin the output down to one site per contig, then pass that thinned list back to ANGSD as an input file and reanalyse.

For example, the -doMaf output looks something like this:

chromo       position  major  minor  ref  anc  knownEM   unknownEM  nInd
contig00030  1127      A      C      A    A    0.097012  0.034447   6
contig00030  1222      C      T      C    C    0.488717  0.488718   5

You could extract the first two columns (chromo and position), then use any data-wrangling tool to keep only one row per chromo, or apply some other filtering strategy. Then index the resulting file with ./angsd sites index sites.file. The file should be in the format:

contig0030  1127
contig0035  403
contig0052  473

You can then rerun the initial ANGSD call with -sites sites.file added, which causes ANGSD to consider only the listed sites.
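For the thinning step itself, here is a minimal Python sketch rather than anything built into ANGSD. It assumes the -doMaf output is a whitespace-delimited text file with a header line starting with "chromo" (ANGSD typically writes it gzip-compressed as .mafs.gz) and that the sites file should be tab-separated. The function names thin_sites and write_sites, and the "first"/"random" strategy values, are made up for illustration.

```python
import gzip
import random


def thin_sites(maf_lines, strategy="first", seed=None):
    """Return one (chromo, position) pair per contig from -doMaf style lines.

    strategy="first" keeps the first site seen on each contig;
    any other value picks one site per contig at random.
    """
    rng = random.Random(seed)
    by_contig = {}  # dicts preserve insertion order, so contig order is kept
    for line in maf_lines:
        fields = line.split()
        # Skip blank lines and the header row (assumed to start with "chromo").
        if not fields or fields[0] == "chromo":
            continue
        by_contig.setdefault(fields[0], []).append(fields[1])

    picked = []
    for chromo, positions in by_contig.items():
        pos = positions[0] if strategy == "first" else rng.choice(positions)
        picked.append((chromo, pos))
    return picked


def write_sites(maf_path, sites_path, strategy="first", seed=None):
    """Read a (possibly gzipped) mafs file and write a thinned sites file."""
    opener = gzip.open if maf_path.endswith(".gz") else open
    with opener(maf_path, "rt") as handle:
        picked = thin_sites(handle, strategy, seed)
    with open(sites_path, "w") as out:
        for chromo, pos in picked:
            # Assumed sites format: chromo and position separated by a tab.
            out.write(f"{chromo}\t{pos}\n")
    return len(picked)
```

Something like write_sites("out.mafs.gz", "sites.file") would then produce the file to index (both file names are placeholders). With the two example rows above, the "first" strategy keeps only contig00030 1127.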

See more detail on how ANGSD uses input filters here: http://www.popgen.dk/angsd/index.php/Sites