Closed edgardomortiz closed 4 years ago
Hi Edgardo, How are you determining sample depth for loci if not with the stats file? Are you looking at the .loci file? Or the vcf? Which file are you looking at? -isaac
On Wed, Feb 26, 2020 at 4:19 PM Edgardo M. Ortiz notifications@github.com wrote:
Hello, I performed a ddrad reference analysis with v.0.9.42, and setting min_samples_locus to 4 produces loci with at least 2 samples (setting it to 8 produces loci with at least 6 samples, and so on, always -2). So, to get loci with at least 4 samples I used min_samples_locus=6. Is the _stats.txt file correct despite this behavior?
Edgardo
Hi Isaac, I examined the .vcf, .alleles, and .loci files. I didn't check other formats so far.
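For anyone who wants to repeat this check on a .loci file, here is a minimal sketch of counting samples per locus. It assumes each locus is a run of non-empty sample rows followed by a separator line beginning with `//`; the function name `samples_per_locus` and the example rows are made up for illustration and are not part of ipyrad.

```python
def samples_per_locus(lines):
    """Return the number of sample rows in each locus of .loci-style text.

    Assumption: every locus ends with a separator line starting with '//'.
    """
    counts = []
    current = 0
    for line in lines:
        if line.startswith("//"):
            # Separator reached: record the finished locus and reset.
            counts.append(current)
            current = 0
        elif line.strip():
            # Any other non-empty line is treated as one sample row.
            current += 1
    return counts


# Hypothetical two-locus example (not real data).
example = [
    "sampleA     ACGTACGT",
    "sampleB     ACGTACGA",
    "//                 -   |0|",
    "sampleA     TTGACC",
    "sampleB     TTGACC",
    "sampleC     TTGACC",
    "sampleD     TTGACC",
    "//                  |1|",
]

counts = samples_per_locus(example)
smallest = min(counts)  # compare this against min_samples_locus
```

On a real file, pass the open file handle instead of the list and compare the smallest count with the min_samples_locus value used for the run.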
Sorry to chime in, but I've seen sites with fewer than min_samples_locus samples make their way into the vcf files.
From what I gather, these lower coverage sites are derived from loci with the correct sample coverage, but are located in a messy bit of a sequence in a locus (e.g., towards the ends). I haven't looked into this thoroughly, but I just filter these sites out in vcftools.
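The per-site filtering described above can also be sketched in plain Python. This is only an illustration, not what vcftools does internally: it assumes tab-separated VCF data lines with sample columns starting at the tenth field and GT as the first FORMAT entry, and the names `count_called` and `filter_sites` are hypothetical.

```python
def count_called(vcf_line):
    """Count samples with a non-missing genotype on one VCF data line.

    Assumption: GT is the first colon-separated entry of each sample column.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    called = 0
    for sample in fields[9:]:
        gt = sample.split(":")[0]
        alleles = gt.replace("|", "/").split("/")
        # A genotype counts as called if at least one allele is not '.'.
        if gt and not all(allele == "." for allele in alleles):
            called += 1
    return called


def filter_sites(lines, min_samples):
    """Keep header lines and sites genotyped in at least min_samples samples."""
    return [
        line
        for line in lines
        if line.startswith("#") or count_called(line) >= min_samples
    ]


# Hypothetical records (not real data): one site with 2 calls, one with 4.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\ts4"
low = "\t".join(["chr1", "10", ".", "A", "T", ".", "PASS", ".", "GT",
                 "0/0", "0/1", "./.", "./."])
full = "\t".join(["chr1", "25", ".", "G", "C", ".", "PASS", ".", "GT",
                  "0/0", "0/1", "1/1", "0/0"])

kept = filter_sites([header, low, full], min_samples=4)
```

Here `kept` retains the header and the fully genotyped site and drops the site with only two called samples.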
@brpark29 Yes, I have observed that as well. I think this -2 difference is more systematic though, especially when looking at the .loci and .alleles files. It may be related to this bit of code:
https://github.com/dereneaton/ipyrad/blob/958b5d73e489a00bbb8f31cf576a0780223dcd1c/ipyrad/assemble/write_outputs.py#L644-L647
Perhaps the solution is:
self.minsamp += 1
??
@edgardomortiz That's exactly the problem. I fixed it in 11e34c3 and will push a new tag so the bioconda package will be updated. Thanks for reporting AND figuring it out!