Closed: davetang closed this 5 years ago
Here's my R Markdown file showing how I installed SNPmatch and created the examples.
Hi @davetang ,
SNPmatch does all its calculations on numpy arrays built from the database and sample files, and it assumes that the positions in the VCF or BED file are sorted. You need to provide a sorted input file (the positions in the database file must be sorted as well).
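To check the sorted-input requirement before running SNPmatch, something like the following might help. It is only a sketch and not part of SNPmatch: the helper names `read_positions` and `is_sorted_within_chromosomes` are made up for illustration, and it assumes tab-separated lines with the chromosome in the first column and the position in the second, as in both VCF and BED. It only checks the order within each chromosome, not the order of the chromosomes themselves.

```python
def read_positions(lines):
    """Return (chrom, pos) tuples from VCF- or BED-style lines.

    Assumes tab-separated fields with the chromosome in column 1 and the
    position in column 2; blank lines and '#' header lines are skipped.
    """
    positions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        positions.append((fields[0], int(fields[1])))
    return positions


def is_sorted_within_chromosomes(positions):
    """True if positions never decrease within any single chromosome."""
    last_seen = {}
    for chrom, pos in positions:
        if chrom in last_seen and pos < last_seen[chrom]:
            return False
        last_seen[chrom] = pos
    return True


if __name__ == "__main__":
    # Illustrative lines only, not taken from the files in this issue.
    example = ["#header", "1\t100\t101", "1\t250\t251", "2\t40\t41"]
    print(is_sorted_within_chromosomes(read_positions(example)))
```

For a real file, pass an open file handle instead of the example list, for example `read_positions(open("eg3.bed"))`.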
The last two examples show sorted input but the result is not as expected. I am expecting 1000 out of 1000 matches for sample 1.
Thank you for making a detailed markdown file; it makes life easier. I think the problem is duplicated entries in sample.vcf. You have multiple lines at the same SNP positions (5616, 6398, 10829, etc.). Can you remove them and try running SNPmatch again?
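One possible way to find and remove such duplicated positions is sketched below. This is not a SNPmatch feature: `duplicated_positions` and `drop_duplicates` are hypothetical helpers, they assume tab-separated lines with chromosome and position in the first two columns, and keeping the first record at each position is an arbitrary choice that may not be the right one for your data.

```python
from collections import Counter


def _key(line):
    """(chrom, pos) for a tab-separated VCF/BED-style record line."""
    chrom, pos = line.split("\t")[:2]
    return chrom, int(pos)


def _is_record(line):
    """True for data lines, False for blank lines and '#' headers."""
    return bool(line.strip()) and not line.startswith("#")


def duplicated_positions(lines):
    """Return the sorted (chrom, pos) keys that occur on more than one line."""
    counts = Counter(_key(line) for line in lines if _is_record(line))
    return sorted(key for key, n in counts.items() if n > 1)


def drop_duplicates(lines):
    """Keep headers and the first record at each position (an assumption)."""
    seen = set()
    kept = []
    for line in lines:
        if not _is_record(line):
            kept.append(line)
            continue
        key = _key(line)
        if key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return kept


if __name__ == "__main__":
    # Illustrative records only; the alleles here are invented.
    example = ["#CHROM\tPOS", "1\t100\t.\tA\tT", "1\t100\t.\tA\tG", "1\t200\t.\tC\tT"]
    print(duplicated_positions(example))
    print(drop_duplicates(example))
```

Listing the duplicates first, rather than dropping them blindly, makes it easier to see which of the conflicting records should be kept.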
Well spotted; that was the problem. Thank you!
I have installed SNPmatch using docker-miniconda and created several tests. If you want more information, I can provide all the steps I used to produce the examples below. For now, I will just show the relevant steps.
I created `eg3.bed`, which is completely identical to sample 1 in `sample.vcf`, and the results are as expected.

However, when I shuffle the input using `shuf` (this just randomises the lines), the results are not as expected. The following steps are identical apart from running `shuf`.

If I print every second line (thus the input is still sorted) instead of shuffling, I also get unexpected results.
Even more perplexing, when I shuffle and then re-sort the input, I get yet another result!
Do you know what's going on?