Closed martabe closed 4 years ago
Hi,
I have the same error with angsd version 0.921-8-gc12d2fa (htslib: 1.7-23-g9f9631d). With the newest version, 0.930, I get an upstream error instead: my vcf cannot be read, and I get the error reported in #264 and #227. @martabe, which version are you using?
My realSFS check tells me:
-> problems with unsorted saf file chromoname: '10' pos[2]:125233 vs posd[2-1]:249224030
and the first 2 columns of my realSFS print output are:
1 249209140
10 249220687
10 249224031
10 125234
10 126070
which corresponds to my vcf lines:
1 249209140
1 249220687
1 249224031
10 125234
10 126070
I'd also appreciate any suggestions regarding this issue.
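The comparison above (positions printed by realSFS against the positions in the vcf) can be automated. The following is a minimal sketch, not part of ANGSD: it flags every row whose position does not increase relative to the previous row with the same chromosome label. The function names `parse_two_columns` and `find_unsorted` are hypothetical, and the rule is only an approximation of what the realSFS check message appears to report, not its actual implementation.

```python
from typing import Iterable, List, Tuple


def parse_two_columns(text: str) -> List[Tuple[str, int]]:
    """Parse whitespace-separated 'chrom pos ...' lines, skipping short or blank lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            rows.append((fields[0], int(fields[1])))
    return rows


def find_unsorted(rows: Iterable[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Return (chrom, previous_pos, pos) wherever the position fails to increase
    compared with the previous row carrying the same chromosome label."""
    problems = []
    prev_chrom, prev_pos = None, None
    for chrom, pos in rows:
        if chrom == prev_chrom and pos <= prev_pos:
            problems.append((chrom, prev_pos, pos))
        prev_chrom, prev_pos = chrom, pos
    return problems


# The five rows quoted above from the realSFS print output.
example = """1 249209140
10 249220687
10 249224031
10 125234
10 126070"""

print(find_unsorted(parse_two_columns(example)))
```

On the rows quoted above this reports the single jump from 249224031 to 125234 on the label '10', which is the pair named in the realSFS check message (that message shows the same positions minus one).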
Hi,
I'm running version 0.921 (htslib: 1.6). I didn't try with a newer version so far, this one was the one provided by the cluster admins. Unfortunately I will not have time to try in the next few weeks, but I'll keep the post updated if I try.
Cheers, Marta
There were big changes to the vcf reader in a commit in Feb 2019, I would not advise using versions <0.929 for working with vcf.
Some further issues with the INDEL flag or with the PL field can be worked around for now by making appropriate changes to the header, as in my comment to #227.
Just in case someone else found a solution to this, I'm still getting this error with version 0.931
This might be very silly, but when I get an error of this kind, before thinking of anything else I check whether my line endings are actually Unix-type (not Mac or Windows). I have stepped on this specific rake more often than I care to admit.
Misha
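A quick way to run the line-ending check suggested above, as a minimal sketch: count Windows (CRLF), old Mac (lone CR) and Unix (LF) line endings in the first part of a file. The function names and the 1 MB default are illustrative assumptions, not anything ANGSD provides.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone CR and lone LF line endings in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "cr": data.count(b"\r") - crlf,  # CR not followed by LF
        "lf": data.count(b"\n") - crlf,  # LF not preceded by CR
    }


def has_non_unix_endings(path: str, max_bytes: int = 1_000_000) -> bool:
    """Return True if the first max_bytes of the file contain any CR character."""
    with open(path, "rb") as handle:
        counts = count_line_endings(handle.read(max_bytes))
    return counts["crlf"] > 0 or counts["cr"] > 0
```

If `has_non_unix_endings("mytest.vcf")` returns True, converting the file (for example with dos2unix, as is done later in this thread) before rerunning is a cheap thing to rule out.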
Dear all,
Sorry for the very late response and thanks to everyone making efforts to solve the issue.
Now -doSaf only outputs one chromosome...
I tested a 1 sample, 7 chromosomes, 1000 sites per chromosome subset of my vcf, and the result is the same, only the first chromosome is included in the output.
These are the commands I used:
angsd -vcf-gl mytest.vcf -doSaf 1 -out mytestSaf -P 1 -nInd 1 -doMajorMinor 1 -anc Peex304.fasta -fold 1
realSFS check mytestSaf.saf.idx
realSFS print mytestSaf.saf.idx > mytestSaf.saf.idx.txt
I tested -doSaf with and without the -fai option, given the last comment to #268, but I see no difference in the output.
The realSFS check output does not mention any error, and in the print of the saf.idx I only have chromosome 1.
I can provide the files if needed (I'm not doing it now because I would have to subset the genome).
Any suggestion? Thanks, Marta
Hi, all
I have the same error with angsd version: 0.930.
I cannot get 2d SFS.
The realSFS check result shows:
problems with unsorted saf file chromoname: 'Qrob_H2.3_Sc0000124' pos[53307]:3086 vs posd[53307-1]:1639536
Any suggestions?
Best regards,
Sandy
This should be resolved in commit cea07b3, so I am closing this issue. But feel free to reopen or submit a new issue if needed.
Best
Dear all,
I've been using version 0.933 but I still have that problem. I'm using some fake files with only 4 sites, two per chromosome. The problem is the same.
Input file is a vcf (added INDEL line, passed through dos2unix, just in case) with only 4 SNPs:
PeexChr1 224249438
PeexChr1 224249748
PeexChr2 149739
PeexChr2 149995
angsd0.933/angsd -vcf-PL test_pop1_indelinfo.vcf -doSaf 1 -fai Peex304test.fasta.fai -out test_pop1 -P 2 -nInd 18 -doMajorMinor 5 -anc Peex304test.fasta
angsd0.933/misc/realSFS print test_pop1.saf.idx > test_pop1.saf.idx.table
cat of test_pop1.saf.idx.table shows 4 sites, all on chr2:
PeexChr2 224249438
PeexChr2 224249748
PeexChr2 149739
PeexChr2 149995
Did I miss something? Does anyone still experience this problem?
Thanks a lot, Marta
PS: can send test files if useful.
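The mismatch described above (sites from PeexChr1 showing up under PeexChr2) can be detected by comparing the (chromosome, position) pairs in the vcf with those in the realSFS print table. This is a minimal sketch with hypothetical helper names. It assumes both files carry the chromosome in column 1 and the position in column 2, and that the two tools print positions in the same coordinate system, as they do in the listings in this thread.

```python
from typing import List, Tuple

Site = Tuple[str, int]


def read_sites(text: str) -> List[Site]:
    """Read (chrom, pos) from the first two columns, skipping '#' header lines."""
    sites = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        sites.append((fields[0], int(fields[1])))
    return sites


def sites_missing_from_vcf(vcf_sites: List[Site], saf_sites: List[Site]) -> List[Site]:
    """Return saf sites whose (chrom, pos) pair never occurs in the vcf."""
    known = set(vcf_sites)
    return [site for site in saf_sites if site not in known]


# The four sites of the test vcf and the four rows of the saf table quoted above.
vcf_text = """PeexChr1 224249438
PeexChr1 224249748
PeexChr2 149739
PeexChr2 149995"""
saf_text = """PeexChr2 224249438
PeexChr2 224249748
PeexChr2 149739
PeexChr2 149995"""

print(sites_missing_from_vcf(read_sites(vcf_text), read_sites(saf_text)))
```

Any pair returned here exists in the saf output but not in the input, which points at a mislabelled chromosome rather than a genuinely unsorted vcf.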
Sure, the test file would be useful, including the header.
Best
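Test files:
vcf (remove .txt): test_pop1_indelinfo.vcf.txt https://github.com/ANGSD/angsd/files/6083723/test_pop1_indelinfo.vcf.txt
fai (remove .txt): Peex304test.fasta.fai.txt https://github.com/ANGSD/angsd/files/6083727/Peex304test.fasta.fai.txt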
Genome is too big even if I remove the useless chromosomes. I can upload it somewhere else if you need. It's here (but for future readers, I will remove it in a week from now): https://drive.google.com/file/d/1Ic3E1fJOCroF61ZN66BCGoXXZVUo0Z0J/view?usp=sharing
Thanks! Marta
Hi Marta, I can't reproduce the error, but I had to modify the files so I could run it. These are shown here:
mbp:angsd user$ cat *.vcf
PeexChr1 1 . G A 918.05 PASS AC=0;AF=0.051;AN=36;BaseQRankSum=0;DP=373;ExcessHet=0.0251;FS=13.752;InbreedingCoeff=0.2098;MLEAC=7;MLEAF=0.051;MQ=60;MQRankSum=0;QD=30.6;ReadPosRankSum=0.967;SOR=0.762 GT:AD:DP:GQ:PL 0/0:7,0:7:21:0,21,274 0/0:3,0:3:9:0,9,113 0/0:9,0:9:27:0,27,375 0/0:7,0:7:21:0,21,258 0/0:2,0:2:6:0,6,88 0/0:4,0:4:12:0,12,150 0/0:7,0:7:21:0,21,266 0/0:6,0:6:18:0,18,224 0/0:5,0:5:12:0,12,180 0/0:7,0:7:21:0,21,266 0/0:4,0:4:12:0,12,141 0/0:2,0:2:6:0,6,62 0/0:2,0:2:6:0,6,90 0/0:2,0:2:6:0,6,83 0/0:7,0:7:21:0,21,254 0/0:3,0:3:9:0,9,128 0/0:8,0:8:24:0,24,333 0/0:7,0:7:21:0,21,280
PeexChr1 2 . T C 1166.66 PASS AC=2;AF=0.098;AN=32;BaseQRankSum=0;DP=225;ExcessHet=0;FS=0;InbreedingCoeff=0.3632;MLEAC=15;MLEAF=0.114;MQ=60;MQRankSum=0;QD=27.86;ReadPosRankSum=0.674;SOR=2.774 GT:AD:DP:GQ:PL 0/0:3,0:3:6:0,6,90 0/0:3,0:3:6:0,6,90 0/0:3,0:3:9:0,9,128 0/0:3,0:3:9:0,9,108 ./.:0,0:0:.:0,0,0 0/0:1,0:1:3:0,3,12 ./.:0,0:0:.:0,0,0 0/0:3,0:3:9:0,9,113 0/0:1,0:1:3:0,3,42 0/0:5,0:5:15:0,15,209 0/0:6,0:6:18:0,18,210/0:1,0:1:3:0,3,27 0/0:1,0:1:3:0,3,16 0/0:1,0:1:3:0,3,27 0/0:1,0:1:3:0,3,15 0/0:1,0:1:3:0,3,27 0/0:5,0:5:15:0,15,183 1/1:0,6:6:18:217,18,0
PeexChr2 1 . C T 14065.9 PASS AC=14;AF=0.413;AN=34;BaseQRankSum=0;DP=764;ExcessHet=0.006;FS=1.442;InbreedingCoeff=0.3319;MLEAC=63;MLEAF=0.5;MQ=52.55;MQRankSum=-0.674;QD=28.67;ReadPosRankSum=-0.353;SOR=0.583 GT:AD:DP:GQ:PL 0/0:8,0:8:0:0,0,78 0/0:15,0:15:0:0,0,364 0/0:2,0:2:6:0,6,83 0/1:6,14:20:99:570,0,210 0/0:6,0:6:0:0,0,96 1/1:0,4:4:12:180,12,0 0/1:8,8:16:99:312,0,1994 0/1:5,10:15:99:447,0,177 1/1:0,6:6:18:270,18,0/1:2,10:12:99:414,0,124 1/1:0,11:11:33:495,33,0 ./.:13,0:13:.:0,0,0 0/0:10,0:10:27:0,27,405 0/1:4,2:6:72:72,0,162 0/1:11,19:30:99:765,0,405 0/0:3,0:3:9:0,9,118 0/0:15,0:15:45:0,45,651 1/1:0,23:23:69:1035,69,0
PeexChr2 2 . C A 3490.9 PASS AC=4;AF=0.102;AN=34;BaseQRankSum=-0.842;DP=1016;ExcessHet=0;FS=1.357;InbreedingCoeff=0.8048;MLEAC=14;MLEAF=0.109;MQ=57.25;MQRankSum=-1.386;QD=28.52;ReadPosRankSum=1.04;SOR=0.479 GT:AD:DP:GQ:PL ./.:15,0:15:.:0,0,0 0/0:14,0:14:0:0,0,489 0/0:12,0:12:33:0,33,495 0/0:18,0:18:54:0,54,793 0/0:9,0:9:27:0,27,371 0/0:6,0:6:18:0,18,217 0/0:25,0:25:66:0,66,990 0/0:19,0:19:45:0,45,675 0/0:13,0:13:27:0,27,405 0/0:19,0:19:54:0,54,810 0/0:15,0:15:42:0,42,630 0/0:20,0:20:51:0,51,765 0/0:8,0:8:24:0,24,333 0/0:9,0:9:24:0,24,360 0/0:26,0:26:72:0,72,1080 1/1:0,6:6:18:270,18,0 1/1:0,20:20:60:893,60,0 0/0:25,0:25:60:0,60,900
mbp:angsd user$ cat anc.fa
>PeexChr1
GT
>PeexChr2
CC
mbp:angsd user$
mbp:angsd user$ ./angsd -vcf-pl test_pop1_indelinfo.vcf -domaf 1 -dosaf 1 -anc anc.fa
mbp:angsd user$ ./misc/realSFS print angsdput.saf.idx |cut -f1-2
-> Version of fname:angsdput.saf.idx is:2
-> Assuming .saf.gz file: angsdput.saf.gz
-> Assuming .saf.pos.gz: angsdput.saf.pos.gz
-> args: tole:0.000000 nthreads:4 maxiter:100 nsites:0 start:(null) chr:(null) start:-1 stop:-1 fstout:(null) oldout:0 seed:-1 bootstrap:0 resample_chr:0 whichFst:0 fold:0 ref:(null) anc:(null)
-> Will jump to multisaf printer and will only print intersecting sites between populations
-> dim(angsdput.saf.idx):37
-> Dimension of parameter space: 37
-> Done reading data from chromosome will prepare next chromosome
-> Is in multi sfs, will now read data from chr:PeexChr1
-> hello Im the master merge part of realSFS. and I'll now do a tripple bypass to find intersect
-> 1) Will set iter according to chooseChr and start and stop, and possibly using -sites
-> Only read nSites: 0 will therefore prepare next chromosome (or exit)
-> Done reading data from chromosome will prepare next chromosome
-> Is in multi sfs, will now read data from chr:PeexChr2
-> hello Im the master merge part of realSFS. and I'll now do a tripple bypass to find intersect
-> 1) Will set iter according to chooseChr and start and stop, and possibly using -sites
-> Only read nSites: 0 will therefore prepare next chromosome (or exit)
-> Done reading data from chromosome will prepare next chromosome
-> Run completed
PeexChr1 1
PeexChr1 2
PeexChr2 1
PeexChr2 2
mbp:angsd fvr124$
Hi,
Thanks for testing that. I think the reason why it might not have worked out of the box for you could be the timestamp on the genome index (I downloaded the test files myself and had to touch the index for it to work).
Other than that, for some reason that I still haven't figured out, when I run the original files in their original folder I get the error, but if I move the file out to a new folder and run it again everything goes fine. So I don't know what is going on but it seems like I can run the software now.
Thanks a lot for taking the time to check, I really appreciated that.
Have a nice day, Marta
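Having to touch the index, as mentioned above, suggests that the index file was older than the fasta it belongs to. The following minimal sketch compares the two modification times. Whether ANGSD or htslib applies exactly this comparison is an assumption, and `index_is_older` is a hypothetical helper name.

```python
import os


def index_is_older(data_path: str, index_path: str) -> bool:
    """Return True if the index was last modified before the file it indexes."""
    return os.path.getmtime(index_path) < os.path.getmtime(data_path)
```

For example, `index_is_older("Peex304test.fasta", "Peex304test.fasta.fai")` returning True would match the situation where touching the .fai (or regenerating it) was needed after downloading or copying the files.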
Hi, me again.
I understood what was causing the mix up.
Option -P 2 causes mixed chromosome positions, option -P 1 does not. I work on a cluster and I request the CPUs according to the option in the command, but it seems like multithreading results in this behaviour with -doSaf.
That also explains why I couldn't reproduce the error on the test files. I was requesting only one CPU given the files were small.
Marta
That's a highly non-trivial problem! Thanks a lot, Marta, for catching this
Hi all, I am still getting this problem. In my case it doesn't have anything to do with VCF files, but the outcome is the same: -dosaf produces a SAF file with sorting problems, and I think it has to do with failing to detect the end of chromosomes. I am using a highly fragmented, scaffold-level genome assembly for the species I am getting the error with. (I have not gotten the error with other species with higher-quality genomes.)
I am not using multithreading, so the issue martabe found is not the explanation in my case.
Here is my code for the safs:
$ANGSDIR/angsd -bam $POPLIST -ref $ANC -anc $ANC -out $OUT \
-dosaf 1 -GL 1 -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -minMapQ 30 -minQ 30 -C 50 -rf $REGFILE
This gives me a message saying
!! You are doing -dosaf incombination with -rf, please make sure that your -rf file is sorted !!
The file is sorted, and I verified this by sorting it as suggested here.
The job finishes without errors. Then, I do this:
$ANGSDIR/misc/realSFS check $OUT.saf.idx
And I get this message:
-> problems with unsorted saf file chromoname: 'VZRF01000187.1' pos[1822035]:22 vs posd[1822035-1]:2229692
-> problems with unsorted saf file chromoname: 'VZRF01000187.1' pos[3644070]:22 vs posd[3644070-1]:2229692
-> problems with unsorted saf file chromoname: 'VZRF01000586.1' pos[1522223]:112 vs posd[1522223-1]:1920221
-> problems with unsorted saf file chromoname: 'VZRF01000586.1' pos[3044446]:112 vs posd[3044446-1]:1920221
-> problems with unsorted saf file chromoname: 'VZRF01000927.1' pos[1777515]:360 vs posd[1777515-1]:2036024
-> problems with unsorted saf file chromoname: 'VZRF01000927.1' pos[3555030]:360 vs posd[3555030-1]:2036024
-> problems with unsorted saf file chromoname: 'VZRF01001110.1' pos[3144565]:18 vs posd[3144565-1]:3829545
-> problems with unsorted saf file chromoname: 'VZRF01001110.1' pos[6289130]:18 vs posd[6289130-1]:3829545
-> problems with unsorted saf file chromoname: 'VZRF01001586.1' pos[2177963]:4 vs posd[2177963-1]:2702387
-> problems with unsorted saf file chromoname: 'VZRF01001586.1' pos[4355926]:4 vs posd[4355926-1]:2702387
-> problems with unsorted saf file chromoname: 'VZRF01001720.1' pos[1967965]:140 vs posd[1967965-1]:2792947
-> problems with unsorted saf file chromoname: 'VZRF01001720.1' pos[3935930]:140 vs posd[3935930-1]:2792947
-> problems with unsorted saf file chromoname: 'VZRF01001859.1' pos[1540365]:68 vs posd[1540365-1]:2234032
-> problems with unsorted saf file chromoname: 'VZRF01001859.1' pos[3080730]:68 vs posd[3080730-1]:2234032
-> problems with unsorted saf file chromoname: 'VZRF01001867.1' pos[1499664]:311 vs posd[1499664-1]:1819316
-> problems with unsorted saf file chromoname: 'VZRF01001867.1' pos[2999328]:311 vs posd[2999328-1]:1819316
-> problems with unsorted saf file chromoname: 'VZRF01002078.1' pos[1168751]:10 vs posd[1168751-1]:1627490
-> problems with unsorted saf file chromoname: 'VZRF01002078.1' pos[2337502]:10 vs posd[2337502-1]:1627490
-> problems with unsorted saf file chromoname: 'VZRF01002081.1' pos[1255939]:121 vs posd[1255939-1]:1731002
-> problems with unsorted saf file chromoname: 'VZRF01002081.1' pos[2511878]:121 vs posd[2511878-1]:1731002
-> problems with unsorted saf file chromoname: 'VZRF01002083.1' pos[1448834]:45 vs posd[1448834-1]:1772714
-> problems with unsorted saf file chromoname: 'VZRF01002083.1' pos[2897668]:45 vs posd[2897668-1]:1772714
-> problems with unsorted saf file chromoname: 'VZRF01002164.1' pos[3001152]:70 vs posd[3001152-1]:3986750
-> problems with unsorted saf file chromoname: 'VZRF01002164.1' pos[6002304]:70 vs posd[6002304-1]:3986750
-> problems with unsorted saf file chromoname: 'VZRF01002170.1' pos[1337913]:14 vs posd[1337913-1]:1635394
-> problems with unsorted saf file chromoname: 'VZRF01002170.1' pos[2675826]:14 vs posd[2675826-1]:1635394
-> problems with unsorted saf file chromoname: 'VZRF01002270.1' pos[1572278]:3 vs posd[1572278-1]:1770276
-> problems with unsorted saf file chromoname: 'VZRF01002270.1' pos[3144556]:3 vs posd[3144556-1]:1770276
-> problems with unsorted saf file chromoname: 'VZRF01003085.1' pos[1873597]:935 vs posd[1873597-1]:2342476
-> problems with unsorted saf file chromoname: 'VZRF01003085.1' pos[3747194]:935 vs posd[3747194-1]:2342476
-> problems with unsorted saf file chromoname: 'VZRF01003198.1' pos[1781647]:47 vs posd[1781647-1]:2204817
-> problems with unsorted saf file chromoname: 'VZRF01003198.1' pos[3563294]:47 vs posd[3563294-1]:2204817
-> problems with unsorted saf file chromoname: 'VZRF01003225.1' pos[1424298]:37 vs posd[1424298-1]:1837532
-> problems with unsorted saf file chromoname: 'VZRF01003225.1' pos[2848596]:37 vs posd[2848596-1]:1837532
-> problems with unsorted saf file chromoname: 'VZRF01003403.1' pos[1750153]:161 vs posd[1750153-1]:2304690
-> problems with unsorted saf file chromoname: 'VZRF01003403.1' pos[3500306]:161 vs posd[3500306-1]:2304690
-> problems with unsorted saf file chromoname: 'VZRF01003559.1' pos[1591756]:65 vs posd[1591756-1]:2090553
-> problems with unsorted saf file chromoname: 'VZRF01003559.1' pos[3183512]:65 vs posd[3183512-1]:2090553
-> problems with unsorted saf file chromoname: 'VZRF01004612.1' pos[1150017]:8 vs posd[1150017-1]:1684003
-> problems with unsorted saf file chromoname: 'VZRF01004612.1' pos[2300034]:8 vs posd[2300034-1]:1684003
-> problems with unsorted saf file chromoname: 'VZRF01005175.1' pos[1475906]:6433 vs posd[1475906-1]:1754798
-> problems with unsorted saf file chromoname: 'VZRF01005175.1' pos[2951812]:6433 vs posd[2951812-1]:1754798
-> problems with unsorted saf file chromoname: 'VZRF01005414.1' pos[1216044]:215 vs posd[1216044-1]:1615746
-> problems with unsorted saf file chromoname: 'VZRF01005414.1' pos[2432088]:215 vs posd[2432088-1]:1615746
-> problems with unsorted saf file chromoname: 'VZRF01005562.1' pos[2384210]:1294 vs posd[2384210-1]:2875881
-> problems with unsorted saf file chromoname: 'VZRF01005562.1' pos[4768420]:1294 vs posd[4768420-1]:2875881
-> problems with unsorted saf file chromoname: 'VZRF01005657.1' pos[1572672]:426 vs posd[1572672-1]:2055607
-> problems with unsorted saf file chromoname: 'VZRF01005657.1' pos[3145344]:426 vs posd[3145344-1]:2055607
-> problems with unsorted saf file chromoname: 'VZRF01005754.1' pos[1419380]:145 vs posd[1419380-1]:1771275
-> problems with unsorted saf file chromoname: 'VZRF01005754.1' pos[2838760]:145 vs posd[2838760-1]:1771275
-> problems with unsorted saf file chromoname: 'VZRF01006295.1' pos[1595018]:201 vs posd[1595018-1]:2066701
-> problems with unsorted saf file chromoname: 'VZRF01006295.1' pos[3190036]:201 vs posd[3190036-1]:2066701
-> problems with unsorted saf file chromoname: 'VZRF01006584.1' pos[1307392]:8 vs posd[1307392-1]:1628002
-> problems with unsorted saf file chromoname: 'VZRF01006584.1' pos[2614784]:8 vs posd[2614784-1]:1628002
-> problems with unsorted saf file chromoname: 'VZRF01006704.1' pos[1312512]:99 vs posd[1312512-1]:1802706
-> problems with unsorted saf file chromoname: 'VZRF01006704.1' pos[2625024]:99 vs posd[2625024-1]:1802706
-> problems with unsorted saf file chromoname: 'VZRF01006819.1' pos[1766192]:128 vs posd[1766192-1]:2791916
-> problems with unsorted saf file chromoname: 'VZRF01006819.1' pos[3532384]:128 vs posd[3532384-1]:2791916
-> problems with unsorted saf file chromoname: 'VZRF01006837.1' pos[1645839]:240 vs posd[1645839-1]:2190617
-> problems with unsorted saf file chromoname: 'VZRF01006837.1' pos[3291678]:240 vs posd[3291678-1]:2190617
-> problems with unsorted saf file chromoname: 'VZRF01007136.1' pos[1587237]:35 vs posd[1587237-1]:1956362
-> problems with unsorted saf file chromoname: 'VZRF01007136.1' pos[3174474]:35 vs posd[3174474-1]:1956362
-> problems with unsorted saf file chromoname: 'VZRF01007591.1' pos[1554923]:35 vs posd[1554923-1]:2101834
-> problems with unsorted saf file chromoname: 'VZRF01007591.1' pos[3109846]:35 vs posd[3109846-1]:2101834
These problem sites are all near the ends of the chromosomes, possibly the last SNP on each chromosome.
Here is my ANGSD version:
-> angsd version: 0.935 (htslib: 1.11) build(Apr 19 2021 15:46:47)
Finally, in case it's helpful, here is my rf file. It includes the first and last bases in a number of scaffolds.
VZRF01000187.1 1 2229693
VZRF01000586.1 1 1920239
VZRF01000927.1 1 2036130
VZRF01001110.1 1 3829561
VZRF01001586.1 1 2702612
VZRF01001720.1 1 2793601
VZRF01001859.1 1 2234040
VZRF01001867.1 1 1819343
VZRF01002078.1 1 1627592
VZRF01002081.1 1 1731244
VZRF01002083.1 1 1772795
VZRF01002164.1 1 3986751
VZRF01002170.1 1 1635836
VZRF01002270.1 1 1770293
VZRF01003085.1 1 2342532
VZRF01003198.1 1 2205074
VZRF01003225.1 1 1837724
VZRF01003403.1 1 2304910
VZRF01003559.1 1 2090620
VZRF01004612.1 1 1684394
VZRF01005175.1 1 1754815
VZRF01005414.1 1 1615767
VZRF01005562.1 1 2875882
VZRF01005657.1 1 2055830
VZRF01005754.1 1 1771318
VZRF01006295.1 1 2066764
VZRF01006584.1 1 1628026
VZRF01006704.1 1 1802776
VZRF01006819.1 1 2792923
VZRF01006837.1 1 2190622
VZRF01007136.1 1 1957350
VZRF01007591.1 1 2101985
Let me know what I might be able to try to fix this, thanks!
-Teresa
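The observation above, that the problem sites sit near the ends of the scaffolds, can be checked against the rf file. The sketch below (hypothetical helper names, not part of ANGSD) parses the realSFS check messages and reports how far the previous position lies from the end of the corresponding region. It assumes the message layout and the three-column region layout exactly as they are quoted in this comment.

```python
import re
from typing import Dict, Optional

# Matches the message layout quoted above, e.g.
# chromoname: 'X' pos[10]:22 vs posd[10-1]:2229692
CHECK_LINE = re.compile(
    r"chromoname: '([^']+)' pos\[(\d+)\]:(\d+) vs posd\[(\d+)-1\]:(\d+)"
)


def parse_check_line(line: str) -> Optional[dict]:
    """Extract scaffold name, site index, position and previous position."""
    match = CHECK_LINE.search(line)
    if match is None:
        return None
    chrom, index, pos, _index_again, prev_pos = match.groups()
    return {"chrom": chrom, "index": int(index), "pos": int(pos), "prev_pos": int(prev_pos)}


def parse_region_ends(text: str) -> Dict[str, int]:
    """Read 'name start end' lines and map each name to its end coordinate."""
    ends = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            ends[fields[0]] = int(fields[2])
    return ends


def distance_to_region_end(record: dict, ends: Dict[str, int]) -> int:
    """Distance between the previous position and the end of its region."""
    return ends[record["chrom"]] - record["prev_pos"]


message = "-> problems with unsorted saf file chromoname: 'VZRF01000187.1' pos[1822035]:22 vs posd[1822035-1]:2229692"
regions = "VZRF01000187.1 1 2229693\nVZRF01000586.1 1 1920239"
record = parse_check_line(message)
print(record["chrom"], distance_to_region_end(record, parse_region_ends(regions)))
```

In the messages quoted above each scaffold is also reported twice, and the second site index is twice the first (for example 1822035 and 3644070). The `index` field makes that pattern easy to tabulate if it helps with debugging.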
Hi everybody,
I've been trying to calculate Fst between two populations, but I got stuck at the first step: obtaining the saf files.
I start with a vcf file (coming from GATK and a subsequent filtering through bcftools), and I use ANGSD -doSaf. This step seems to go fine, but later on realSFS complains that my file is not sorted. The minimal example I could propose is actually not minimal, because if I subset my vcf into a smaller region I don't get the error any longer. I can reproduce the error on single sample and multi-sample vcfs though.
Here are my commands:
angsd runs with no errors in STDERR.
realSFS check reports a list of:
When I check in myvcf.saf.idx.table I see that indeed the positions on PeexChr4 (and later) are not sorted correctly, eg:
But actually, if I check in the vcf file, PeexChr4 has no position 159957355, but rather this is the last position of PeexChr3, while 10882 is the first position for PeexChr4.
I cannot figure out if I'm doing something wrong, and how to fix it. I'll appreciate any help and suggestion!
Marta