ANGSD / angsd

Program for analysing NGS data.

RealSFS Watterson's Theta, Issue with size of dimension #530

Open msug0 opened 2 years ago

msug0 commented 2 years ago

Hi,

I am having a problem that I am not sure how to solve. I am now trying to calculate the thetas. My steps so far are:

1. Calculate SAF:

   angsd -bam BamFiles.txt -doSaf 1 -out allsites.sfs -anc ReferenceGenome_final.fasta -GL 2 -minMapQ 1 -minQ 20

2. Calculate SFS (here I had to use -nSites because my file was too big, I think):

   realSFS allsites.sfs.saf.idx -nSites 1500000000 -P 30 > allsites.sfs

The command I am using now is:

   realSFS saf2theta allsites.sfs.saf.idx -sfs allsites.sfs -outname theta

And the error:

-> Version of fname:allsites.sfs.saf.idx is:2
-> Assuming .saf.gz file: allsites.sfs.saf.gz
-> Assuming .saf.pos.gz: allsites.sfs.saf.pos.gz
-> args: tole:0.000001 nthreads:4 maxiter:100 nsites:0 start:allsites.sfs chr:(null) start:-1 stop:-1 fstout:theta oldout:0 seed:1665234516 bootstrap:0 resample_chr:0 whichFst:0 fold:0 ref:(null) anc:(null)
-> Will read chunks of size: 4096
-> Reading: allsites.sfs assuming counts (will normalize to probs internally)
-> Problem with size of dimension of prior 33 vs 66

Do you have any ideas?

Thanks a lot!

TeresaPegan commented 1 year ago

You might want to check your SFS file and see how many lines it has. I think that when you use -nSites with realSFS, it estimates a separate SFS for every block of that many sites. If you have 3 billion sites in total and you use -nSites 1.5 billion, it will produce two distinct SFSs, one on each line of the output. You would have to add the two SFSs together, entry by entry, to get the genome-wide SFS. Hope this helps!
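
For what it's worth, the "33 vs 66" in the error would be consistent with this: 66 values is what you would get from two lines of 33 entries each. That is only an inference from the numbers, so it is worth counting the lines first. Below is a minimal Python sketch of the summing step, assuming the SFS file is plain text with one whitespace-separated SFS per line. The function name `sum_sfs_lines` and the toy input are made up for illustration; they are not part of ANGSD.

```python
def sum_sfs_lines(text):
    """Add up several SFS rows element-wise to get one combined SFS.

    Assumes each non-empty line holds one SFS as whitespace-separated numbers
    and that every line has the same number of entries.
    """
    rows = [
        [float(value) for value in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]
    if not rows:
        raise ValueError("no SFS lines found")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("SFS lines differ in length")
    # zip(*rows) groups the i-th entry of every line together.
    return [sum(column) for column in zip(*rows)]


# Toy input: two blocks, three SFS categories each (not real data).
example = "10 2 1\n20 3 0\n"
combined = sum_sfs_lines(example)
print(" ".join(f"{value:.6f}" for value in combined))
```

To use it on the real output, you could read the contents of allsites.sfs, pass the text to the function, write the single combined line to a new file, and give that file to saf2theta with -sfs. If the function reports lines of different lengths, the file is probably not in the format assumed here.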