Open wul88 opened 1 year ago
Could you try running with --freq_est 1
?
Hi, fgvieira,
Thank you for the quick response. The option of "--freq_est 1" did run. It took over 3 days to finish. I am checking the inbreeding coefficient data now. Meanwhile I have some questions on these runs as welll as using ngsF-HMMplot.R.
Here is the command I used:
ngsF-HMM --geno ${glf2} --lkl --pos ${pos} --n_ind $NSAMS --n_sites $NSITES --freq 0.1 --freq_est 1 \
--indF 0.1,0.1 --out ${outDir}/${out} --n_threads 8 --seed ${thisSeed} \
--min_epsilon 1e-9 --verbose 2 --log 1
where:
thisSeed=$RANDOM
glf2="all118_MafGlf2_allChrs.beagle.gz"
pos="all118_MafGlf2_allChrs_noHeader.pos"
NSAMS=118
outDir=ngsFHMM/"rep_"$thisRep
thisRep=$rep # ($rep - number of replicate)
Rscript ../scripts/ngsF-HMMplot.R --in_file testF-HMM.$ID.$TYPE.ibd --n_ind $N_IND --n_sites $N_SITES --geno testF-HMM.SIM.geno.gz --path testF-HMM.SIM.path.gz --pos testF-HMM.SIM.pos.gz --marg_prob --out testF-HMM.$ID.$TYPE.pdf
I don't have a .path.gz file in my output. What is that file? How can I produce one? Thank you in advance for your help!
Hi @wul88 and sorry for the late reply.
--path
option just adds another track to the plots. I used it to show the "truth", but can also be used to plot some regions of interest. In your case, you can omit it.Hi, fgvieira, Thank you for your help! The plot script worked. I have over 100 samples, loading pdf of 15 pages becomes time consuming. Any suggestion on break-up result?
As running ngsF-HMM, I have following concerns. Please advise me.
I have ran with --indF "0.1,0.1"
or "r" twice. The results were the same. It seemed that initial values of indF does not have much effect on the final results. I checked the results from all replicate runs, they are the same. The random seeds also has no effect on the algorithm. Is this normal? Or I did something wrong? Here is the command I used:
nSite=$((`zcat $maf | tail -n+2 | wc -l`))
ngsF-HMM --geno ${glf2} --lkl --pos ${pos} --n_ind ${nInd} --n_sites ${nSite} --freq e --freq_est 1 \
--indF r --out ${outDir}/${out} --n_threads 8 --seed ${thisSeed} \
--min_epsilon 1e-9 --verbose 2 --min_iters 20 --max_iters 400
I noticed that the inbreeding coefficients in the output .indF file, were quite different from those from ngsF. In addition, the transition parameters for individuals in .indF file were all "10". What does that parameter mean? Is "10" reasonable?
with --log 1
did not produce .log file.
--indF "0.1-0.1"
ngsF-HMM
with the estimates from ngsF
as starting values? The transition parameter represents how often the state changes between IBD and notIBD; high numbers reflect short IBD fragments. What values of F
are you getting from ngsF
and ngsF-HMM
?X
iterations, but it overwrites the file (it is mostly for debug purposes).
Hello, fgvieira
I tried to run ngsF-HMM to find IBD of samples from different sub-populations. I used the Beagle GL files computed previously as below:
I have .beagle.gz and .mafs.gz files. I extracted first two columns from .mafs.gz file and removed the header, then saved as pos file. The nsites was computed by "cat $pos | wc -l " . The command to run ngsF-HMM is as follows:
WHERE glf2 is the Beagle format glf from angsd run shown at the beginning, and inF_transitPar_118AveAuto.txt is a two-column file: 1st column is the inbreeding coefficient from running ngsF and the second column is 0.1 as transition parameter. one row per individual. ( I also used 0.1, 0.1 as shown in tutorial in different run, received the same error.) AND thisSeed=$RANDOM glf2="all118_MafGlf2_allChrs.beagle.gz"
maf="all118_MafGlf2_allChrs.mafs.gz"
pos="all118_MafGlf2_allChrs_noHeader.pos" NSAMS=118 NSITES=$((
cat ${pos} | wc -l
))outDir=ngsFHMM/"rep_"$thisRep out="all118depthSetinFHMM"$thisRep ( thisRep is replicate number. I ran 10 replicates but every one had the same error.)
--- one of the .err files ---->
thisRep=2
thisSeed=18950
glf2=all118_MafGlf2_allChrs.beagle.gz
pos=all118_MafGlf2_allChrs_noHeader.pos
NSAMS=118
outDir=ngsFHMM/rep_2
NSITES=2205512
out=all118depthSet_inFHMM_2
ngsF-HMM --geno all118_MafGlf2_allChrs.beagle.gz --lkl --pos all118_MafGlf2_allChrs_noHeader.pos --n_ind 118 --n_sites 2205512 --freq 0.1 --freq_est 2 --indF inF_transitPar_118AveAuto.txt --out ngsFHMM/rep_2/all118depthSet_inFHMM_2 --n_threads 8 --seed 18950 --min_epsilon 1e-9 --verbose 2 --log 1
Could you please give me some advice as what I did wrong? Thank you very much in advance!