fgvieira / ngsF

Estimation of per-individual inbreeding coefficients under a probabilistic framework

How to enable reading gz output from angsd --glf 3 #23

Closed by ghost 1 year ago

ghost commented 1 year ago

I generated my genotype likelihood file using the following flags in ANGSD:

# generating the genotype likelihood files
angsd -b "$BAMS" \
-ref "$REFERENCE_FASTA" \
-doMajorMinor 4 -doMaf 1 -doCounts 1 \
- "$chr" -P 10 \
-skipTriallelic 1 \
-minMapQ 25 -minQ 25 -remove_bads 1 \
-GL 1 -doGlf 3 \
-setMinDepth 1 -SNP_pval 1e-6 \
-out "$out"

However, I get the following error when running this output through ngsF. The front page of the GitHub repository indicates that uncompressed input is the default but that compressed GLF files are also accepted. Is there a flag I am missing from my ngsF command?

# error
[main] ERROR: Standard library only supports UNCOMPRESSED GLF files!

# ngsF
~/bigdata/ngsF/ngsF --glf out.glf.gz --n_ind "$number_indvs" --n_sites "$sites" \
--init_values r --min_epsilon 1e-9 --n_threads 15 --out "$out"

fgvieira commented 1 year ago

To use BGZip files, you need to uncomment the following line in the Makefile (as stated in the README):

https://github.com/fgvieira/ngsF/blob/b327c1437e45223cd4386c0fbc30b8f12fa37c4b/Makefile#L17

Why are you interested in this option? It is mainly implemented for very large datasets, to reduce memory usage. Is that your case?

ghost commented 1 year ago

Thank you for your response Dr. Vieira. I was having difficulties decompressing the file. If you have any suggestions on how to, I would greatly appreciate them.
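For reference, a plain gzip stream can be decompressed to a separate file while keeping the original. This is a minimal sketch, assuming the file is ordinary gzip; the helper name `decompress_glf` and the file names are hypothetical and not part of ANGSD or ngsF.

```shell
# Decompress a gzip-compressed file to a new file, keeping the original.
# decompress_glf is a hypothetical helper name, not part of ANGSD or ngsF.
decompress_glf() {
    # $1 = compressed input, $2 = decompressed output
    # gzip -dc writes the decompressed stream to stdout; the input is untouched.
    gzip -dc "$1" > "$2"
}

# Example (file names are placeholders):
# decompress_glf out.glf.gz out.glf
```

The decompressed file can then be passed to ngsF in place of the `.gz` file.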

ghost commented 1 year ago

I am not sure the .gz output from -doGlf 3 is bgzip. I enabled the option as described in the README and got this error:

[main] ERROR: BGZF library only supports BGZIP files!

I had already tried to gunzip the file, and the result was still binary, which is why I thought it was a bgzip file. I can gunzip the other files produced by the same ANGSD v0.937 run that generated the genotype likelihoods: .glf.pos.gz and .mafs.gz.
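One way to tell plain gzip from BGZF (bgzip) is to inspect the file header rather than the extension. The sketch below assumes the BGZF layout described in the SAM/BAM specification, where the gzip FLG byte is 4 and the first extra subfield is tagged `BC`; a BGZF file with other extra subfields placed first would be reported as plain gzip. The helper name `gz_kind` is hypothetical.

```shell
# Report whether a file starts with a plain gzip or a BGZF (bgzip) header.
# gz_kind is a hypothetical helper name.
# ASSUMPTION: the BGZF "BC" subfield is the first gzip extra subfield.
gz_kind() {
    # First 14 bytes as space-separated lowercase hex, e.g. "1f 8b 08 04 ...".
    hdr=$(head -c 14 "$1" | od -An -v -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
    case "$hdr" in
        # gzip magic, deflate, FEXTRA flag set, and 'B' 'C' at bytes 12-13
        "1f 8b 08 04 "*" 42 43") echo "bgzf" ;;
        # gzip magic only
        "1f 8b "*)               echo "gzip" ;;
        *)                       echo "not gzip" ;;
    esac
}

# Example (file name is a placeholder):
# gz_kind out.glf.gz
```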

ghost commented 1 year ago

I realized my mistake. The -doGlf 3 output is still binary after decompression. I was expecting to be able to see non-binary (text) content after gunzip-ing it. My apologies. Thank you for your time, Dr. Vieira.
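Because the decompressed file is binary, its size is a more useful sanity check than its contents. The sketch below assumes the file holds three 8-byte doubles per individual per site; that layout is an assumption drawn from the ANGSD and ngsF documentation, not something stated in this thread. The helper name `glf_n_sites` is hypothetical.

```shell
# Estimate the number of sites in a gzip-compressed binary GLF file.
# glf_n_sites is a hypothetical helper name.
# ASSUMPTION: 3 genotype likelihoods x 8-byte double per individual per site.
glf_n_sites() {
    # $1 = number of individuals, $2 = gzip-compressed GLF file
    bytes=$(gzip -dc "$2" | wc -c | tr -d ' ')
    per_site=$(( $1 * 3 * 8 ))
    if [ $(( bytes % per_site )) -ne 0 ]; then
        echo "size $bytes is not a multiple of $per_site bytes" >&2
        return 1
    fi
    echo $(( bytes / per_site ))
}

# Example (values are placeholders):
# glf_n_sites "$number_indvs" out.glf.gz
```

Under that assumption, the printed value would be expected to agree with the `--n_sites` value passed to ngsF.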