Closed yuisato closed 6 years ago
Hi,
you can get some help if you run: Rscript --vanilla --slave scripts/ngsF-HMMplot.R -h
but yes, some descriptions are not very helpful. I'll try to explain them a bit more: This one is the input IBD file:
-i IN_FILE, --in_file=IN_FILE
Path to input IBD file [NULL]
This is a file with the coordinates for each SNP (e.g. chr1\t3232; one per line):
--pos=POS
Path to file with SNP positions/coordinates [NULL]
This last three are optional and add extra info to the plots. If you know the genotypes at each position, you can also plot them with this option:
-g GENO, --geno=GENO
Path to file with genotypes (if available) [NULL]
you can also specify some sort of path (IBD state) to compare if, for example, you've also tried other methods:
-p PATH, --path=PATH
Path to file with known/true paths (if available) [NULL]
If set, this option plot marks the position of the SNPs in the plot:
--plot_sites
Plot position of SNPs? [FALSE]
You can also find an example here. Hope it helps,
Thank you very much for the details! It really helped and I am getting plots now.
However, I am still getting error message below when tried with '--plot_sites': "Error in xy.coords(x, y) : 'x' and 'y' lengths differ Calls: iter_plot -> points -> points.default -> plot.xy -> xy.coords Execution halted" Can you think of a reason how this happens?
Also, struggling with --geno. Can this script take a '-.geno' output file from ANGSD? What format should this be (Do I need to choose specific '-doGeno' option?)?
Many thanks for your help!!!
not quite sure why you are getting that error... what does your sites file look like? It should be something like (one row per SNP):
chr1 1212312
chr1 2342342
chr2 2321343
the geno format is like (one row per SNP, one col per inidv):
0 0 0 0 0 0 2 0 0 1
0 2 0 0 0 0 0 0 0 1
0 0 0 2 0 0 2 1 0 0
0 0 1 0 0 0 0 2 2 0
1 2 0 0 0 0 2 0 0 0
0 0 0 0 2 1 0 0 0 1
0 2 1 0 2 1 0 1 0 0
1 0 1 0 0 1 0 0 0 1
0 0 1 0 0 1 0 1 0 1
so you can use the doGeno output with some small changes.
Keep in mind that these options were implemented for debugging purposes and are not really needed to plot the output from ngsF-HMM. They plot a lot of extra info and, in most cases, won't be easy to interpret (unless you have a small region).
either way, let me know if it worked. cheers,
Thank you for the further info! I think my files' format was not right.. I will test it out and update as soon as I am able to (out of office until a week's time). Much appreciated!!
Sorry for my long silence but here is an update: it all worked beautifully! IBD-regions are in purple, marginal probabilities in green lines, SNP-site positions in dark-blue ticks, genotypes in light blue spots.
I was using wrong formats for the 'pos' and 'geno' input files. Now I modified ANGSD -.geno.gz output files using 'doGeno 3', and made sure the formats were as you advised. Thank you very much!
No problem, glad it worked!
Hello! I am having troubles getting IBD tracts (.ibd files) plotted using 'ngsF-HMMplot.R' mostly because I don't seem to be able to find enough documentation... I am trying to feed what the R-code takes as options, as specified in Lines 69-82 (https://github.com/fgvieira/ngsF-HMM/blob/master/scripts/ngsF-HMMplot.R), but so far no luck. Could you indicate some specifications that each argument needs, especially, '--pos', '--path', '--geno', and '--plot_sites'. I am assuming that '--in_file' is the .ibd output from ngsF-HMM, at least! Thank you for your help!!