fgvieira / ngsF-HMM

Estimation of per-individual inbreeding tracts under a probabilistic framework
GNU General Public License v3.0
13 stars 6 forks source link

Plotting IBD tracts #2

Closed yuisato closed 6 years ago

yuisato commented 6 years ago

Hello! I am having troubles getting IBD tracts (.ibd files) plotted using 'ngsF-HMMplot.R' mostly because I don't seem to be able to find enough documentation... I am trying to feed what the R-code takes as options, as specified in Lines 69-82 (https://github.com/fgvieira/ngsF-HMM/blob/master/scripts/ngsF-HMMplot.R), but so far no luck. Could you indicate some specifications that each argument needs, especially, '--pos', '--path', '--geno', and '--plot_sites'. I am assuming that '--in_file' is the .ibd output from ngsF-HMM, at least! Thank you for your help!!

fgvieira commented 6 years ago

Hi,

you can get some help if you run: Rscript --vanilla --slave scripts/ngsF-HMMplot.R -h

but yes, some descriptions are not very helpful. I'll try to explain them a bit more: This one is the input IBD file:

-i IN_FILE, --in_file=IN_FILE
    Path to input IBD file [NULL]

This is a file with the coordinates for each SNP (e.g. chr1\t3232; one per line):

--pos=POS
    Path to file with SNP positions/coordinates [NULL]

This last three are optional and add extra info to the plots. If you know the genotypes at each position, you can also plot them with this option:

-g GENO, --geno=GENO
    Path to file with genotypes (if available) [NULL]

you can also specify some sort of path (IBD state) to compare if, for example, you've also tried other methods:

-p PATH, --path=PATH
    Path to file with known/true paths (if available) [NULL]

If set, this option plot marks the position of the SNPs in the plot:

--plot_sites
    Plot position of SNPs? [FALSE]

You can also find an example here. Hope it helps,

yuisato commented 6 years ago

Thank you very much for the details! It really helped and I am getting plots now.

However, I am still getting error message below when tried with '--plot_sites': "Error in xy.coords(x, y) : 'x' and 'y' lengths differ Calls: iter_plot -> points -> points.default -> plot.xy -> xy.coords Execution halted" Can you think of a reason how this happens?

Also, struggling with --geno. Can this script take a '-.geno' output file from ANGSD? What format should this be (Do I need to choose specific '-doGeno' option?)?

Many thanks for your help!!!

fgvieira commented 6 years ago

not quite sure why you are getting that error... what does your sites file look like? It should be something like (one row per SNP):

chr1    1212312
chr1    2342342
chr2    2321343

the geno format is like (one row per SNP, one col per inidv):

0       0       0       0       0       0       2       0       0       1
0       2       0       0       0       0       0       0       0       1
0       0       0       2       0       0       2       1       0       0
0       0       1       0       0       0       0       2       2       0
1       2       0       0       0       0       2       0       0       0
0       0       0       0       2       1       0       0       0       1
0       2       1       0       2       1       0       1       0       0
1       0       1       0       0       1       0       0       0       1
0       0       1       0       0       1       0       1       0       1

so you can use the doGeno output with some small changes.

Keep in mind that these options were implemented for debugging purposes and are not really needed to plot the output from ngsF-HMM. They plot a lot of extra info and, in most cases, won't be easy to interpret (unless you have a small region).

either way, let me know if it worked. cheers,

yuisato commented 6 years ago

Thank you for the further info! I think my files' format was not right.. I will test it out and update as soon as I am able to (out of office until a week's time). Much appreciated!!

yuisato commented 6 years ago

Sorry for my long silence but here is an update: it all worked beautifully! IBD-regions are in purple, marginal probabilities in green lines, SNP-site positions in dark-blue ticks, genotypes in light blue spots.

I was using wrong formats for the 'pos' and 'geno' input files. Now I modified ANGSD -.geno.gz output files using 'doGeno 3', and made sure the formats were as you advised. Thank you very much!

fgvieira commented 6 years ago

No problem, glad it worked!