Closed mpalmada closed 1 year ago
You can get it from the .ibd
output file (See https://github.com/fgvieira/ngsF-HMM#output-files).
Hi,
I understand that the IBD track lengths and coordinates are in the .ibd output file. I don't know how to get a bed file with the coordinates of IBD tracks by individual. I found your "convert_ibd.pl" in the scripts folder but I don't know how to create the input file from the .ibd output file.
Any tips? Thanks very much for the fast response and being this responsive!
Marc
Hi again,
Maybe I should change my question. How should I read the .ibd
output file? And is it a script available that transforms the .ibd
file into a .bed
file or similar?
Thank you so much!
Marc
Hi @mpalmada,
ibd
is a text file, where the first line stores per-individual log-likelihoods. The following lines (one per individual) contain the most-probable inbreeding tracts coded as an INT per site (0: position is not IBD; 1: position is IBD), followed by the IBD posterior probabilities as a FLOAT per site (one line per individual).
Since it is a text file, you can read it directly with any program or script. If you want to convert the ibd
file to bed
, you can use the script/convert_ibd.pl
script as:
perl scripts/convert_ibd.pl --ibd_pos <ngsF-HMM.ibd> --pos <sites_file.pos> --ind <sample.labels>
or, in case you don't care about sample labels:
seq -f "Ind_%g" 1 $N_INDS | perl scripts/convert_ibd.pl --ibd_pos <ngsF-HMM.ibd> --pos <sites_file.pos>
where ngsF-HMM.ibd
is the ibd
output from ngsF-HMM
, sites_file.pos
the positions file provided to ngsF-HMM
(--pos
), and sample.labels
, the label of each sample.
But double check your output, as the script has not been thoroughly tested!
Hi @fgvieira,
I tried and couldn't do it, do you have any mock files to have a look at the .pos and sample.labels files?
Thanks a lot for your help,
Marc
If you do perl scripts/convert_ibd.pl -h
you get a small help:
perl file_utils.pl [-h] --ind /path/to/ind_file --pos/path/to/pos_file [--ibd_pos /path/to/ibd_pos | --ibd_bed /path/to/ibd_bed]
OPTIONS:
--help This help screen
--ind File with individual information (IND_ID on first column)
--pos TSV file with genomic coordinates (CHR,POS)
--ibd_pos File with IBD state per position (one indiv per line, 0/1 per site)
--ibd_bed FILE with IBD state per region in BED format (CHR,START,END,IND_ID). If IND_ID is blank or '*', interval is assumed for all individuals
Position file is
chr1\tpos1
chr1\tpos2
and labels file
sample1
sample2
Hi @fgvieira,
Sorry for not reaching out earlier, I want to do the transformation the other way around: from .ibd file to .bed file!
Hope you have a script that does this,
Thanks a lot,
Marc
Yes, that is what the script does!
Hi @fgvieira,
I have tried and get this message a lot of times:
Argument "genome_1_s00026" isn't numeric in numeric ne (!=) at convert_ibd.pl line 126,
Also, I have an ibd file with multiple samples and only get coordinates of one sample and only for some chromosomes!
I even get some coordinates for which start and end positions are switched (I have a larger start position) and I don't know how to interpret those.
Any ideas? Thanks a lot for your responses so far!
Marc
Okay I think the chromosome names have to be numbers, I'll close this issue if the bed file looks good! Thanks a lot again @fgvieira!!!
It worked I didn't close because I forgot, thanks @fgvieira! Important to use chromosome names with only numbers in it!
Hi,
My name is Marc, I would like to know if is there any available script to extract the IBD track lengths and/or coordinates per individual from an ibd file.
Thanks a lot,
Marc