Extract IBD lengths or a coordinate file from ibd file

mpalmada commented 1 year ago

Hi,

My name is Marc, I would like to know if is there any available script to extract the IBD track lengths and/or coordinates per individual from an ibd file.

Thanks a lot,

Marc

fgvieira commented 1 year ago

You can get it from the .ibd output file (See https://github.com/fgvieira/ngsF-HMM#output-files).

mpalmada commented 1 year ago

Hi,

I understand that the IBD track lengths and coordinates are in the .ibd output file. I don't know how to get a bed file with the coordinates of IBD tracks by individual. I found your "convert_ibd.pl" in the scripts folder but I don't know how to create the input file from the .ibd output file.

Any tips? Thanks very much for the fast response and being this responsive!

Marc

mpalmada commented 1 year ago

Hi again,

Maybe I should change my question. How should I read the .ibd output file? And is it a script available that transforms the .ibd file into a .bed file or similar?

Thank you so much!

Marc

fgvieira commented 1 year ago

Hi @mpalmada,

ibd is a text file, where the first line stores per-individual log-likelihoods. The following lines (one per individual) contain the most-probable inbreeding tracts coded as an INT per site (0: position is not IBD; 1: position is IBD), followed by the IBD posterior probabilities as a FLOAT per site (one line per individual).

Since it is a text file, you can read it directly with any program or script. If you want to convert the ibd file to bed, you can use the script/convert_ibd.pl script as:

perl scripts/convert_ibd.pl --ibd_pos <ngsF-HMM.ibd> --pos <sites_file.pos> --ind <sample.labels>

or, in case you don't care about sample labels:

seq -f "Ind_%g" 1 $N_INDS | perl scripts/convert_ibd.pl --ibd_pos <ngsF-HMM.ibd> --pos <sites_file.pos>

where ngsF-HMM.ibd is the ibd output from ngsF-HMM, sites_file.pos the positions file provided to ngsF-HMM (--pos), and sample.labels, the label of each sample.

But double check your output, as the script has not been thoroughly tested!

mpalmada commented 1 year ago

Hi @fgvieira,

I tried and couldn't do it, do you have any mock files to have a look at the .pos and sample.labels files?

Thanks a lot for your help,

Marc

fgvieira commented 1 year ago

If you do perl scripts/convert_ibd.pl -h you get a small help:

    perl file_utils.pl [-h] --ind /path/to/ind_file --pos/path/to/pos_file [--ibd_pos /path/to/ibd_pos | --ibd_bed /path/to/ibd_bed]
    OPTIONS:
       --help       This help screen
       --ind        File with individual information (IND_ID on first column)
       --pos        TSV file with genomic coordinates (CHR,POS)
       --ibd_pos    File with IBD state per position (one indiv per line, 0/1 per site)
       --ibd_bed    FILE with IBD state per region in BED format (CHR,START,END,IND_ID). If IND_ID is blank or '*', interval is assumed for all individuals

Position file is

chr1\tpos1
chr1\tpos2

and labels file

sample1
sample2

mpalmada commented 1 year ago

Hi @fgvieira,

Sorry for not reaching out earlier, I want to do the transformation the other way around: from .ibd file to .bed file!

Hope you have a script that does this,

Thanks a lot,

Marc

fgvieira commented 1 year ago

Yes, that is what the script does!

mpalmada commented 1 year ago

Hi @fgvieira,

I have tried and get this message a lot of times:

Argument "genome_1_s00026" isn't numeric in numeric ne (!=) at convert_ibd.pl line 126, line 2.

Also, I have an ibd file with multiple samples and only get coordinates of one sample and only for some chromosomes!

I even get some coordinates for which start and end positions are switched (I have a larger start position) and I don't know how to interpret those.

Any ideas? Thanks a lot for your responses so far!

Marc

mpalmada commented 1 year ago

Okay I think the chromosome names have to be numbers, I'll close this issue if the bed file looks good! Thanks a lot again @fgvieira!!!

mpalmada commented 1 year ago

It worked I didn't close because I forgot, thanks @fgvieira! Important to use chromosome names with only numbers in it!

fgvieira / ngsF-HMM

Extract IBD lengths or a coordinate file from ibd file #9