getian107 / SuSiEx

Cross-population fine-mapping
MIT License

convert existing LD matrix to bin4 format #14

Closed: gunns2 closed this issue 7 months ago

gunns2 commented 7 months ago

Hello,

Thanks so much for another great stat gen package! I'm hoping to run SuSiEx with an LD reference taken from the precomputed UKBB LD block matrices. I already have LD matrices for the loci I want to analyze in regular text format, as well as in the upper-triangular Hail .bgz format. I think what makes the most sense is to convert the LD files I have into the same format as the PLINK binary4 files that SuSiEx uses.

Do you have any insight into how these binary4 files are formatted and how to best go about converting? This seems like it should be possible but I'm running into some issues trying to figure out how to convert the files.

Thanks so much!

Sophie

yorkklause commented 7 months ago

Hi Sophie,

Thank you for reaching out and for using our software! While I'm not familiar with the detailed structure of the Hail .bgz format, I can provide information about the PLINK binary4 file format, which is relatively straightforward. In the PLINK binary4 format, each number in a "full" (square) LD matrix is stored using 4 bytes (a single-precision float), exactly as it is stored in memory, with no header or delimiters between values.

I've uploaded a C++ script for converting a "full" LD matrix to the binary4 format. You can find it here: utilities/convert2bin.cpp

Once compiled, you can use it as follows:

zcat $path_to_the_gz_full_LD_matrix | ./convert2bin $path_to_the_output_file

If you have any further questions or need assistance, please feel free to ask. I'm here to help!

My best

Kai Yuan

gunns2 commented 7 months ago

Hi,

This worked great, thank you so much!

Sophie