Intel-HLS / GenomicsDB


Segfault when importing VCF #123

Closed. jackgoldsmith4 closed this issue 7 years ago.

jackgoldsmith4 commented 7 years ago

Hi. I'm working on the Hail Team at the Broad Institute, and I was trying to import a VCF into GenomicsDB, but it caused a segfault. Here is the VCF file that I tried to import. Attached are the three JSON files for this VCF. The error message that I got is below:

ubuntu@ip-172-31-23-20:~/build_dir/tools$ vcf2tiledb /home/ubuntu/build_dir/jsonFiles/loader_config_file.json


[[19760,1],0]: A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces:

Module: OpenFabrics (openib) Host: ip-172-31-23-20

Another transport will be used instead, although this may result in lower performance.


[ip-172-31-23-20:04729] Process received signal
[ip-172-31-23-20:04729] Signal: Segmentation fault (11)
[ip-172-31-23-20:04729] Signal code: Address not mapped (1)
[ip-172-31-23-20:04729] Failing at address: 0x1488780
[ip-172-31-23-20:04729] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f4582f36390]
[ip-172-31-23-20:04729] [ 1] vcf2tiledb[0x507b29]
[ip-172-31-23-20:04729] [ 2] vcf2tiledb[0x4d9fd2]
[ip-172-31-23-20:04729] [ 3] vcf2tiledb[0x4eca80]
[ip-172-31-23-20:04729] [ 4] vcf2tiledb[0x4eddd8]
[ip-172-31-23-20:04729] [ 5] vcf2tiledb[0x48d704]
[ip-172-31-23-20:04729] [ 6] vcf2tiledb[0x490c7f]
[ip-172-31-23-20:04729] [ 7] /usr/lib/x86_64-linux-gnu/libgomp.so.1(+0x16dfe)[0x7f458336fdfe]
[ip-172-31-23-20:04729] [ 8] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f4582f2c6ba]
[ip-172-31-23-20:04729] [ 9] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f4582c623dd]
[ip-172-31-23-20:04729] End of error message
Segmentation fault (core dumped)

JSON files: callsets.txt, vid_mapping_file.txt, loader_config_file.txt

kgururaj commented 7 years ago

Ran the loader and query with the input files you provided. Made some corrections to the vid JSON (attached: hail_vid.txt).

However, there are issues with the sample VCF.

jackgoldsmith4 commented 7 years ago

I fixed the first two issues with the VCF, and I fixed the vid file. The import worked, thanks!

kgururaj commented 7 years ago

The query/read fails because of issue 3 in the imported VCF.

I noticed that you import a VCF with multiple samples into GenomicsDB. Is that the expected mode of operation for Hail, i.e., will data be imported from VCF file(s) where each file contains many (>1K) samples? Or is the common mode that you will have multiple VCF files, each containing data for a single sample?
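The question above distinguishes two import modes: one VCF holding many samples versus many single-sample VCFs. As a minimal sketch, assuming standard VCF 4.x headers, the Python code below shows how the sample columns of either kind of file can be read from the `#CHROM` header line (per the VCF spec, sample columns follow the eight fixed columns and FORMAT) and given rows in a callset mapping. The helper names are hypothetical, and the `callsets` JSON layout used here (`row_idx`, `idx_in_file`, `filename`) is an assumption modeled on older GenomicsDB callset files; check it against the callsets file format of the GenomicsDB version in use.

```python
import gzip
import json

# Columns that precede the per-sample columns on the '#CHROM' line (VCF 4.x):
# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
NUM_FIXED_COLUMNS = 9


def read_header_lines(path):
    """Read the '#'-prefixed header lines of a plain or gzip/bgzip VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    lines = []
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # first data record: the header is over
            lines.append(line)
    return lines


def sample_names_from_header(header_lines):
    """Return the sample names listed on the '#CHROM' header line."""
    for line in header_lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            return columns[NUM_FIXED_COLUMNS:]
    raise ValueError("no #CHROM header line found")


def build_callset_mapping(vcf_headers):
    """Hypothetical helper: map each sample to a row (assumed JSON layout).

    vcf_headers maps a VCF filename to its header lines. Every sample gets a
    global row index (row_idx) plus its column position inside its own file
    (idx_in_file), so multi-sample and single-sample VCFs are handled alike.
    """
    callsets = {}
    row_idx = 0
    for filename, header_lines in vcf_headers.items():
        for idx_in_file, sample in enumerate(sample_names_from_header(header_lines)):
            if sample in callsets:
                raise ValueError(f"duplicate sample name: {sample}")
            callsets[sample] = {
                "row_idx": row_idx,
                "idx_in_file": idx_in_file,
                "filename": filename,
            }
            row_idx += 1
    return {"callsets": callsets}


if __name__ == "__main__":
    # In-memory headers: one 3-sample VCF and one single-sample VCF.
    multi = ["##fileformat=VCFv4.2\n",
             "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"]
    single = ["#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS4\n"]
    mapping = build_callset_mapping({"cohort.vcf.gz": multi, "s4.vcf.gz": single})
    print(json.dumps(mapping, indent=2))
```

With real files, the headers would come from `read_header_lines(path)` for each VCF. Keeping `row_idx` global while `idx_in_file` restarts per file is what lets the same mapping describe one large multi-sample VCF, many single-sample VCFs, or a mix of both.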

danking commented 7 years ago

Hail won't generate GenomicsDB files directly. The Data Sciences Data Engineering group at the Broad plans to deliver GenomicsDB files to the Hail team instead of VCFs. Currently, they deliver VCFs containing a large number of samples.

Hail will import the GenomicsDB files into our in-memory representation, on which our users can write and execute their analytical pipelines.

kgururaj commented 7 years ago

Can this issue be closed?