elimuinformatics / vcf2fhir

vcf2fhir: a utility to convert VCF files into HL7 FHIR format for genomics-EHR integration
Apache License 2.0

Conversion method should return the converted data, and rely on another method to write data onto a file #20

Open rhdolin opened 3 years ago

rhdolin commented 3 years ago

Currently, vcf2fhir converts the input and exports the resulting HL7 FHIR data to a JSON file. The converted JSON for all records stays in memory until it is exported at the end.
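As a rough illustration of the split proposed in the title, conversion and file output could be two separate functions. This is only a sketch: `convert_records`, `write_report`, and the record keys (`chrom`, `pos`, `ref`, `alt`) are hypothetical names chosen for the example, not the existing vcf2fhir API, and the output is a simplified FHIR-like structure.

```python
import json
from typing import Any, Dict, Iterable, List


def convert_records(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical converter: builds and returns the report, does no file I/O."""
    contained: List[Dict[str, Any]] = []
    for record in records:
        # One simplified Observation per VCF record (assumed record layout).
        contained.append({
            "resourceType": "Observation",
            "id": f"{record['chrom']}-{record['pos']}",
            "valueString": f"{record['ref']}>{record['alt']}",
        })
    return {"resourceType": "DiagnosticReport", "contained": contained}


def write_report(report: Dict[str, Any], path: str) -> None:
    """Hypothetical writer: the only place that touches the file system."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
```

With this shape, a caller can use the returned dictionary directly (for example, to post it to a FHIR server) and only call the writer when a file is actually wanted.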

Evaluation required: how much in-memory storage the FHIR JSON needs when converting a very large VCF file.

VCF files can be gigabytes in size, so it would be better to write the converted FHIR JSON for each record to the file before moving to the next record, instead of holding everything in memory. The major complexity in doing this is handling the phase relationship JSON blocks, which span multiple records.
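One possible way to do this is sketched below. A generator yields one entry at a time, and a writer streams each entry to the output as soon as it is produced. To cope with phase relationships, the sketch keeps only the ID of the last record seen in each phase set rather than the full JSON. Everything here is an assumption made for illustration: the function names, the `phase_set` key, and the simplified shape of the relationship entry are not taken from vcf2fhir.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator, TextIO


def iter_entries(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one entry per record, plus a relationship entry when two
    records share a (hypothetical) phase_set value."""
    last_in_phase_set: Dict[str, str] = {}  # small state: phase set -> last ID
    for record in records:
        obs_id = f"{record['chrom']}-{record['pos']}"
        yield {
            "resourceType": "Observation",
            "id": obs_id,
            "valueString": f"{record['ref']}>{record['alt']}",
        }
        phase_set = record.get("phase_set")
        if phase_set is not None:
            previous_id = last_in_phase_set.get(phase_set)
            if previous_id is not None:
                # Simplified stand-in for a phase relationship block.
                yield {
                    "resourceType": "Observation",
                    "id": f"phase-{previous_id}-{obs_id}",
                    "derivedFrom": [
                        {"reference": f"#{previous_id}"},
                        {"reference": f"#{obs_id}"},
                    ],
                }
            last_in_phase_set[phase_set] = obs_id


def stream_report(records: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Write entries to `out` one at a time; return how many were written."""
    out.write('{"resourceType": "DiagnosticReport", "contained": [')
    count = 0
    for entry in iter_entries(records):
        if count:
            out.write(", ")
        out.write(json.dumps(entry))
        count += 1
    out.write("]}")
    return count
```

Memory use then grows with the number of phase sets rather than with the number of records. The trade-off is that the output cannot be revised once written, so anything that depends on later records has to be emitted after them, as the relationship entries are here.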

Other Options:

  1. Throw an exception if the in-memory JSON blob nears the maximum capacity allowed by the system, rather than failing with a heap dump.
  2. Update the README to tell users to provide a conversion region, so that only a limited number of records are converted when the VCF file is very large.

theanmolsharma commented 3 years ago

I would like to work on this issue. Please guide me through it.

srgothi92 commented 3 years ago

This particular issue would require a lot of understanding and design work before moving to implementation. In my opinion, we should not fix this unless someone really needs it.

theanmolsharma commented 3 years ago

Okay. I will work on some other beginner-friendly issue.