compbiocore / VariantVisualization.jl

Julia package powering VIVA, our tool for visualization of genomic variation data. Manual:
https://compbiocore.github.io/VariantVisualization.jl/stable/

Question about maximum VCF file size #87

Closed jhchung closed 4 years ago

jhchung commented 4 years ago

Hi,

I am wondering what the maximum file size (number of variants and number of individuals) is that VIVA can handle.

Also, is there a way to use a bgzip/gzip compressed vcf file?

Thanks, -Jonathan

gtollefson commented 4 years ago

Hi Jonathan @jhchung,

Good questions! You can use bgzip/gzip compressed VCF files with VIVA without any special commands. The VCF reader should automatically unzip them.

The maximum VCF file size that VIVA can handle is determined by your computing resources rather than by any limit in VIVA itself. This is because VIVA reads through the VCF file line by line with a 'low memory footprint', keeping in memory only the variant records that match your query (e.g. a genomic range). In this way it can extract data from very large VCF files (tens of GB).
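To illustrate the line-by-line approach described above, here is a minimal Python sketch of a streaming range filter. It is not VIVA's code (VIVA is written in Julia and uses its own VCF reader); the function names `open_vcf` and `records_in_range` are hypothetical, and the sketch assumes a tab-separated VCF whose first two columns are CHROM and POS.

```python
import gzip

def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    # gzip and bgzip files both start with the bytes 0x1f 0x8b.
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")

def records_in_range(path, chrom, start, end):
    """Yield only the records inside chrom:start-end, one line at a time."""
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            # Column 1 is CHROM and column 2 is POS in the VCF layout.
            if fields[0] == chrom and start <= int(fields[1]) <= end:
                yield fields
```

Because the function is a generator, memory use grows with the number of matching records rather than with the size of the file, which is the property the reply describes.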

However, if your query returns too many variant records, there will be too much data to plot. In our experience, a standard 2015 MacBook Pro can visualize around 2000 variants for ~200 patients at a time before there is any noticeable strain on the hardware. We still recommend limiting the number of variants you visualize, since the amount of data you can actually see is limited by the number of pixels on your screen. The PlotlyJS.jl Julia package that VIVA uses is very efficient and lets you plot thousands of variants across hundreds of samples (millions of data points), but it often isn't practical to do so.
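As a rough back-of-the-envelope check on the pixel argument, the short Python sketch below compares the number of heatmap cells with the number of screen pixels. The function names are hypothetical, and the 1920 x 1080 screen size is an assumption chosen for illustration, not a figure from this thread.

```python
def heatmap_cells(n_variants, n_samples):
    """Number of data points in a variants-by-samples heatmap."""
    return n_variants * n_samples

def pixels_per_cell(n_variants, n_samples, width_px, height_px):
    """Average number of screen pixels available to draw each cell."""
    return (width_px * height_px) / heatmap_cells(n_variants, n_samples)

# Assumed screen size for illustration only.
WIDTH_PX, HEIGHT_PX = 1920, 1080

# The figures quoted above: about 2000 variants for about 200 patients.
print(heatmap_cells(2000, 200))  # 400000
print(pixels_per_cell(2000, 200, WIDTH_PX, HEIGHT_PX))

# A larger hypothetical query: fewer than one pixel per cell on this screen.
print(pixels_per_cell(5000, 500, WIDTH_PX, HEIGHT_PX) < 1)  # True
```

Once the average drops below one pixel per cell, individual genotypes can no longer be distinguished without zooming, however quickly the plot renders.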

gtollefson commented 4 years ago

I'm going to close this now. Please feel free to open a new issue with any other questions you have in the future!