-
When using the VariantSpark interface for Hail to run importance analysis, VariantSpark expects exactly one allele in each of the REF and ALT fields. If there was any issue (some datasets have . in the ALT field)…
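A minimal sketch of the kind of pre-check this implies, written in plain Python against tab-separated VCF data lines. The helper name `has_single_ref_and_alt` is hypothetical and is not part of VariantSpark or Hail; it only assumes the standard VCF column order (REF is the 4th column, ALT the 5th).

```python
def has_single_ref_and_alt(vcf_line):
    """Return True if a VCF data line has exactly one REF and one ALT allele.

    Assumption: the line follows the standard VCF column order
    CHROM, POS, ID, REF, ALT, ... separated by tabs.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    # "." marks a missing allele; a comma separates multiple ALT alleles.
    if ref in ("", ".") or alt in ("", "."):
        return False
    return "," not in alt


records = [
    "1\t1000\trs1\tA\tG\t.\tPASS\t.",    # one REF, one ALT
    "1\t2000\trs2\tC\t.\t.\tPASS\t.",    # missing ALT
    "1\t3000\trs3\tT\tA,C\t.\tPASS\t.",  # two ALT alleles
]
usable = [r for r in records if has_single_ref_and_alt(r)]
```

Records that fail the check could be dropped or split into separate single-ALT records before the analysis, depending on what the dataset requires.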
-
VariantSpark currently only supports categorical labels. While binning continuous variables into multiple categories is a workaround, fully supporting continuous labels would be preferable.
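The binning workaround can be sketched as follows, using only the Python standard library. The function name `bin_continuous`, the choice of quantile-based cut points, and the example phenotype values are illustrative assumptions, not something VariantSpark prescribes.

```python
from bisect import bisect_right
from statistics import quantiles


def bin_continuous(values, n_bins=4):
    """Map continuous values to integer category labels 0..n_bins-1.

    Cut points are the quantiles of the input, so the bins hold
    roughly equal numbers of samples.
    """
    cuts = quantiles(values, n=n_bins)  # returns n_bins - 1 cut points
    return [bisect_right(cuts, v) for v in values]


# Made-up continuous phenotype values for eight samples.
phenotype = [1.2, 3.4, 2.2, 5.9, 4.1, 0.7, 3.3, 2.8]
labels = bin_continuous(phenotype, n_bins=4)
```

The resulting integer labels can then be supplied as a categorical label column. The number of bins is a trade-off: fewer bins lose more information, while more bins leave fewer samples per category.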
-
VariantSpark is currently optimised for reasonably small sample sizes (n=100-5000) and large numbers of variants (e.g. 42 million), i.e. 'wide' datasets. Working on phenotypes in UKBB, e.g. CAD we hav…
-
Hi,
New to genomics. The description in the README is too brief to get started with, and I couldn't find any other details online. Can you please provide a detailed description of how to use VariantSpark? Thank…
-
Hello all,
I am running the example.conf file using the VariantSpark shell script.
I have changed the driver memory from 1 to 6 GB (default). For every value in that range (1-6 GB), I am getting "Invalid initial…
-
An option for VariantSpark to search for optimal parameters using a grid search where users can provide a search grid with parameters to test. There is an implementation in Python in the scikit-learn …
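A minimal sketch of what such a grid search could look like, in plain Python. The parameter names `n_trees` and `mtry_fraction` and the scoring function `toy_error` are hypothetical stand-ins; a real run would replace `toy_error` with a function that trains a model with the given parameters and returns its error estimate.

```python
from itertools import product


def expand_grid(grid):
    """Expand {"param": [values, ...]} into a list of parameter dicts."""
    keys = sorted(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


def grid_search(grid, score_fn):
    """Return (best_params, best_score); a lower score is treated as better."""
    best_params, best_score = None, float("inf")
    for params in expand_grid(grid):
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_error(params):
    # Stand-in for "train a model and return its error" (assumption).
    return abs(params["n_trees"] - 1000) / 1000 + abs(params["mtry_fraction"] - 0.1)


search_grid = {"n_trees": [500, 1000, 2000], "mtry_fraction": [0.05, 0.1, 0.2]}
best_params, best_score = grid_search(search_grid, toy_error)
```

Every combination in the grid is evaluated once, so the cost grows with the product of the list lengths; keeping each list short matters when a single model takes a long time to train.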
-
Hi, the framework seems promising. I read the BMC Genomics paper. There isn't much documentation here in the repo. If you could add some documentation and also about ongoing/future work, it will be usef…
-
This is the remaining work from issue: #140
That is:
- Add a command line option for predicting class probabilities
- Implement command line predictions from a JSON-serialised model
- Adding op…
-
I recommend the following improvement to VariantSpark Random Forest importance analysis.
1. Compute and write the importance scores to a file after every 1000 trees are built.
2. Automatically identify…