-
Hello,
I am trying to install VariantSpark on a CentOS 7 box (JDK 1.8, Scala 2.3.1, Spark 2.1.1). When I run `mvn clean install`, the following test fails. Could you please help me resolve this?
…
-
When using the VariantSpark interface for Hail, a large batch size can cause the process to crash.
For example, with the following setup a batch size of 250 results in failure (tested several times) w…
-
Create an automated way to spin up a Spark cluster on AWS with VariantSpark installed so researchers can get started easily.
This extends the work done by Lynn Langit as detailed over the followin…
-
Steps to reproduce:
Train a model using e.g. ImportanceCmd:
`$./bin/variant-spark --local -- importance -if data/chr22_1000.vcf -ff data/chr22-labels.csv -fc 22_16051249 -rn 10 -rbs 10 -om target/…
-
The "Biallelic" option in the current version allows for two different representations of variants in the output file.
- CHR_POS
- CHR_POS_REF_ALT
I was wondering if this option is extended to …
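To make the difference between the two representations concrete, here is a minimal Python sketch. It assumes the labels are built by joining chromosome, position and, optionally, the reference and alternate alleles with underscores, as the option names suggest. The helper `variant_label` is hypothetical (not part of VariantSpark), and the alleles used below are made-up illustrative values.

```python
def variant_label(chrom: str, pos: int, ref: str, alt: str, include_alleles: bool) -> str:
    """Build a variant label (hypothetical helper, not VariantSpark's API).

    include_alleles=False -> CHR_POS          e.g. "22_16051249"
    include_alleles=True  -> CHR_POS_REF_ALT  e.g. "22_16051249_A_C"
    """
    parts = [chrom, str(pos)]
    if include_alleles:
        parts += [ref, alt]
    return "_".join(parts)


# Two different alternate alleles at the same site (illustrative values).
print(variant_label("22", 16051249, "A", "C", False))  # 22_16051249
print(variant_label("22", 16051249, "A", "G", False))  # 22_16051249 (same name)
print(variant_label("22", 16051249, "A", "C", True))   # 22_16051249_A_C
print(variant_label("22", 16051249, "A", "G", True))   # 22_16051249_A_G
```

Under CHR_POS, two records at the same position (for example a multiallelic site split into biallelic records) end up with the same name, while CHR_POS_REF_ALT keeps them distinct.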
-
VariantSpark can output the RandomForest model in JSON format.
It would be nice to have a command-line tool to quickly inspect the JSON model and list the variables in each branch or tree.
Th…
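A minimal Python sketch of such a tool is shown below. It assumes a hypothetical JSON layout in which the model holds a list of `trees`, each with a `root` node, and each split node carries a `variable` name plus `left`/`right` children (leaf nodes have no `variable`). These field names are placeholders, not VariantSpark's actual schema; the traversal would need adapting to the JSON the tool really emits.

```python
import json
import sys
from typing import Any, Dict, List


def variables_in_tree(root: Dict[str, Any]) -> List[str]:
    """Collect split variables in pre-order, iteratively to avoid deep recursion.

    Assumes (hypothetically) that split nodes have "variable", "left", "right"
    keys and that leaf nodes have no "variable" key.
    """
    result: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if "variable" not in node:  # leaf node: nothing to record
            continue
        result.append(node["variable"])
        # Push right first so the left subtree is visited first (pre-order).
        for key in ("right", "left"):
            child = node.get(key)
            if child is not None:
                stack.append(child)
    return result


def list_variables(model: Dict[str, Any]) -> List[List[str]]:
    """Return, for each tree in the (hypothetical) "trees" list, its split variables."""
    return [variables_in_tree(tree["root"]) for tree in model["trees"]]


def main(path: str) -> None:
    with open(path) as f:
        model = json.load(f)
    for i, variables in enumerate(list_variables(model)):
        print(f"tree {i}: {', '.join(variables)}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved as, say, `list_tree_vars.py` (a hypothetical name), it would be run as `python list_tree_vars.py model.json` and print one line per tree.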
-
At present, only VCF files are supported by the Scala API for ingesting feature data. It would be useful to allow easy ingestion of Parquet files, as this would broaden the usefulness of VariantSpark b…
-
-
Hi,
I am currently testing your tool to cluster a large number of individuals according to their ethnicity or phenotype. I used the "importance" option, but it assigns importance to the var…
-
Hi,
I just started trying out VariantSpark with Scala, using its featureImportance function on my own VCF file. It worked perfectly, as I followed the example code from the notebook.
But I would …