Hi,
I just started trying out VariantSpark with Scala, using its featureImportance function on my own VCF file. That worked perfectly when I followed the example code from the notebook. I would also like to train randomForest on a fraction of my data and predict on held-out data. I saw in your source code that there are functions to do this, but I'm a bit confused about where to find proper documentation for the Scala functions. I'm trying to apply a normal ML workflow (split the dataset, train the model, assess performance on the test set), and I'm not sure which of these steps are already included in the importanceAnalysis function. I would be glad to get some help.
Fiona