aehrc / VariantSpark

machine learning for genomic variants
http://bioinformatics.csiro.au/variantspark

scala randomForest classifier #93

Closed fifdick closed 9 months ago

fifdick commented 5 years ago

Hi,

I have just started trying out VariantSpark from Scala, using its featureImportance function on my own VCF file. That worked perfectly when I followed the example code from the notebook. I would also like to use randomForest on a fraction of my data and predict on held-out data. I saw in your source code that there are functions to do this, but I'm a bit confused about where to find proper documentation for the Scala functions. I'm trying to apply a normal ML workflow: split the dataset, train the model, and assess performance on the test set. I'm not sure which of these steps are already included in the importanceAnalysis function. I would be glad to get some help.
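To make the workflow concrete, here is a minimal sketch of the split / train / assess steps in plain Scala. It deliberately uses no VariantSpark calls, since the relevant API is exactly what is unclear. All names (`HoldoutSketch`, `splitIndices`, `majorityLabel`, `accuracy`) are hypothetical, and the majority-class predictor is only a placeholder standing in for a trained random forest.

```scala
import scala.util.Random

// Hypothetical helper object; none of this is VariantSpark API.
object HoldoutSketch {

  // Shuffle the sample indices 0 until n and split them into (train, test).
  def splitIndices(n: Int, testFraction: Double, seed: Long): (Vector[Int], Vector[Int]) = {
    require(n >= 2, "need at least two samples to split")
    require(testFraction > 0.0 && testFraction < 1.0, "testFraction must be in (0, 1)")
    val shuffled = new Random(seed).shuffle((0 until n).toVector)
    // Keep at least one sample on each side of the split.
    val nTest = math.min(n - 1, math.max(1, math.round(n * testFraction).toInt))
    val (test, train) = shuffled.splitAt(nTest)
    (train, test)
  }

  // Placeholder "model": the most frequent training label (ties broken arbitrarily).
  // A real random forest would be trained here instead.
  def majorityLabel(labels: Seq[Int]): Int =
    labels.groupBy(identity).maxBy(_._2.size)._1

  // Fraction of held-out samples whose predicted label matches the true label.
  def accuracy(predicted: Seq[Int], actual: Seq[Int]): Double = {
    require(predicted.size == actual.size && actual.nonEmpty, "inputs must be non-empty and aligned")
    predicted.zip(actual).count { case (p, a) => p == a }.toDouble / actual.size
  }

  def main(args: Array[String]): Unit = {
    // Made-up labels for ten samples, for illustration only.
    val labels = Vector(1, 1, 1, 0, 1, 0, 1, 1, 0, 1)
    val (train, test) = splitIndices(labels.size, testFraction = 0.3, seed = 13L)
    val model = majorityLabel(train.map(labels))
    val acc = accuracy(test.map(_ => model), test.map(labels))
    println(s"train=${train.size} test=${test.size} accuracy=$acc")
  }
}
```

What I am missing is which VariantSpark functions would take the place of `majorityLabel` (training on the training samples only) and of the constant prediction (predicting labels for the held-out samples).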

Fiona

rocreguant commented 9 months ago

Are you still interested in this?

rocreguant commented 9 months ago

I'm closing the issue since we haven't heard back. Feel free to re-open with this or any new issue :)