Major issues and features addressed in this update
VariantSpark's python wrapper has been refactored to create Random Forest models from a standalone class
Previously, in the non-hail VariantSpark release, the model was initialised and trained within the importance analysis itself, which did not seem appropriate for supporting future releases
A new scala function was created to return a trained RandomForest model without using hail
The FeatureSource class, which provides wrapper functionalities for initialising genotype data for model training, has been moved to a standalone class
For better separation of concerns, this class is now imported into the core python wrapper
head(nrows, ncols) allows the first n rows and columns to be viewed as a pandas DataFrame
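The preview behaviour of head(nrows, ncols) can be illustrated with a minimal sketch. This is not the VariantSpark implementation: the `head` function below, the `(label, values)` pairs standing in for `RDD[Feature]`, and the default sizes are all assumptions made for illustration, and the real wrapper reads from Spark rather than from a Python list.

```python
import pandas as pd


def head(features, sample_names, nrows=10, ncols=10):
    """Return the first nrows features and ncols samples as a DataFrame.

    `features` is a hypothetical list of (label, values) pairs standing in
    for the feature data that the real wrapper would pull from Spark.
    """
    rows = features[:nrows]
    # One row per feature, truncated to the first ncols sample values.
    data = {label: values[:ncols] for label, values in rows}
    return pd.DataFrame.from_dict(
        data, orient="index", columns=sample_names[:ncols]
    )


# Toy genotype data: three variants across four samples (made-up values).
features = [
    ("1_100", [0, 1, 2, 0]),
    ("1_200", [1, 1, 0, 2]),
    ("2_300", [2, 0, 0, 1]),
]
samples = ["s1", "s2", "s3", "s4"]

preview = head(features, samples, nrows=2, ncols=3)
print(preview)
```

Truncating before building the DataFrame keeps the preview cheap even when the underlying data is large, which is presumably the point of exposing a bounded view.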
VariantSpark's python wrapper has been refactored to create Random Forest models from a standalone class
python/varspark/rfmodel.py
python/varspark/core.py
python/varspark/__init__.py
src/main/scala/au/csiro/variantspark/api/GetRFModel.scala
A non-hail export model function was created
src/main/scala/au/csiro/variantspark/api/ExportModel.scala
The FeatureSource class, which provides wrapper functionalities for initialising genotype data for model training, has been moved to a standalone class
head(nrows, ncols) allows the first n rows and columns to be viewed as a pandas DataFrame
python/varspark/featuresource.py
python/varspark/core.py
src/main/scala/au/csiro/variantspark/input/FeatureSource.scala
Covariate support was extended
Covariates use the FeatureSource wrapper class and are also of type RDD[Feature]; they also support head()
src/main/scala/au/csiro/variantspark/api/VSContext.scala
src/main/scala/au/csiro/variantspark/input/CsvStdFeatureSource.scala
src/main/scala/au/csiro/variantspark/input/UnionedFeatureSource.scala
python/varspark/lfdrvsnohail.py
Importance analyses were moved to a standalone python wrapper class
important_variables() and variable_importance() are now returned as pandas DataFrames
variable_importance() (required for Local FDR calculations)
precision supports rounding for variable_importance()
normalized indicates whether to normalise importances for both functions
python/varspark/importanceanalysis.py
python/varspark/core.py
src/main/scala/au/csiro/variantspark/api/ImportanceAnalysis.scala
src/main/scala/au/csiro/variantspark/api/AnalyticsFunctions.scala
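As a rough sketch of how the precision and normalized options could behave on a pandas DataFrame: the function bodies, the `raw` list of `(variable, importance)` pairs, the `limit` parameter, and the choice to normalise by dividing by the total importance are all assumptions for illustration, not the code in importanceanalysis.py.

```python
import pandas as pd


def variable_importance(raw, precision=None, normalized=False):
    """Build an importance table from hypothetical (variable, importance) pairs."""
    df = pd.DataFrame(raw, columns=["variable", "importance"])
    if normalized:
        # Assumption: normalising means scaling importances to sum to 1.
        total = df["importance"].sum()
        if total > 0:
            df["importance"] = df["importance"] / total
    if precision is not None:
        # Round only when a precision is requested.
        df["importance"] = df["importance"].round(precision)
    return df


def important_variables(raw, limit=10, normalized=False):
    """Return the `limit` most important variables, highest first."""
    df = variable_importance(raw, normalized=normalized)
    ranked = df.sort_values("importance", ascending=False)
    return ranked.head(limit).reset_index(drop=True)


# Made-up importances for three variants.
raw = [("1_100", 1.0), ("1_200", 2.0), ("2_300", 3.0)]

table = variable_importance(raw, precision=2, normalized=True)
top = important_variables(raw, limit=2, normalized=True)
print(table)
print(top)
```

Returning DataFrames from both functions means downstream steps such as the Local FDR calculation can consume the table directly without an intermediate conversion.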
Moved the lfdr file to the non-hail python directory
python/varspark/hail/lfdrvs.py
python/varspark/lfdrvs.py
Updated all test cases according to the above changes
src/test/scala/au/csiro/variantspark/api
/CommonPairwiseOperationTest.scala
/ImportanceApiTest.scala
src/test/scala/au/csiro/variantspark/misc
/ReproducibilityTest.scala
/CovariateReproducibilityTest.scala
src/test/scala/au/csiro/variantspark/test
/TestSparkContext.scala
python/varspark/test
/test_core.py
/test_hail.py
/test_pvalues_calculation.py
src/test/scala/au/csiro/variantspark/work/hail
/HailApiApp.scala
Removed all files used exclusively in hail version
python/varspark/hail
__init__.py
context.py
hail.py
methods.py
plot.py
src/main/scala/au/csiro/variantspark/hail/methods
RFModel.scala
Removed hail installation from pom.xml