Closed: hongooi73 closed this issue 5 years ago.
Directly modifying `RevoScaleR:::rxCheckSupportForDataSource` does not fix this:
```r
# in Spark compute context
hd <- RxHdfsFileSystem()

# mtcars composite Xdf
mthc <- RxXdfData("/user/sshuser/mtcarsc", fileSystem=hd, createCompositeSet=TRUE)

# RxXdfData output works
mth2 <- RxXdfData("/user/sshuser/mtcarsc2", fileSystem=hd, createCompositeSet=TRUE)
rxDataStep(mthc, mth2)
rxHadoopRemoveDir("/user/sshuser/mtcarsc2")

# tbl_xdf output fails
mt_tbl <- as(mth2, "tbl_xdf")
rxDataStep(mthc, mt_tbl)  # fails

# Error in rxCompleteClusterJob(hpcServerJob, consoleOutput, autoCleanup) :
#   No job results retrieved.
# In addition:
# Warning message:
# In rxCompleteClusterJob(hpcServerJob, consoleOutput, autoCleanup) :
#   Unable to retrieve output object(s).
#   One of the following may have occured:
#   Output may have already been retrieved and deleted
#   The cluster operating system has not flushed the output files to disk yet
#   Job may have failed.
#   Job may have not generated output objects.
```
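As a point of comparison, here is a minimal workaround sketch, assuming the failure lies only in how RevoScaleR recognises the data source's class. `mt_plain` is a name introduced here for illustration, and the `as()` call is plain S4 coercion, not the package's `as_xdf` helper mentioned below:

```r
# Workaround sketch: coerce the tbl_xdf back to its parent RxXdfData class
# before calling rxDataStep, so the compute context sees a plain RxXdfData.
mt_plain <- as(mt_tbl, "RxXdfData")   # same underlying file, tbl_xdf class dropped
rxDataStep(mthc, mt_plain)            # expected to behave like the RxXdfData case above
```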
TFS item 78064
Use `as_xdf` to wrap any tbl_xdf data sources. Specifically, the relevant check is:

```r
else if (class(data) %in% c("RxTextData", "RxXdfData", "RxParquetData", "RxOrcData"))
```
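For reference, a short sketch of why a membership test like this misses a tbl_xdf input, assuming tbl_xdf is an S4 class that contains RxXdfData (as the `as()` coercion in the repro suggests). `class()` on an S4 object reports only the object's own class, not its parent classes, so the `%in%` test fails even though an `is()`-style test would succeed:

```r
# class() of an S4 object reports only its own class name, not its parents:
class(mt_tbl)                                                               # "tbl_xdf"
"tbl_xdf" %in% c("RxTextData", "RxXdfData", "RxParquetData", "RxOrcData")   # FALSE
is(mt_tbl, "RxXdfData")                                                     # TRUE: tbl_xdf contains RxXdfData
```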
Full function source: