-
I'm thinking about how to use this in my business to bring people into Databricks instead of using low-code platforms, which are more difficult to support.
One limitation I see today is that you can o…
-
-
I can't use Parquet.jl because there is a problem reading Date-typed columns. They are read into Julia DataFrames as an Int32 -- I'm pretty sure Parquet files are supposed to define the schema and …
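For context on why the raw value is an Int32: the Parquet format stores its DATE logical type as an INT32 counting days since the Unix epoch (1970-01-01), so a reader that drops the logical-type annotation surfaces exactly those integers. A minimal sketch of the conversion (shown in Python for illustration; the same arithmetic applies as a workaround in Julia):

```python
from datetime import date, timedelta

def days_to_date(days_since_epoch: int) -> date:
    """Parquet's DATE logical type is an INT32 of days since the
    Unix epoch (1970-01-01); convert the raw value back to a date."""
    return date(1970, 1, 1) + timedelta(days=days_since_epoch)

# Raw Int32 values as a reader that ignores the annotation returns them.
raw = [0, 18262]
print([days_to_date(d).isoformat() for d in raw])  # ['1970-01-01', '2020-01-01']
```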
-
This issue is to collaborate on some problems with Spark's random forest (RF), also addressed by @jkbradley in comments on this post http://datascience.la/benchmarking-random-forest-implementations/ (see comments by Joseph Brad…
-
## Background
Hi! I'm not an expert on COBOL/EBCDIC data structures, but I'm implementing a CDC scenario using Flink (in Java), and I have some binary fields to decode, given a copybook.
In the …
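For the byte-level decoding itself, two patterns cover most copybook fields: character data in an EBCDIC code page (e.g. CP037; on the Java/Flink side, `Charset.forName("Cp037")` is the equivalent) and COMP-3 packed decimals, where each nibble is a digit and the last nibble is the sign. A hedged sketch of the logic in Python (field layouts here are hypothetical, not from any specific copybook):

```python
def decode_ebcdic_text(raw: bytes) -> str:
    """Decode an EBCDIC (code page 037) character field."""
    return raw.decode("cp037").rstrip()

def decode_comp3(raw: bytes) -> int:
    """Decode a COBOL COMP-3 (packed decimal) field: every nibble is a
    digit except the last, which holds the sign (0xD = negative)."""
    digits = []
    for b in raw:
        digits.append(b >> 4)
        digits.append(b & 0x0F)
    sign_nibble = digits.pop()  # the final nibble is the sign
    value = int("".join(str(d) for d in digits))
    return -value if sign_nibble == 0x0D else value

print(decode_ebcdic_text(b"\xC8\xC5\xD3\xD3\xD6"))  # HELLO
print(decode_comp3(b"\x12\x34\x5C"))                # 12345
print(decode_comp3(b"\x01\x2D"))                    # -12
```

Scaled (implied-decimal) COMP-3 fields would additionally divide by 10**scale, with the scale taken from the copybook PIC clause.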
-
## Describe the task
The objective is to integrate Great Expectations into our Python ETL pipeline to ensure data quality. The task involves researching various integration methods, documenting the…
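One low-risk way to scope the research is to prototype the desired checks in plain pandas first, then map each one onto a Great Expectations expectation (e.g. `expect_column_values_to_not_be_null`, `expect_column_values_to_be_between`). A hypothetical sketch with assumed column names:

```python
import pandas as pd

def check_no_nulls(df: pd.DataFrame, column: str) -> bool:
    """Plain-pandas prototype of expect_column_values_to_not_be_null."""
    return not df[column].isnull().any()

def check_values_between(df: pd.DataFrame, column: str, low, high) -> bool:
    """Plain-pandas prototype of expect_column_values_to_be_between."""
    return bool(df[column].between(low, high).all())

# Hypothetical ETL output to validate.
df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 150.0, 42.5]})
print(check_no_nulls(df, "order_id"))          # True
print(check_values_between(df, "amount", 0, 1000))  # True
```

Once the checks are agreed on, each prototype function translates one-to-one into an expectation in a Great Expectations suite, which then runs as a validation step in the pipeline.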
-
Hi! First of all, thanks for your fantastic package. It allows me to validate my Spark DataFrames and create beautiful reports.
I'm trying to use the scan_data function over a Spark DataFrame, but I g…
-
-
https://github.com/h2oai/h2o-3/blob/c0f9ffef3b68e4727b1efe36c1c1111850519ee9/h2o-py/h2o/frame.py#L2542
requires both the index and the columns being pivoted to be of the following data types:
"enum","time",…
-
My Spark DataFrame is as below:
+--------------------+---------+
| feature| label|
+--------------------+---------+
|[-5395.3890376257...|[0.0,1.0]|
|[6.69571816328211...|[1.0,0.0…
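With the label stored as a one-hot vector like `[0.0,1.0]`, most Spark ML classifiers instead expect a single numeric class index, and the conversion is just an argmax. A minimal pure-Python sketch of that logic (in Spark this would typically run inside a UDF over the assumed `label` column):

```python
def onehot_to_index(label):
    """Return the position of the largest entry, i.e. the class index
    encoded by a one-hot vector such as [0.0, 1.0] -> 1."""
    return max(range(len(label)), key=lambda i: label[i])

print(onehot_to_index([0.0, 1.0]))  # 1
print(onehot_to_index([1.0, 0.0]))  # 0
```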