-
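> Claiming guidelines
**When submitting, do not change the file name, even if it differs from the chapter title, because file names correspond to the links in the original text!**
Comment format: translation/proofreading + nickname + QQ + chapter
If you need to cancel a claim, leave a comment here as well.
| No. | Chapter | Contributor | Progress | Proofreader | Progress |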
| --- | --- | --- | --- | --- | --- |
| 1 | [Spark …
-
## Is your feature request related to a problem? Please describe.
I love `snorkel.labeling.filter_unlabeled_dataframe()`. I want a pyspark equivalent: `snorkel.labeling.filter_unlabeled_spark_rdd` …
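The filtering rule behind this request can be sketched in plain Python. This is only an illustration under stated assumptions, not Snorkel's implementation: it assumes each record is a `(data_point, lf_votes)` pair and that `-1` marks an abstaining labeling function (Snorkel's convention). The names `has_label` and `filter_unlabeled_records` are hypothetical.

```python
from typing import Iterable, Iterator, Sequence, Tuple, TypeVar

ABSTAIN = -1  # Snorkel's convention for "labeling function abstained"

T = TypeVar("T")


def has_label(lf_votes: Sequence[int]) -> bool:
    """Return True if at least one labeling function did not abstain."""
    return any(vote != ABSTAIN for vote in lf_votes)


def filter_unlabeled_records(
    records: Iterable[Tuple[T, Sequence[int]]],
) -> Iterator[Tuple[T, Sequence[int]]]:
    """Keep only (data_point, lf_votes) pairs that received at least one vote."""
    return (record for record in records if has_label(record[1]))


# Hypothetical sample: "a" received no votes, "b" and "c" received one each.
sample = [("a", [-1, -1]), ("b", [0, -1]), ("c", [-1, 1])]
kept = list(filter_unlabeled_records(sample))
print(kept)

# On a PySpark RDD holding the same kind of pairs, the predicate could be
# passed to the standard RDD.filter transformation (assumed record layout):
#     labeled_rdd = rdd.filter(lambda record: has_label(record[1]))
```

Keeping the predicate separate from the container means the same check could run on a local list for testing and on a distributed collection in a job.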
-
-
# Testing Plan
## Dummy Credit Card Application Dataset
### Test 1
- Read each dataset into a dataframe
- Time the creation of each dataframe
- Join the dataframes
- Filter out USA…
-
For the tasks of preprocessing `pandas` data and speeding up experiments, we have the `Preprocessor` class and a number of base classes with single functionality at [preprocessing](https://github.com/…
-
## Bug
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)
### Describe the problem
When you load two…
-
## Feature Description
Spark Connect Support
## Is your feature request related to a problem?
In Spark Connect, RDD is not supported, so PipelineDP does not work. See https://github.com/apache/sp…
-
This error occurred while I was trying to compact all the snappy.parquet files that were generated in Spark 2.1 with DataFrames.
Is there any workaround? Maybe I could try RDDs, but how efficient would that be?
Ca…
-
We use Jupyter notebooks to access BigTable data like so:
```
from google.cloud import bigtable
from google.cloud import happybase
client = bigtable.Client(project=project_id, admin=True)
instanc…
-
Recently, parquet added support for columnar/modular encryption in version parquet-mr 1.12 ([IBM](https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=scripts-parquet-encryption), [GitHub](https:/…