-
## Background [Optional]
We have a fixed-length EBCDIC file whose structure consists of a header, body records, and a trailer. We use the Cobrix library (**za.co.absa.cobrix:spark-cobol_2.12:2.6.0**) in **…
-
Having to flatten nested schemas can be cumbersome. A function to flatten the structure with various options would be handy.
Some configurations to include could be:
- Include Parent Struct Nam…
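
To make the request concrete, here is a minimal sketch of such a flattening helper. It works on plain nested Python dicts as a stand-in for a nested schema and is not the API of any library. The names `flatten`, `include_parent_name`, and `separator` are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any],
            include_parent_name: bool = True,
            separator: str = "_",
            _prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level (hypothetical helper, not a library API)."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        # Prefix the field with its parent path only when the option is enabled.
        if include_parent_name and _prefix:
            name = f"{_prefix}{separator}{key}"
        else:
            name = key
        if isinstance(value, dict):
            children = flatten(value, include_parent_name, separator, name)
        else:
            children = {name: value}
        for child_name, child_value in children.items():
            # Dropping parent names can produce clashes; fail loudly instead of overwriting.
            if child_name in flat:
                raise ValueError(f"duplicate name after flattening: {child_name}")
            flat[child_name] = child_value
    return flat
```

With `include_parent_name=True`, the record `{"id": 1, "address": {"city": "Oslo"}}` becomes `{"id": 1, "address_city": "Oslo"}`. With the option off, the nested field keeps its bare name `city`, and a clash such as two fields both called `id` raises a `ValueError`.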
-
## What is the current behavior?
Creating a DataFrame or executing SQL statements on Snowflake returns a `snowflake.snowpark.dataframe.DataFrame` by default, without an option to convert it to a `spark…
-
The fractional split feature of `Splitter` returns an undesired result when splitting a `pandas` DataFrame with duplicated indices if no argument is passed for `id_column`.
The following …
-
I've been working with this library for a few days. Thanks so much for maintaining it! It has made it really easy to work with data from Snowflake using PyData libraries.
I'd like to propose a feat…
-
Currently fn:sum specifies the intent of the second parameter in a note:
> The second argument allows an appropriate value to be defined to represent the sum of an empty sequence. For example, when…
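
The role of the second argument can be illustrated with a rough Python analogue. This is only a sketch of the semantics described in the note, not the XPath function itself. The name `xpath_like_sum` is hypothetical, and `timedelta` stands in for a duration type.

```python
from datetime import timedelta
from typing import Any, Iterable


def xpath_like_sum(values: Iterable[Any], zero: Any = 0) -> Any:
    """Sum the values; return `zero` only when the input is empty."""
    items = list(values)
    if not items:
        # The caller chooses what an empty sum means, e.g. a zero-length duration.
        return zero
    total = items[0]
    for item in items[1:]:
        total = total + item
    return total
```

Summing an empty list of durations with the default returns the integer `0`, which has the wrong type for duration arithmetic. Passing `zero=timedelta(0)` returns a duration instead, while non-empty inputs are unaffected by the choice of `zero`.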
-
When trying to write a Spark DataFrame to BigQuery with Dataproc (image version 2.0.45-debian10) in [direct mode](https://github.com/GoogleCloudDataproc/spark-bigquery-connector#writing-data-…
-
**The problem**
We want to enable caching of functions and their downstream results.
Say we want to alter a function and rerun the entire DAG. The function that we want to alter runs late enough…
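
One way to picture the requested behavior is the sketch below. It is an assumption about how such caching could work, not the project's implementation. Each node's cache key combines a fingerprint of the function's bytecode with the keys of its upstream nodes, so editing a function invalidates that node and everything downstream while earlier results are reused. The names `CachedDag` and `_fingerprint` are hypothetical, and the cache lives in memory only.

```python
import hashlib
import types
from typing import Any, Callable, Dict, List, Sequence, Tuple


def _fingerprint(code: types.CodeType) -> bytes:
    """Bytes that change when the function body changes (best effort)."""
    parts = [code.co_code, repr(code.co_names).encode()]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            parts.append(_fingerprint(const))  # nested lambdas and comprehensions
        else:
            parts.append(repr(const).encode())
    return b"|".join(parts)


class CachedDag:
    """Tiny in-memory DAG runner with code-aware caching (illustrative only)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Tuple[Callable[..., Any], Tuple[str, ...]]] = {}
        self._cache: Dict[str, Any] = {}
        self.calls: List[str] = []  # names of nodes that were actually executed

    def add(self, name: str, fn: Callable[..., Any], deps: Sequence[str] = ()) -> None:
        self._nodes[name] = (fn, tuple(deps))

    def _key(self, name: str) -> str:
        fn, deps = self._nodes[name]
        digest = hashlib.sha256(name.encode())
        digest.update(_fingerprint(fn.__code__))
        for dep in deps:
            # An upstream change alters this key too, forcing a rerun downstream.
            digest.update(self._key(dep).encode())
        return digest.hexdigest()

    def run(self, name: str) -> Any:
        key = self._key(name)
        if key not in self._cache:
            fn, deps = self._nodes[name]
            args = [self.run(dep) for dep in deps]
            self.calls.append(name)
            self._cache[key] = fn(*args)
        return self._cache[key]
```

In a chain `load -> double -> total`, redefining `double` and calling `run("total")` again executes only `double` and `total`; the result of `load` comes from the cache. The fingerprint ignores changes in closures, globals, and external data, so a real implementation would need a stronger notion of identity.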
-
### Describe the feature
Support Dask just as Spark is supported.
### Who will this benefit?
This will benefit realtime / web-request use cases where milliseconds matter. The same isomorphic Mac…
-
Hello,
Posting this from github (master @wesm asked for it :) )
```python
import pandas as pd
import numpy as np
import pyarrow.parquet as pq
import pyarrow as pa
idx = pd.date_…