-
## Background
Currently, Atum relies on the global state of a Spark Application. This complicates the usage of Atum for jobs that are slightly more complicated than just a pipeline of a single dataf…
-
Hi
I am trying to load a CSV zip file from Google Cloud into BQ. The file is 100 GB, but the load is taking a lot of time;
is there a way to tune the df.saveAsBigQueryTable command to spe…
-
**Is your feature request related to a problem? Please describe.**
Hello, I am new to PySpark and data engineering in general. I am looking to validate a PySpark DataFrame given a schema. Came across…
-
test with our code & data
-
**Describe the bug**
I am not sure what the reason could be, but the indexing process seems to enter a loop for the following code until it completes successfully:
```
val blockingFields…
```
-
**Is your feature request related to a problem? Please describe.**
I'd like to deprecate **Microsoft.Data.Analysis** from this project, or at least move it out of **Microsoft.Spark** to a distinct …
-
These extensions have already been added to [r2rml model extensions](https://github.com/SmartDataAnalytics/r2rml-api-jena/blob/develop/r2rmlx-jena-api/src/main/java/org/aksw/r2rmlx/domain/api/Constrain…
-
### Description
I love the lazy reading for Iceberg. It's great.
If Polars supported writes back to an Iceberg catalog, that would make it a really powerful tool working alongside the SQL engines and…
-
Hello,
I am starting with Frameless and I am having a hard time converting my code based on Spark DataFrames to the Frameless framework.
The blocking point I have reached is how to override a column.
…
-
When trying to create a Spark DataFrame from an Eland DataFrame, I get the following error:
`KeyError: 'Requested column [0] is not in the DataFrame.'`
I tried renaming/filtering out columns wit…