-
### Description
`batch` and `batch_map` do not work and it's not clear what the syntax should be.
Take this example:
```python
DataChain.from_storage(path="gs://dvcx-datalakes/dogs-and-cats/…
```
-
### Description
It runs for about an hour and then fails:
```
$ python examples/wds.py
...
...
Generated: 327271 rows [12:19, 686.43 rows/s]
Generated: 327425 rows [12:19, 724.09 rows/s]
P…
```
-
@BennyThadikaran, a feature request:
Can there be a generic mechanism for saving data that I can extend to write to other datalakes?
_Originally posted by @Prady04 in https://github.com/BennyThadik…
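One way such a mechanism could look is a small plugin registry: each sink subclasses a common base class and registers itself under a name, and callers look writers up by that name. This is only a sketch under assumptions; every name below (`DataWriter`, `register_writer`, `get_writer`, `MemoryWriter`) is hypothetical and not part of any existing project's API, and `MemoryWriter` stands in for a real datalake writer.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List, Type

# Hypothetical registry mapping a sink name (e.g. "memory") to a writer class.
_WRITERS: Dict[str, Type["DataWriter"]] = {}


def register_writer(name: str):
    """Class decorator that registers a DataWriter subclass under `name`."""
    def decorator(cls: Type["DataWriter"]) -> Type["DataWriter"]:
        _WRITERS[name] = cls
        return cls
    return decorator


class DataWriter(ABC):
    """Hypothetical base class: every sink implements write()."""

    @abstractmethod
    def write(self, rows: Iterable[Dict[str, Any]]) -> int:
        """Persist rows and return how many were written."""


@register_writer("memory")
class MemoryWriter(DataWriter):
    """Toy sink that keeps rows in a list; a real sink would target a datalake."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []

    def write(self, rows: Iterable[Dict[str, Any]]) -> int:
        before = len(self.rows)
        self.rows.extend(rows)
        return len(self.rows) - before


def get_writer(name: str, **kwargs: Any) -> DataWriter:
    """Look up a registered writer by name and instantiate it."""
    try:
        cls = _WRITERS[name]
    except KeyError:
        raise ValueError(f"no writer registered for {name!r}") from None
    return cls(**kwargs)
```

With this design, supporting another datalake means adding one `DataWriter` subclass decorated with `@register_writer("<name>")`; code that saves data only calls `get_writer(name).write(rows)` and never imports the concrete sink.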
-
Hi Team, hopefully this is the right place to ask; if not, I'd appreciate it if you could direct me.
I'm the founder of [cloudquery.io](https://www.cloudquery.io/), a high performance open source ELT framew…
-
I'm using rclone in a Docker environment with limited memory, and it looks like the OS is sending a kill signal because rclone is using too much memory. The problem is that rclon…
-
# Big Data file formats
The evaluation of the major data formats and storage engines for the Big Data ecosystem has shown the pros and cons of each of them for various metrics. In this post I'll try …
-
We recently added a new feature called vectored IO in Hadoop to improve read performance for seek-heavy readers. Spark jobs and other workloads that use Parquet will benefit greatly from this API. Details…
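The core idea behind vectored IO is that a reader submits a whole list of `(offset, length)` ranges in one call instead of issuing a seek and read per range, which lets the implementation merge nearby ranges into fewer, larger reads. The Python sketch below is not Hadoop's API (that API is Java); it only illustrates the range-coalescing idea on a local file-like object, and `coalesce`, `read_vectored`, and the `max_gap` threshold are hypothetical names chosen for illustration.

```python
import io
from typing import BinaryIO, List, Tuple

Range = Tuple[int, int]  # (offset, length)


def coalesce(ranges: List[Range], max_gap: int) -> List[Tuple[int, int, List[int]]]:
    """Merge ranges separated by at most `max_gap` bytes (or overlapping).

    Returns (start, end, indices) per merged block; `indices` point back
    into the original `ranges` list so results keep the caller's order.
    """
    order = sorted(range(len(ranges)), key=lambda i: ranges[i][0])
    merged: List[Tuple[int, int, List[int]]] = []
    for i in order:
        off, length = ranges[i]
        end = off + length
        if merged and off - merged[-1][1] <= max_gap:
            start, prev_end, idx = merged[-1]
            merged[-1] = (start, max(prev_end, end), idx + [i])
        else:
            merged.append((off, end, [i]))
    return merged


def read_vectored(f: BinaryIO, ranges: List[Range], max_gap: int = 4096) -> List[bytes]:
    """Read every range using one seek + read per merged block."""
    results: List[bytes] = [b""] * len(ranges)
    for start, end, idx in coalesce(ranges, max_gap):
        f.seek(start)
        block = f.read(end - start)
        for i in idx:
            off, length = ranges[i]
            # Slice each requested range back out of the merged block.
            results[i] = block[off - start: off - start + length]
    return results


if __name__ == "__main__":
    data = io.BytesIO(bytes(range(256)) * 64)  # 16 KiB of sample bytes
    # Ranges at 0 and 100 merge into one read; 10_000 needs a second read.
    print(read_vectored(data, [(10_000, 4), (0, 4), (100, 4)]))
```

The trade-off is controlled by the gap threshold: a larger `max_gap` means fewer seeks but more unused bytes read between ranges, which matters most on high-latency object stores.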
-
**Describe what's wrong**
Selecting data from a partitioned DeltaLake table results in an error stating that the partition columns are not available.
```sql
SELECT *
FROM deltaLake('http://localhos…
```
-
### Search before asking
- [X] I searched in the [issues](https://github.com/ververica/flink-cdc-connectors/issues) and found nothing similar.
### Motivation
Many sinks of datalakes and OLAP datab…