parquet Search Results - Githubissues

1000+ results for parquet

datacoon/undatum #20

Parquet compression

It would be nice to have options for compression. Looks like there is no compression by default? ``` parq RS_2008-04.parquet # Metadata created_by: parquet-cpp-arrow version 7.0.0 n…

chapmanjacobd updated 1 week ago
3
OSGeo/gdal #11309

GeoParquet fails to read hive-partitioned data from Azure

### What is the bug? According to [docs](https://gdal.org/en/latest/drivers/vector/parquet.html#dataset-partitioning-read-support) and [ogr_parquet.py](https://github.com/OSGeo/gdal/blob/master/aut…

iferencik updated 2 days ago
1
walkerke/pygris #3

parquet in cache

Once the files are read into memory, what do you think about caching them as Parquet files instead of shapefiles? It would make IO much faster and the on-disk footprint much smaller. If you're into it, …

knaaptime updated 1 week ago
5
duckdb/duckdb #14819

HTTP redirects are URL decoded unexpectedly

### What happens? I am using httpfs to query Parquet files from Hugging Face. Hugging Face initially returns a 302 redirect for the download file, which DuckDB tries to follow. However, DuckDB URL d…

danclaytondev updated 1 week ago
2
CrunchyData/pg_parquet #67

Cannot copy string column: table expected "Utf8" but file ha…

When I try to copy data from a local Parquet file to the database, I get the following error: ``` type mismatch for column "bla" between table and parquet file. table expected "Utf8" but file h…

oliora updated 2 weeks ago
2
NVIDIA/NeMo-Curator #381

Graceful handling when no LSH duplicates are found

In the current implementation, the `__call__` method in `nemo_curator/modules/fuzzy_dedup.py` assumes that at least one LSH duplicate will be found and that the results will be saved as a Parquet file…

davzoku updated 3 days ago
2
Eventual-Inc/Daft #3389

Swordfish performance issues for large machines with lots of…

### Describe the bug Our ML pipelines run on large instances in a k8s cluster, where a large number of CPUs (often 128 or 96) is available. It looks like the new Swordfish runtime has issues handling and schedul…

michaelvay updated 16 hours ago
1
Altinity/parquet-regression #5

Add support for low-level interface in JSON schema

https://github.com/apache/parquet-java/blob/master/parquet-column/src/test/java/org/apache/parquet/column/impl/TestColumnReaderImpl.java#L55

Selfeer updated 1 week ago
1
showlab/Show-o #21

No module named 'parquet.parquet_dataset'

File "Show-o/parquet/refinedweb_dataset.py", line 20, in
    from parquet.parquet_dataset import CruiseParquetDataset
ModuleNotFoundError: No module named 'parquet.parquet_dataset'

mrswang1 updated 2 months ago
2
duckdb/duckdb #14910

to_timestamp function is 5x slower than make_timestamp

### What happens? I'm aggregating 4B records with duckdb. ``` SELECT channel_id, cast(to_timestamp(end_time) as date) AS day, sum(end_time - start_time) AS total_…

maver1ck updated 4 hours ago
2