-
### Describe the bug, including details regarding any error messages, version, and platform.
I'm trying to read a parquet file with pandas using 'pyarrow' engine and I'm having a problem while read…
-
## Description
Pandas DataFrame saved and loaded using pandas.DeltaTableDataset differs from the original DataFrame - it has additional column `__index_level_0__`. This is an Index saved as a column,…
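
A minimal sketch of one possible workaround, assuming the extra column always follows the `__index_level_N__` naming pattern. The helper name `strip_index_columns` is hypothetical and not part of pandas or any dataset class; the `loaded` frame only stands in for what a reload might return.

```python
import re

import pandas as pd

# Pattern for index columns that were materialised as regular columns.
_INDEX_COLUMN = re.compile(r"^__index_level_\d+__$")


def strip_index_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without any `__index_level_N__` columns (hypothetical helper)."""
    extra = [c for c in df.columns if isinstance(c, str) and _INDEX_COLUMN.match(c)]
    return df.drop(columns=extra)


# Stand-in for a frame that came back with the index saved as a column.
loaded = pd.DataFrame({"a": [1, 2, 3], "__index_level_0__": [10, 20, 30]})
cleaned = strip_index_columns(loaded)
```

Calling `df.reset_index(drop=True)` before saving may also avoid the extra column, at the cost of discarding the original index values.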
-
### Is there an existing issue for this?
- [ ] I have searched the existing issues
- [ ] I have checked [#657](https://github.com/microsoft/graphrag/issues/657) to validate if my issue is covered by …
-
I was debugging some Overture GeoParquet data fetching with @Bonkles and we were running into some exceptions saying
```
Content-Length Header missing from response
```
This seems to be coming…
-
### User Story:
As the Passport team,
I want to have unlimited and cost-effective access to Ethereum transaction data,
So that we can efficiently run analyses and query the blockchain without incurrin…
-
**Describe the issue**:
Hi
I encountered this error and don't know what happened under the hood, so I'm opening this issue for better tracking.
I have some spatial datasets in parquet format with…
-
### Description
When the best column to partition by has high cardinality, it'd be nice if we could have ranges instead of strict equality in hive partitions. I don't think pyarr…
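
To make the request concrete, here is a rough sketch of how a high-cardinality integer value could be mapped to a range-style partition directory. The function names, the `_range` suffix, and the `low-high` label format are illustrative assumptions only; none of this is an existing pyarrow feature.

```python
def range_partition_key(value: int, width: int) -> str:
    """Label the fixed-width range containing value, e.g. 1234 with width 1000 -> '1000-1999'."""
    if width <= 0:
        raise ValueError("width must be positive")
    low = (value // width) * width  # lower bound of the bucket
    return f"{low}-{low + width - 1}"


def hive_path(column: str, value: int, width: int) -> str:
    """Build a hive-style directory name that uses a range instead of a single value."""
    return f"{column}_range={range_partition_key(value, width)}"


print(hive_path("user_id", 1234, 1000))  # user_id_range=1000-1999
```

A reader would still need to turn a filter such as `user_id == 1234` into the matching range directory, which is the part that would need library support.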
-
### Describe the bug, including details regarding any error messages, version, and platform.
#### Environment
OS: Windows/Linux
Python: 3.10.10
s3fs: from 2022.7.1 to 2023.3.0 (doesn't matter)…
-
Now that our support for SQL-on-FHIR-v2 ViewDefinition is complete (#821 and #916) we should do some large scale comparisons of the Spark+Parquet based approach with relational DB based ones using mat…
-
The task is to perform harmonisation on GWAS Catalog summary statistics synced from the EBI FTP.
The full size of the dataset (on 2024-10-14) was:
```
❯ gsutil ls 'gs://gwas_catalog_inputs/raw_summa…