aboyoun / BiocDuckDB

Bioconductor-friendly bindings for Parquet files.
MIT License

Fails to read (or display) Xenium `transcripts.parquet` readouts #2

Closed Artur-man closed 1 week ago

Artur-man commented 2 weeks ago

From: https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast

> metadata_par <- ParquetDataFrame("transcripts.parquet")
> metadata_par
ParquetDataFrame with 42638083 rows and 8 columns
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'makeNakedCharacterMatrixForDisplay': the default type() method only supports array-like objects

Reading the same file succeeds when using arrow alone:

> metadata <- arrow::read_parquet("../../data/10X_Xenium_Visium/Xenium_R1/outs/transcripts.parquet", 
                                  as_data_frame = FALSE)
> metadata
Table
42638083 rows x 8 columns
$transcript_id <uint64>
$cell_id <int32>
$overlaps_nucleus <uint8>
$feature_name <binary>
$x_location <float>
$y_location <float>
$z_location <float>
$qv <float>

See $metadata for additional Schema metadata

aboyoun commented 1 week ago

@Artur-man I just pushed a fix to allow for raw and 64-bit integer columns in this version of ParquetDataFrame:

> df <- ParquetDataFrame("~/TenX/outs/transcripts.parquet")
> df
ParquetDataFrame with 42638083 rows and 8 columns
                 transcript_id               cell_id      overlaps_nucleus            feature_name
         <ParquetColumnVector> <ParquetColumnVector> <ParquetColumnVector>   <ParquetColumnVector>
1              281474976710656                   565                     0                  SEC11C
2              281474976710657                   540                     0 NegControlCodeword_0502
3              281474976710658                   562                     0                  SEC11C
4              281474976710659                   271                     0                   DAPK3
5              281474976710660                   291                     0                    TCIM
...                        ...                   ...                   ...                     ...
42638079       281805689407068                    -1                     0                   HOXD8
42638080       281805689407071                135717                     1                     LUM
42638081       281805689407078                    -1                     0                     LUM
42638082       281805689407083                    -1                     0                    NARS
42638083       281805689407092                135716                     0                     LUM
                    x_location            y_location            z_location                    qv
         <ParquetColumnVector> <ParquetColumnVector> <ParquetColumnVector> <ParquetColumnVector>
1              4.3958420753479      328.666473388672      12.0194931030273      18.6624794006348
2             5.07441520690918          236.96484375      7.60851049423218      18.6349563598633
3             4.70202302932739      322.797149658203      12.2890825271606      18.6624794006348
4              4.9066014289856      581.428649902344      11.2226152420044      20.8217449188232
5             5.66069936752319      720.851745605469      9.26552295684814      18.0174884796143
...                        ...                   ...                   ...                   ...
42638079      5218.00048828125         5295.51953125      30.5609340667725      20.0610790252686
42638080          5218.8046875      4792.92822265625      38.4404258728027      21.3892784118652
42638081      5217.40478515625      5425.38623046875      30.5196018218994      21.9601020812988
42638082       5220.1552734375       4839.7041015625      32.1806983947754      25.9703979492188
42638083      5220.23583984375        4738.751953125      30.2087459564209      20.5241203308105

> packageVersion("ParquetDataFrame")
[1] ‘0.1.1’
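
For readers wondering why raw and 64-bit integer columns need special handling, here is a minimal base R sketch. It is not the code of the fix and makes no arrow or ParquetDataFrame calls. It only illustrates, under the assumption that `transcript_id <uint64>` and `feature_name <binary>` are the columns concerned, that such values have no direct 32-bit integer or character representation in R.

```r
# Base R only; an illustration of the type mismatch, not the package's fix.

# 1. R's native integer type is 32-bit, so a uint64 id cannot be stored as one.
id <- 281474976710656             # first transcript_id shown above (equals 2^48)
int_max <- .Machine$integer.max   # largest native R integer
fits_in_int <- id <= int_max      # FALSE: the id is out of integer range

# Doubles represent integers exactly only up to 2^53. This id is below that
# limit, but larger uint64 values could lose precision when held as doubles.
exact_as_double <- id == 2^48 && id < 2^53

# 2. Binary data is a sequence of bytes; in R that is a raw vector,
#    which must be converted explicitly to obtain a character string.
bytes <- charToRaw("SEC11C")      # raw vector holding the bytes of the name
name <- rawToChar(bytes)          # convert the bytes back to a string

print(fits_in_int)
print(format(id, scientific = FALSE))
print(class(bytes))
print(name)
```

Any display code that assumes ordinary integer, numeric, or character columns would have to convert values like these first, which is consistent with the error reported at the top of this issue.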

Artur-man commented 1 week ago

Ah thanks so much @aboyoun !!!! thanks for the prompt fix ...