apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0

Provide some way to suppress warning about unknown extension? #631

Closed by hadley 1 month ago

hadley commented 1 month ago

e.g.

geography: Converting unknown extension google:sqlType:geography{string} as storage type
Backtrace:
    ▆
 1. ├─bigrquery::bq_table_download(tb, api = "arrow", quiet = TRUE) at test-bq-download.R:103:3
 2. │ └─bigrquerystorage::bqs_table_download(...) at bigrquery/R/bq-download.R:122:7
 3. │   ├─base::as.data.frame(nanoarrow::read_nanoarrow(raws)) at bigrquerystorage/R/bqs_download.R:111:3
 4. │   └─nanoarrow:::as.data.frame.nanoarrow_array_stream(nanoarrow::read_nanoarrow(raws))
 5. │     └─nanoarrow::infer_nanoarrow_ptype(x$get_schema())
 6. └─nanoarrow:::infer_ptype_other(`<nnrrw_sc>`)
 7.   ├─nanoarrow::infer_nanoarrow_ptype_extension(spec, schema)
 8.   └─nanoarrow:::infer_nanoarrow_ptype_extension.default(spec, schema)
 9.     └─nanoarrow:::warn_unregistered_extension_type(x)

paleolimbot commented 1 month ago

Great point!

Maybe not worth doing immediately since extension types are not that well tested in the wild; however, the built-in extension system should be able to handle converting geography columns to wk::wkt(geodesic = TRUE, crs=wk::wk_crs_longlat()).
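
To make the suggested target type concrete, here is a minimal sketch of the wk call on its own, outside of nanoarrow's extension system. It assumes the wk package is installed and that the string storage of the geography column holds well-known text; the `storage` values are made up for illustration and do not come from BigQuery.

```r
library(wk)  # assumption: wk is installed

# Hypothetical storage values standing in for the string storage of a
# google:sqlType:geography column (assumed here to be well-known text)
storage <- c("POINT (-64 45)", "LINESTRING (-64 45, -63 46)")

# Tag the strings as WKT with longitude/latitude coordinates and geodesic edges
geog <- wkt(storage, crs = wk_crs_longlat(), geodesic = TRUE)

# Inspect the metadata attached to the vector
wk_is_geodesic(geog)
wk_crs(geog)
```

This only shows the end result of the conversion; how the extension type would be registered so that nanoarrow does this automatically is not covered here.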

meztez commented 1 month ago

Trying to figure out why the internals are not able to pick it up. Here is a raws vector to test with: raws.zip

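# raws.zip is an RDS file holding the raw IPC bytes
a <- readRDS("raws.zip")
# Read the bytes as an Arrow stream before converting, as in the backtrace above
b <- as.data.frame(nanoarrow::read_nanoarrow(a))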
Warning messages:
1: In warn_unregistered_extension_type(x) :
  datetime: Converting unknown extension google:sqlType:datetime{timestamp('us', '')} as storage type
2: In warn_unregistered_extension_type(x) :
  geography: Converting unknown extension google:sqlType:geography{string} as storage type
3: In warn_unregistered_extension_type(x) :
  b: Converting unknown extension google:sqlType:geography{string} as storage type
4: In warn_unregistered_extension_type(x) :
  geo: Converting unknown extension google:sqlType:geography{list<item: string>} as storage type
5: In warn_unregistered_extension_type(storage) :
  datetime: Converting unknown extension google:sqlType:datetime{timestamp('us', '')} as storage type
6: In warn_unregistered_extension_type(storage) :
  geography: Converting unknown extension google:sqlType:geography{string} as storage type
7: In warn_unregistered_extension_type(storage) :
  b: Converting unknown extension google:sqlType:geography{string} as storage type
8: In warn_unregistered_extension_type(storage) :
  geo: Converting unknown extension google:sqlType:geography{list<item: string>} as storage type

paleolimbot commented 1 month ago

Thank you for the reproducer!

After #632 this should be:

# Using pak::pak("apache/arrow-nanoarrow/r#632")
ipc_raw <- readr::read_rds("https://github.com/user-attachments/files/17082385/raws.zip")

options(nanoarrow.warn_unregistered_extension = FALSE)
ipc_raw |> 
  nanoarrow::read_nanoarrow() |> 
  tibble::as_tibble()
#> # A tibble: 1 × 15
#>   unicode datetime            logicaltrue logicalfalse     bytes date      
#>   <chr>   <dttm>              <lgl>       <lgl>           <blob> <date>    
#> 1 😃      2000-01-02 03:04:05 TRUE        FALSE        <raw 2 B> 2000-01-02
#> # ℹ 9 more variables: time <time>, timestamp <dttm>, geography <chr>,
#> #   s <df[,2]>, a <list<dbl>>, aos <list<df[,2]>>, soa <df[,2]>, bb <df[,2]>,
#> #   gg <df[,1]>

Created on 2024-09-20 with reprex v2.1.1
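
For package code such as bigrquery, setting the option globally may be more than is wanted. A minimal sketch of scoping it to a single call in base R follows; `with_extension_warnings_off()` is a hypothetical helper, not part of nanoarrow, and it assumes a nanoarrow build that includes #632 so that the option is actually honoured.

```r
# Hypothetical helper: evaluate `expr` with the option from #632 set to FALSE,
# then restore whatever value the option had before.
with_extension_warnings_off <- function(expr) {
  old <- options(nanoarrow.warn_unregistered_extension = FALSE)
  on.exit(options(old), add = TRUE)
  # `expr` is a lazy promise, so it is only evaluated here, after the option is set
  force(expr)
}

# Demonstrate the scoping without needing nanoarrow itself
before <- getOption("nanoarrow.warn_unregistered_extension")
inside <- with_extension_warnings_off(
  getOption("nanoarrow.warn_unregistered_extension")
)
after <- getOption("nanoarrow.warn_unregistered_extension")
```

In real use the argument would be the conversion itself, e.g. `with_extension_warnings_off(as.data.frame(nanoarrow::read_nanoarrow(ipc_raw)))`. `withr::with_options()` offers the same behaviour if withr is already a dependency.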

hadley commented 1 month ago

I also see warnings for google:sqlType:datetime{timestamp('us', '')}
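
On versions without the option from #632, one possible workaround is to muffle only these warnings with a calling handler. The sketch below is base R; `muffle_extension_warnings()` and `fake_convert()` are hypothetical names, and the matching relies on the message text shown in this thread ("Converting unknown extension"), which could change between releases. Because it matches on the message, it covers the datetime and geography warnings alike.

```r
# Hypothetical helper: muffle warnings whose message contains `pattern`;
# all other warnings are passed on untouched.
muffle_extension_warnings <- function(expr,
                                      pattern = "Converting unknown extension") {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for the nanoarrow conversion so the sketch runs without the package
fake_convert <- function() {
  warning("datetime: Converting unknown extension google:sqlType:datetime{timestamp('us', '')} as storage type")
  warning("something unrelated")
  "done"
}

# Record whatever warnings still get through the helper
seen <- character()
result <- withCallingHandlers(
  muffle_extension_warnings(fake_convert()),
  warning = function(w) {
    seen <<- c(seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

Unlike `suppressWarnings()`, this keeps unrelated warnings visible, which matters when the conversion is wrapped inside another package.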