shandiya closed this issue 2 months ago
The good news is that galah 2.0.0 has fixed this issue (yay).
However, there is a row limit set internally to make the slice_head() and arrange() functions work correctly in complex queries. This limit of 30 rows is (at the moment) opaque to the user.
What this means in this case is that running the first query without setting a higher limit using atlas_counts(limit = ) will return 120 rows. This is because each region in reg will be limited to only 30 rows (4 regions × 30 rows = 120), even though the full year range in the query is 50.
library(galah)
reg <- c("Gibson Desert",
         "Little Sandy Desert",
         "Southern Volcanic Plain",
         "Flinders Lofty Block")
# IBRA then year (with no limit)
ibra_year <- galah_call() |>
  galah_filter(cl1048 == reg,
               year >= 1971,
               year <= 2020) |>
  galah_group_by(cl1048, year) |>
  atlas_counts()
ibra_year
#> # A tibble: 120 × 3
#> cl1048 year count
#> <chr> <chr> <int>
#> 1 Southern Volcanic Plain 2020 319082
#> 2 Southern Volcanic Plain 2018 238959
#> 3 Southern Volcanic Plain 2019 232903
#> 4 Southern Volcanic Plain 2017 182618
#> 5 Southern Volcanic Plain 2015 180192
#> 6 Southern Volcanic Plain 2016 169479
#> 7 Southern Volcanic Plain 2014 120798
#> 8 Southern Volcanic Plain 2011 102496
#> 9 Southern Volcanic Plain 2013 86669
#> 10 Southern Volcanic Plain 2012 78345
#> # ℹ 110 more rows
# IBRA then year (with a high limit)
ibra_year <- galah_call() |>
  galah_filter(cl1048 == reg,
               year >= 1971,
               year <= 2020) |>
  galah_group_by(cl1048, year) |>
  atlas_counts(limit = 1000)
ibra_year
#> # A tibble: 199 × 3
#> cl1048 year count
#> <chr> <chr> <int>
#> 1 Southern Volcanic Plain 2020 319082
#> 2 Southern Volcanic Plain 2018 238959
#> 3 Southern Volcanic Plain 2019 232903
#> 4 Southern Volcanic Plain 2017 182618
#> 5 Southern Volcanic Plain 2015 180192
#> 6 Southern Volcanic Plain 2016 169479
#> 7 Southern Volcanic Plain 2014 120798
#> 8 Southern Volcanic Plain 2011 102496
#> 9 Southern Volcanic Plain 2013 86669
#> 10 Southern Volcanic Plain 2012 78345
#> # ℹ 189 more rows
Created on 2023-12-22 with reprex v2.0.2
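The 120-row result follows from simple arithmetic, which can be checked without calling the API. The following is a minimal base R sketch that simulates a per-region cap on a toy grid of region-year combinations; `cap_per_group()` is a hypothetical helper written for illustration and is not how galah implements its limit. Note that the real query returned 199 rows rather than 200 with the higher limit, presumably because one region-year combination has no records.

```r
# Illustration only: simulates a per-group row cap in base R.
# It does not call galah or the ALA API.
regions <- c("Gibson Desert", "Little Sandy Desert",
             "Southern Volcanic Plain", "Flinders Lofty Block")
years <- 1971:2020

# Every region-year combination the query could return (4 x 50 = 200 rows)
full <- expand.grid(year = years, cl1048 = regions, stringsAsFactors = FALSE)

# Hypothetical helper: keep at most `cap` rows per region
cap_per_group <- function(df, cap) {
  pieces <- lapply(split(df, df$cl1048), head, n = cap)
  do.call(rbind, pieces)
}

nrow(full)                        # 200
nrow(cap_per_group(full, 30))     # 120
nrow(cap_per_group(full, 10000))  # 200
```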
As a temporary fix to avoid this unexpected internal limit, the limit has been increased to 10,000 on the dev branch, and a message will now appear if you happen to hit that limit (which should be very rare).
A proper fix will involve figuring out how to avoid the need to set a limit internally for slice_head() and arrange() to work.
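Until a proper fix lands, users on older versions may want to check their own results for signs of truncation. One rough heuristic, sketched below in base R, is to flag any group whose row count has reached the cap. `groups_at_cap()` is a hypothetical helper and not a galah function, and the toy data frame stands in for a real atlas_counts() result.

```r
# Hypothetical helper (not part of galah): returns the groups whose row count
# has reached the cap, which suggests the results may have been truncated.
groups_at_cap <- function(df, group_col, cap = 30) {
  counts <- table(df[[group_col]])
  names(counts)[as.vector(counts) >= cap]
}

# Toy data: region "A" has 30 rows (at the cap), region "B" has 12
toy <- data.frame(
  cl1048 = c(rep("A", 30), rep("B", 12)),
  year = c(1971:2000, 1971:1982)
)

hit <- groups_at_cap(toy, "cl1048", cap = 30)
if (length(hit) > 0) {
  message("Possible truncation in: ", paste(hit, collapse = ", "))
}
```

A group can legitimately contain exactly as many rows as the cap, so a hit here is a prompt to re-run with a higher limit rather than proof of truncation.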
When galah_group_by() is used with more than one variable, different numbers of rows are returned if the order of variables is changed.
galah version 1.5.1
To Reproduce
Expected behaviour
The same number of rows should be returned irrespective of grouping order, with the only difference being the order of columns in the returned tibble.
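The expected behaviour can be expressed as a check on two result tables. The sketch below, in base R, compares two count tables while ignoring column order and row order; `same_counts()` is a hypothetical helper and the two small data frames are made-up stand-ins for the results of grouping by (cl1048, year) and by (year, cl1048).

```r
# Hypothetical helper (not part of galah): TRUE when two count tables contain
# the same rows, ignoring column order and row order.
same_counts <- function(a, b) {
  if (!setequal(names(a), names(b)) || nrow(a) != nrow(b)) return(FALSE)
  cols <- sort(names(a))
  # Sort rows by every column so that row order does not matter
  a <- a[do.call(order, unname(as.list(a[cols]))), cols, drop = FALSE]
  b <- b[do.call(order, unname(as.list(b[cols]))), cols, drop = FALSE]
  rownames(a) <- NULL
  rownames(b) <- NULL
  isTRUE(all.equal(a, b))
}

# Toy stand-ins for the two grouping orders
region_year <- data.frame(cl1048 = c("A", "A", "B"),
                          year = c("2019", "2020", "2020"),
                          count = c(5L, 7L, 3L))
year_region <- data.frame(year = c("2020", "2019", "2020"),
                          cl1048 = c("B", "A", "A"),
                          count = c(3L, 5L, 7L))

same_counts(region_year, year_region)         # TRUE
same_counts(region_year, year_region[1:2, ])  # FALSE: one row is missing
```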