Closed boshek closed 4 years ago
There might be some sort of threshold on startIndex, as this works:
river <- bcdc_query_geodata('freshwater-atlas-linear-boundaries') %>%
  tail(3)
river$query_list$startIndex <- 2000
collect(river)
One option is to modify tail like this:
tail.bcdc_promise <- function(x, n = 6L, ...) {
  number_of_records <- bcdc_number_wfs_records(x$query_list, x$cli)
  sorting_col <- pagination_sort_col(x$cols_df)
  x$query_list <- c(
    x$query_list,
    count = n,
    sortBy = sorting_col,
    startIndex = number_of_records - n
  )
  if (x$query_list$startIndex > 2000) {
    stop("tail not available for large records", call. = FALSE)
  }
  x
}
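To make the arithmetic in that proposal easier to check in isolation, here is a minimal sketch in base R. The helper name `tail_paging` and its `max_start_index` argument are hypothetical and not part of bcdata; the 2000 limit is only the threshold guessed at in this thread, not a documented server limit.

```r
# Hypothetical helper, not part of bcdata: work out the paging parameters
# a tail-style query would need, given an already known record total.
tail_paging <- function(number_of_records, n = 6L, max_start_index = 2000L) {
  # Never ask for more rows than the record holds, so the offset stays >= 0.
  n <- min(n, number_of_records)
  # Offset of the first requested row, same arithmetic as in the proposal above.
  start_index <- number_of_records - n
  # Assumed limit: the thread only observed that startIndex = 2000 still works.
  if (start_index > max_start_index) {
    stop("tail not available for large records", call. = FALSE)
  }
  list(count = n, startIndex = start_index)
}

# A 1500-row record: the last 3 rows start at offset 1497.
tail_paging(1500L, 3L)
```

Clamping `n` is an addition to the proposal: without it, asking for more rows than exist would produce a negative offset.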
Interesting. Does the change to bcdc_number_wfs_records have an impact on messages/print methods? That is, if it short-cuts to the count parameter, when it says "This data set has n records, showing only the first 6", does n change? I can't remember if it's used there...
I don't think so. Only head/tail modify that message, which is correct. So it still results in something like this:
R> bcdc_query_geodata('hydrometric-stations-active-and-discontinued') %>%
     head(3)
Querying 'hydrometric-stations-active-and-discontinued' record
* Using collect() on this object will return 3 features and 17 fields
* At most six rows of the record are printed here
--------------------------------------------------------------------------------
Simple feature collection with 3 features and 17 fields
geometry type: POINT
dimension: XY
bbox: xmin: 1021765 ymin: 1304767 xmax: 1054676 ymax: 1384676
projected CRS: NAD83 / BC Albers
# A tibble: 3 x 18
id HYDROMETRIC_STA~ STATION_NUMBER FEATURE_CODE STATION_NAME FLOW_TYPE WATERSHED_GROUP~ WATERSHED_ID STREAM_ORDER ARCHIVE_URL REALTIME_URL STATION_OPERATI~
<chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 WHSE~ 2082661 07EA001 CF29300000 FINLAY RIVE~ NATURAL TOOD 208 NA https://wa~ NA DISCONTINUED
2 WHSE~ 2082662 07EA002 CF29300000 KWADACHA RI~ NATURAL FOXR 53 NA https://wa~ NA DISCONTINUED
3 WHSE~ 2082663 07EA004 CF29300000 INGENIKA RI~ NATURAL INGR 69 NA https://wa~ https://wat~ ACTIVE-REALTIME
# ... with 6 more variables: CAPTURE_SCALE <chr>, START_DATE <date>, END_DATE <date>, OBJECTID <int>, SE_ANNO_CAD_DATA <chr>, geometry <POINT [m]>
R> bcdc_query_geodata('hydrometric-stations-active-and-discontinued')
Querying 'hydrometric-stations-active-and-discontinued' record
* Using collect() on this object will return 2306 features and 17 fields
* At most six rows of the record are printed here
--------------------------------------------------------------------------------
Simple feature collection with 6 features and 17 fields
geometry type: POINT
dimension: XY
bbox: xmin: 955923.4 ymin: 1055014 xmax: 1019837 ymax: 1159110
projected CRS: NAD83 / BC Albers
# A tibble: 6 x 18
id HYDROMETRIC_STA~ STATION_NUMBER FEATURE_CODE STATION_NAME FLOW_TYPE WATERSHED_GROUP~ WATERSHED_ID STREAM_ORDER ARCHIVE_URL REALTIME_URL STATION_OPERATI~
<chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 WHSE~ 2082784 08EC008 CF29300000 MORRISON RI~ NATURAL BABL 5 NA https://wa~ NA DISCONTINUED
2 WHSE~ 2082785 08EC009 CF29300000 FULTON RIVE~ NATURAL BABL 5 NA https://wa~ NA DISCONTINUED
3 WHSE~ 2082786 08EC010 CF29300000 BABINE LAKE~ NATURAL BABL 5 NA https://wa~ NA DISCONTINUED
4 WHSE~ 2082787 08EC011 CF29300000 BABINE LAKE~ NATURAL BABL 5 NA https://wa~ NA DISCONTINUED
5 WHSE~ 2082788 08EC012 CF29300000 BABINE LAKE~ NATURAL BABL 5 NA https://wa~ NA DISCONTINUED
6 WHSE~ 2082789 08EC013 CF29300000 BABINE RIVE~ NATURAL BABR 6 NA https://wa~ https://wat~ ACTIVE-REALTIME
# ... with 6 more variables: CAPTURE_SCALE <chr>, START_DATE <date>, END_DATE <date>, OBJECTID <int>, SE_ANNO_CAD_DATA <chr>, geometry <POINT [m]>
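For reference, the short-cut being discussed (the record count deferring to a count parameter when one is present) could be sketched roughly as below. This is not the actual bcdc_number_wfs_records implementation: `number_of_records` and the `fetch_total` callback are hypothetical stand-ins, with `fetch_total` representing whatever request asks the server for the full total.

```r
# Hypothetical stand-in for the record-count lookup, not the bcdata function.
# `fetch_total` is an assumed callback that would ask the server for the total.
number_of_records <- function(query_list, fetch_total) {
  # Use [[ ]] rather than $ so a differently named element is never
  # partially matched as "count".
  count <- query_list[["count"]]
  if (!is.null(count)) {
    return(count)  # head()/tail() already capped the result size
  }
  fetch_total(query_list)
}

# With a count set (as after head(3)), the capped value is reported.
number_of_records(list(count = 3L), function(q) 2306L)
# Without one, the lookup falls through to the full total.
number_of_records(list(), function(q) 2306L)
```

This matches the two printouts above: 3 features after head(3), and 2306 features for the unmodified query.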
Ok, great. There is a failing test unrelated to this PR - should we fix it here? (i.e., do you mind doing it? 😜 ). Looks like a column we were selecting before no longer exists in the data... (https://github.com/bcgov/bcdata/pull/212/checks?check_run_id=766044034#step:11:143)
I think I didn't quite get what you did here before, but I think you nailed it. The tail issue is strange, so I guess your solution works in the interim?
Regarding the tail issue, I ran the code you posted and it worked for me. So it may be somewhat flaky, but is it possible to leave it as is?
> dh <- bcdc_query_geodata('2af1388e-d5f7-46dc-a6e2-f85415ddbd1c') %>%
    tail(3) %>%
    collect()
Authorizing with your stored API key
> dh
Simple feature collection with 3 features and 17 fields
geometry type: LINESTRING
dimension: XY
bbox: xmin: 1491824 ymin: 518350.2 xmax: 1521546 ymax: 553147.1
CRS: 3005
# A tibble: 3 x 18
id LINEAR_FEATURE_… WATERSHED_GROUP… EDGE_TYPE WATERBODY_KEY BLUE_LINE_KEY WATERSHED_KEY
* <chr> <int> <int> <int> <int> <int> <int>
1 WHSE… 832660866 78 1700 329216616 356564053 356564053
2 WHSE… 832660830 78 1700 329217730 356445345 356445345
3 WHSE… 832659811 78 1700 328941657 356566942 356566942
# … with 11 more variables: FWA_WATERSHED_CODE <chr>, LOCAL_WATERSHED_CODE <chr>,
# WATERSHED_GROUP_CODE <chr>, DOWNSTREAM_ROUTE_MEASURE <chr>, LENGTH_METRE <dbl>,
# FEATURE_SOURCE <chr>, FEATURE_CODE <chr>, OBJECTID <int>, SE_ANNO_CAD_DATA <chr>,
# FEATURE_LENGTH_M <dbl>, geometry <LINESTRING [m]>
This PR fixes a bug that caused paginated requests to fail with head, by telling the bcdc_number_wfs_records function to look out for a count parameter and use that if it exists. For example this now works:
However, I am unable to make tail work. This does not currently work:
As far as I can see, the only difference is that a tail "query" includes a startIndex query parameter. I haven't yet been able to figure this out.
tail does work for a smaller query like this: