GlobalFishingWatch / gfwr

R package for accessing data from Global Fishing Watch APIs
https://globalfishingwatch.github.io/gfwr/
Apache License 2.0

Cannot pull data for certain EEZ code using gfwr::get_raster() #64

Closed julietcohen closed 2 years ago

julietcohen commented 2 years ago

I am using the Map Visualization API for a global assessment of apparent fishing effort for all regions (countries) in the GFW database. I’m using a nested loop to iterate through all region codes, extract their EEZ codes with get_region_id(), and then extract apparent fishing effort within those EEZs using get_raster(). My loop consistently fails on CHN’s first EEZ code, 8486. Passing this EEZ code to get_raster() returns the same error when run independently, outside of my nested loop. Here is the code for just CHN’s EEZ 8486, outside of the loop, and the error that’s returned:

chn_code_eez <- get_region_id(region_name = "CHN", region_source = 'eez', key = key)

chn_2012_2020_eez1 <- gfwr::get_raster(spatial_resolution = 'high', 
                                     temporal_resolution = 'yearly',
                                     group_by = 'flagAndGearType', 
                                     date_range = '2012-01-01,2020-12-31', 
                                     region = chn_code_eez$id[1], 
                                     region_source = 'eez',
                                     key = key)
[image: screenshot of the returned error message]

This code also errors with EEZ code 8486 (manually inputting the EEZ code rather than indexing), and with 08486 and 84860 (tacking on a 0 to either end of the listed EEZ code returned by get_region_id()).

Additionally, while my nested loop works fine for extracting apparent fishing effort for the vast majority of regions and EEZ codes, it would be helpful if get_raster() could process multiple EEZ codes for the same region at once (as a list), rather than requiring the user to index for regions with multiple EEZ codes. For example, my nested loop outputs a .csv for each region and EEZ code pair, resulting in 8 files for Australia, since Australia has 8 EEZ codes. It would make my analysis smoother to export just 1 .csv for Australia that contains fishing effort for all its EEZs within the time frame specified.

Overall, I'm very happy with the API! It's been helpful to avoid manually downloading data from the GFW webpage.

natemiller commented 2 years ago

@julietcohen I'm not sure why you were receiving that error. Can you try running your example again? Also, instead of running it for the entire time range from 2012 to 2020, can you try a date range of just one year? I was able to run your Chinese EEZ example for 2020 and return 1383603 rows in about 65 seconds. In principle, the larger date range should work, but it may be slow to return results.

chn_code_eez <- get_region_id(region_name = "CHN", region_source = 'eez', key = key)

gfwr::get_raster(spatial_resolution = 'high', 
                         temporal_resolution = 'yearly',
                         group_by = 'flagAndGearType', 
                         date_range = '2020-01-01,2020-12-31', 
                         region = chn_code_eez$id[1], 
                         region_source = 'eez',
                         key = key)

I don't believe the APIs currently support querying multiple regions at a time, but I can check and it is certainly a feature we can pass along to the engineering team for future releases.

I understand the challenge of having several .csv files that represent a single EEZ. I'm not sure of your exact implementation, but perhaps you could apply dplyr::bind_rows() to your list of data frames before writing the .csv. Maybe, maybe not.
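A minimal sketch of the bind_rows() idea, using mock data frames in place of real get_raster() output. The EEZ ids ("1001", "1002") and the column names (year, fishing_hours) are made up for illustration and are not what the API returns.

```r
library(dplyr)

# Mock per-EEZ results standing in for get_raster() output.
# The EEZ ids and column names are hypothetical.
effort_by_eez <- list(
  "1001" = data.frame(year = c(2019, 2020), fishing_hours = c(10.5, 12.0)),
  "1002" = data.frame(year = 2020, fishing_hours = 3.25)
)

# Stack the data frames; .id records which EEZ each row came from.
region_effort <- bind_rows(effort_by_eez, .id = "eez_id")

# Write one file per region instead of one per EEZ.
out_file <- file.path(tempdir(), "region_effort.csv")
write.csv(region_effort, out_file, row.names = FALSE)
```

Naming the list elements by EEZ id keeps the source of each row visible in the combined file.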

julietcohen commented 2 years ago

Hi Nate, thanks for looking into this. By subsetting the date range like you suggested, I was indeed able to retrieve the fishing effort data I'm looking for. That seems to imply that the API is overwhelmed by the amount of data pulled for this region over all years. This is certainly not a major issue for my analysis, since I can just manually pull the annual data for China and concatenate it outside of my nested loops. Since running this code for all years returns an error message that is not indicative of the actual source of the error, perhaps a more informative message would help others troubleshoot more efficiently moving forward.
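For anyone hitting the same problem, here is a rough sketch of pulling one year at a time and concatenating the results. The helpers yearly_ranges() and fetch_by_year() are hypothetical names, not part of gfwr, and fetch_one is a placeholder for whatever function makes the actual request. The tryCatch() wrapper only adds the failing date range to the error message; it does not change what the API returns.

```r
# Split a "start,end" date range into one "start,end" string per calendar year.
yearly_ranges <- function(date_range) {
  bounds <- as.Date(strsplit(date_range, ",")[[1]])
  years <- seq(as.integer(format(bounds[1], "%Y")),
               as.integer(format(bounds[2], "%Y")))
  starts <- as.Date(paste0(years, "-01-01"))
  ends <- as.Date(paste0(years, "-12-31"))
  starts[1] <- bounds[1]            # first chunk starts at the requested start
  ends[length(ends)] <- bounds[2]   # last chunk ends at the requested end
  paste(format(starts), format(ends), sep = ",")
}

# Call fetch_one() once per year and row-bind the results.
# fetch_one is any function that takes a date range string and returns a data frame.
fetch_by_year <- function(date_range, fetch_one) {
  chunks <- lapply(yearly_ranges(date_range), function(rng) {
    tryCatch(
      fetch_one(rng),
      error = function(e) {
        stop("Request failed for date range ", rng, ": ",
             conditionMessage(e), call. = FALSE)
      }
    )
  })
  do.call(rbind, chunks)
}

# Demo with a mock fetcher so the sketch runs without an API key.
mock_fetch <- function(rng) data.frame(date_range = rng, n = 1)
demo <- fetch_by_year("2012-01-01,2020-12-31", mock_fetch)

# With gfwr, fetch_one could wrap the call from this thread, e.g.:
# fetch_by_year("2012-01-01,2020-12-31", function(rng) {
#   gfwr::get_raster(spatial_resolution = "high", temporal_resolution = "yearly",
#                    group_by = "flagAndGearType", date_range = rng,
#                    region = chn_code_eez$id[1], region_source = "eez", key = key)
# })
```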

I will add onto this issue if I encounter the same error for other regions' EEZ codes.

Thanks for your suggestion for using dplyr::bind_rows(). I have already implemented a different solution but that would work great as well!