GlobalFishingWatch / gfwr

R package for accessing data from Global Fishing Watch APIs
https://globalfishingwatch.github.io/gfwr/
Apache License 2.0

Maximum 10,000 (Random?) Vessels #86

Closed adeloera closed 1 year ago

adeloera commented 1 year ago

I have been trying to construct fishing trips for all the vessels in the data (as in Selig et al. 2022), but I can't figure out how to write a loop that extracts all the vessel IDs, because the "get_vessel_info" command seems to return a maximum of 10000 observations. The 10000 observations it draws also do not seem to be consistent over time. That makes it very hard to write a loop that can cover, for example, all the vessels in the Chinese fleet, since there seem to be more than 10000 vessel IDs for many geartypes. Is there anything I can do? Or could the command either let me extract more than 10000 at once, or let me specify that I want a consistent "first" 10000, then "second" 10000, etc.?

natemiller commented 1 year ago

Hi @adeloera. Can you provide some more context on how you are using the get_vessel_info function so I can better explore the issue? I'm not certain, but it sounds as though you might be using the get_event function to identify a set of vessels and then submitting them to the get_vessel_info function to get vessel identity information. Is this correct? Are you submitting a string of 10000 vessel_ids along the lines of the following example?

get_vessel_info(query = 
                  "8c7304226-6c71-edbe-0b63-c246734b3c01,6583c51e3-3626-5638-866a-f47c3bc7ef7c,71e7da672-2451-17da-b239-857831602eca", 
                search_type = 'id', 
                key = key)

If you can provide a little more context or your workflow, I can consider options.
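
If the workflow does involve a long vector of vessel IDs, a minimal sketch like the one below could split them into batches and build the comma-separated string that `search_type = 'id'` takes in the example above. `batch_vessel_ids`, the placeholder IDs, and the batch size are hypothetical choices for illustration and are not part of gfwr. The `get_vessel_info` call is left commented out because it needs a valid API key.

```r
# Hypothetical helper (not part of gfwr): split a vector of vessel IDs into
# batches and collapse each batch into one comma-separated query string.
batch_vessel_ids <- function(ids, batch_size = 100) {
  ids <- unique(ids)                       # drop repeated IDs
  if (length(ids) == 0) return(character(0))
  # Assign each ID to a batch number: 1, 1, ..., 2, 2, ...
  groups <- ceiling(seq_along(ids) / batch_size)
  # One comma-separated string per batch
  unname(vapply(split(ids, groups), paste, character(1), collapse = ","))
}

ids <- c("id-a", "id-b", "id-c", "id-d", "id-e")  # placeholder IDs
queries <- batch_vessel_ids(ids, batch_size = 2)

# Each element of `queries` could then be passed on, for example:
# results <- lapply(queries, function(q)
#   get_vessel_info(query = q, search_type = "id", key = key))
```

Whether the API accepts a given number of IDs per request is not confirmed in this thread, so the batch size would need to be tuned by trial.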

adeloera commented 1 year ago

Hi Nate!

Thanks for the response! Basically, I am using the get_vessel_info function to get vessel IDs, which I subsequently feed to the get_event function to extract the events for those vessels (is the opposite workflow possible, i.e. extracting all fishing events without giving vessel IDs?).

The code that gives me a problem is the following:

vessels <- get_vessel_info(query = "flag = 'CHN' AND geartype = 'fishing'",
                           search_type = "advanced",
                           dataset = "fishing_vessel")

This code only extracts 10000 vessels, even though there are (I believe) more than 10000 vessels in the data with flag = CHN and geartype = fishing.
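
One possible way to cope with a fixed cap like this, sketched below, is to split the search into narrower sub-queries and flag any result that comes back with at least the capped number of rows, since those are probably truncated. The helper names, the second flag value, and the default cap of 10000 are assumptions for illustration. Only the query syntax is taken from the call above, and a sub-query can still exceed the cap, in which case this check only detects the problem rather than solving it.

```r
# Hypothetical helpers (not part of gfwr).

# Build one advanced-search query string per flag/geartype combination,
# following the syntax of the query shown above.
build_queries <- function(flags, geartypes) {
  combos <- expand.grid(flag = flags, geartype = geartypes,
                        stringsAsFactors = FALSE)
  sprintf("flag = '%s' AND geartype = '%s'", combos$flag, combos$geartype)
}

# A result with at least `cap` rows has probably been cut off at the limit.
hit_cap <- function(result, cap = 10000) {
  nrow(result) >= cap
}

queries <- build_queries(flags = c("CHN", "JPN"), geartypes = "fishing")

# Stand-in data frames so the check can be shown without an API key
full_page  <- data.frame(id = seq_len(10000))
small_page <- data.frame(id = seq_len(35))

# With a valid key, each query could be run and checked, for example:
# res <- get_vessel_info(query = queries[1], search_type = "advanced",
#                        dataset = "fishing_vessel", key = key)
# if (hit_cap(res)) warning("Result may be truncated: ", queries[1])
```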

Happy to send all my code over if that would help!

Best, Andres

natemiller commented 1 year ago

@adeloera Thanks for the details. We have done a bit of checking, and this appears to be an issue with how Elasticsearch and pagination are working (or not) within the APIs. The GFW engineers have opened a ticket and are exploring the best solutions. I'm sorry that the current limit is 10,000 (set by Elasticsearch), but once we have implemented a solution I'll update this issue.

adeloera commented 1 year ago

Sounds great, thanks!

adeloera commented 1 year ago

Trying to extract fishing events without any vessel id attached is currently giving me this error: image

A bit odd since previously it had begun to download (just stalled out since there are so many events...)

natemiller commented 1 year ago

I'd be surprised if that worked without any filters (though I haven't tried). That is a large dataset to pull in one go.

adeloera commented 1 year ago

I think I may have some other more general problem since even the sample code is not working for me right now despite working fine recently. Possibly related to trying to redownload gfwr to make sure I had the newest version?

image

I imagine this is a problem on my end then but figured I'd flag it just in case it's familiar.

natemiller commented 1 year ago

That code works for me and returns 35 rows. I would try reinstalling the gfwr package. You could try this:

devtools::install_github("GlobalFishingWatch/gfwr@1b0ae74")

adeloera commented 1 year ago

Thanks! Unfortunately, I just did that and am still getting the same error. Are there maybe other libraries I need to load as well? I'm confused, since this is a new error on all my "get_event" calls; the same code used to run fine.