Error: length(readLines(x, 2)) not greater than 1

lime-n commented 4 years ago

I have been using individual bird datasets downloaded directly from eBird instead of the full EBD file, because the full file is too large for my PC.

I am using the latest version of auk.

I have been following the e-bird-best-practices guide.

When I run ebd_zf <- auk_zerofill(f_ebd, f_sampling, collapse = TRUE)

I get the error: Error: length(readLines(x, 2)) not greater than 1

Here is my full script:

library(auk)

# 1. Point auk at the species file and the sampling event file
ebd <- auk_ebd("far-eastern_curlew-2000-2020.txt", file_sampling = "ebd_sampling_relJun-2020.txt")
# 2. Define the filters (change the species name for each dataset)
# `country`: character vector of country names (its definition is not shown here)
ebd_filters <- ebd %>%
  auk_species("Far Eastern Curlew") %>%
  auk_country(country) %>%
  auk_date(date = c("2000-01-01", "2020-06-01")) %>%
  auk_protocol(protocol = c("Stationary", "Traveling")) %>%
  auk_complete()
data_dir <- "data"
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}
# 3. Output file names (create a new file name for each sample)
f_ebd <- file.path(data_dir, "curlew.txt")
f_sampling <- file.path(data_dir, "ebd_sampling2.txt")
# 4. Run the filtering, only if the output does not exist yet
if (!file.exists(f_ebd)) {
  auk_filter(ebd_filters, file = f_ebd, file_sampling = f_sampling)
}
# 5. Zero-fill (change the variable name for each species)
ebd_zf <- auk_zerofill(f_ebd, f_sampling, collapse = TRUE)
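The error text has the form of a failed check that length(readLines(x, 2)) > 1, i.e. that the file holds more than one line (a header plus at least one record). Assuming that reading of the message is right, a quick base-R check of both inputs before calling auk_zerofill() can show which file is empty or header-only. has_data_rows() is a hypothetical helper, not part of auk.

```r
# Hypothetical helper (not part of auk): TRUE if `path` exists and has more
# than one line, i.e. a header plus at least one data row.
has_data_rows <- function(path) {
  file.exists(path) && length(readLines(path, n = 2L, warn = FALSE)) > 1L
}

# Usage with the file names from the script above:
# has_data_rows(f_ebd)
# has_data_rows(f_sampling)
```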

mstrimas commented 4 years ago

I notice you have ebd_sampling_relJun-2020.txt.gz, which suggests this file is compressed. Try uncompressing it and running this on the file ending in .txt. Let me know if that fixes it.
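For reference, the uncompressing can also be done from within R. R.utils::gunzip() does this directly; the sketch below is a base-R alternative (gunzip_file() is a hypothetical name) that streams the data in chunks, so a multi-gigabyte file never has to fit in memory at once.

```r
# Decompress a .gz file to plain text using only base R, copying in chunks
# so the whole file is never held in memory.
gunzip_file <- function(src, dest = sub("\\.gz$", "", src),
                        chunk_bytes = 16L * 1024L * 1024L) {
  stopifnot(file.exists(src), grepl("\\.gz$", src))
  con_in <- gzfile(src, open = "rb")   # gzfile() decompresses on read
  on.exit(close(con_in), add = TRUE)
  con_out <- file(dest, open = "wb")
  on.exit(close(con_out), add = TRUE)
  repeat {
    chunk <- readBin(con_in, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0L) break     # end of compressed stream
    writeBin(chunk, con_out)
  }
  invisible(dest)
}

# Usage: writes ebd_sampling_relJun-2020.txt next to the .gz file
# gunzip_file("ebd_sampling_relJun-2020.txt.gz")
```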

lime-n commented 4 years ago

I extracted the .txt file, and it grew from 4.8 GB to 14 GB. After running the code, the output only shrank to around 8.8 GB, which I didn't expect given the filters shown in the code.

Using the uncompressed .txt file, R returned this error instead:

Error: cannot allocate vector of size 250.0 Mb
In addition: Warning message:
In choose_reader(reader) :
  read.delim is slow for large EBD files, for better performance install the readr or data.table packages.

My memory.limit() is 8048 (MB).

I separately downloaded a Red Knot dataset from eBird for 2000-2020, along with the full sampling event data. Could you repeat these steps with my code to see where I went wrong?

FIXED: I added further filters to the code, such as restricting it to the countries I needed. This reduced the data to 512 MB.

lime-n commented 4 years ago

I fixed this by adding further filters to the code above. This reduced the sampling file to 532MB.

My only concern is that, for each bird dataset, all of the rows seem to be the same. Is that expected?

On a further note, I am using a separate .txt file for each species. Is there a faster way to filter the bird .txt files together with the sampling .txt file, without having to create a new sampling file each time?

Otherwise I have to repeat the whole process, and when I get to step 3 I need to change the sample_file.txt name for each species. If I don't, an error appears saying that the checklists aren't the same.

mstrimas commented 4 years ago

You only need to filter the sampling data once; then you can use the same file in all your auk_zerofill() calls. The important thing is to ensure you apply the same set of filters (apart from species) to both the sampling data and the observation data.
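A minimal sketch of that workflow, assuming an auk version that provides auk_sampling() and a working AWK install. All helper names are hypothetical, and the country list and Red Knot file name in the usage comments are placeholders. The shared filters live in one function so the sampling and species filters cannot drift apart.

```r
# Hypothetical helper: the non-species filters, applied identically to the
# sampling data and to every species file.
apply_shared_filters <- function(x, countries) {
  x <- auk::auk_country(x, countries)
  x <- auk::auk_date(x, date = c("2000-01-01", "2020-06-01"))
  x <- auk::auk_protocol(x, protocol = c("Stationary", "Traveling"))
  auk::auk_complete(x)
}

# Hypothetical helper: output path for a species, e.g. data/red_knot.txt
species_file <- function(species, data_dir = "data") {
  file.path(data_dir, paste0(gsub("[^a-z]+", "_", tolower(species)), ".txt"))
}

# Filter the full sampling file once; later calls reuse the existing output.
filter_sampling_once <- function(f_raw, f_out, countries) {
  if (!file.exists(f_out)) {
    dir.create(dirname(f_out), showWarnings = FALSE, recursive = TRUE)
    s <- apply_shared_filters(auk::auk_sampling(f_raw), countries)
    auk::auk_filter(s, file = f_out)
  }
  f_out
}

# Filter one species file with the same filters, then zero-fill it against
# the shared, already-filtered sampling file.
zerofill_species <- function(f_raw, species, f_sampling, countries,
                             data_dir = "data") {
  f_out <- species_file(species, data_dir)
  if (!file.exists(f_out)) {
    dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
    e <- auk::auk_species(auk::auk_ebd(f_raw), species)
    e <- apply_shared_filters(e, countries)
    auk::auk_filter(e, file = f_out)
  }
  auk::auk_zerofill(f_out, f_sampling, collapse = TRUE)
}

# Example usage (country list and Red Knot file name are hypothetical):
# countries <- c("Australia", "China", "Japan")
# f_sampling <- filter_sampling_once("ebd_sampling_relJun-2020.txt",
#                                    "data/ebd_sampling_filtered.txt", countries)
# curlew_zf <- zerofill_species("far-eastern_curlew-2000-2020.txt",
#                               "Far Eastern Curlew", f_sampling, countries)
# knot_zf <- zerofill_species("red_knot-2000-2020.txt", "Red Knot",
#                             f_sampling, countries)
```

Only the species file name and species name change between calls; the sampling file is filtered a single time and reused.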

lime-n commented 4 years ago

Thanks! That worked.

I have a question about BCRs.

None of my datasets have a BCR code, because all my data files are filtered to countries along the EAAF (East Asian-Australasian Flyway).

Is there an alternative way to implement the tiles on this basis?

mstrimas commented 4 years ago

BCRs only exist in North America. If you don't have them, you can either filter by a list of countries or use auk_bbox() to filter to a geographic region using latitude and longitude.
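A minimal sketch of the two options for an auk_ebd or auk_sampling object. filter_region() is a hypothetical helper name, and the coordinates in the usage comment are placeholders, not a definition of the EAAF. auk_bbox() takes c(lng_min, lat_min, lng_max, lat_max); this sketch simply rejects boxes that cross the 180th meridian rather than trying to handle them. Whichever option you use, apply it to both the sampling and the species data so they stay consistent.

```r
# Hypothetical helper: restrict an auk_ebd or auk_sampling object either to a
# set of countries or to a longitude/latitude bounding box (exactly one).
filter_region <- function(x, countries = NULL, bbox = NULL) {
  stopifnot(xor(is.null(countries), is.null(bbox)))
  if (!is.null(countries)) {
    return(auk::auk_country(x, countries))
  }
  # auk_bbox() expects c(lng_min, lat_min, lng_max, lat_max) in decimal degrees.
  # This check rejects boxes that wrap across the 180th meridian.
  stopifnot(length(bbox) == 4, bbox[1] < bbox[3], bbox[2] < bbox[4])
  auk::auk_bbox(x, bbox = bbox)
}

# Example usage (placeholder coordinates):
# ebd_region <- filter_region(auk::auk_ebd("far-eastern_curlew-2000-2020.txt"),
#                             bbox = c(100, -45, 160, 70))
```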

CornellLabofOrnithology / auk, issue #43