CornellLabofOrnithology / auk

Working with eBird data in R
https://CornellLabofOrnithology.github.io/auk/
GNU General Public License v3.0

Problems with AWK #50

Closed kaeli-mueller closed 3 years ago

kaeli-mueller commented 3 years ago

Hello, I'm new to using eBird data in R and I am following this tutorial: https://cornelllabofornithology.github.io/ebird-best-practices/ebird.html#ebird-extract to learn how to make abundance maps. I am in chapter 2.2 of this tutorial and I am getting stuck on compiling all filters with auk_filter() to make the two large output files. I am getting the error: "auk_filter() requires a valid AWK install, unless execute = FALSE". I thought this may have been because I shouldn't have run the if(!file.exists()) part of the code, but even when I don't run that I get errors in the next section. I also looked at other similar problems people had with AWK and so I added "auk_set_awk_path("C:/cygwin64/bin/gawk.exe/bin/gawk.exe", overwrite=TRUE)" but this didn't help either. I really appreciate any suggestions people have for solving this error!

I attached an image of my code to help explain the problem better.
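The error quoted above generally means auk cannot find an AWK executable at the configured path. Here is a minimal base R sketch for checking which candidate paths actually point to a file before passing one to auk_set_awk_path(). The candidate list is an assumption (typical Cygwin install locations), not something confirmed in this thread.

```r
# Candidate AWK locations; these paths are assumptions for illustration.
awk_candidates <- c(
  "C:/cygwin64/bin/gawk.exe",
  "C:/cygwin/bin/gawk.exe",
  unname(Sys.which(c("gawk", "awk")))  # whatever is on the system PATH
)

# Keep only non-empty candidates that exist on disk.
awk_found <- awk_candidates[nzchar(awk_candidates) & file.exists(awk_candidates)]

if (length(awk_found) > 0) {
  message("AWK found at: ", awk_found[1])
  # With auk installed, this path could then be registered:
  # auk::auk_set_awk_path(awk_found[1], overwrite = TRUE)
} else {
  message("No AWK executable found; check the Cygwin install.")
}
```

A path that repeats a segment, such as one ending in gawk.exe/bin/gawk.exe, would not pass the file.exists() check, so this is a quick way to catch that kind of typo.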

kaeli-mueller commented 3 years ago

Hello again, I think that previous error worked itself out, but now I am in the same place with new errors. The errors I am getting now say "Error in auk_filter.auk_ebd(ebd_filters, file = f_ebd, file_sampling = f_sampling) : Output directory doesn't exist".

I don't know exactly what this is referring to. I thought it might have been related to this step:

# output files
data_dir <- "C:/Users/Kaeli/Documents/HANPP/data/species/eBird/eBird_abundance/data/"
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

But I checked in my file explorer and there is a new folder there named data! So I know it exists, but I'm not sure why my computer can't find it. I don't know what previous line of code could be causing the hold up. The steps to create the ebd and sampling files run, but I don't think they run correctly because the tutorial says that they should take a long time to run, and also they don't give any output to my R environment.

I have googled this problem and found examples of checks for the files, but those aren't of much use if my computer doesn't even think the files exist. Here is a picture of my R code now.


Any help is much appreciated!

mstrimas commented 3 years ago

I believe the problem is that C:/Users/Kaeli/Documents/HANPP/data/species/eBird/eBird_abundance/data/ exists, but you're putting the file in a subdirectory of that, which probably doesn't exist. Instead try

data_dir <- "C:/Users/Kaeli/Documents/HANPP/data/species/eBird/eBird_abundance/data/ebd_US_wesmea_relJan-2021"
dir.create(data_dir)
f_ebd <- file.path(data_dir, "ebd_US_wesmea_relJan-2021.txt")
f_sampling <- file.path(data_dir, "ebd_sampling_relJan-2021.txt")

Also, I notice you're using a pre-filtered EBD (US, Western Meadowlark), which means the filtering will be MUCH faster than dealing with the whole EBD. Just make sure your set of filters also specifies that only records from the US should be extracted. This will ensure that the sampling file matches the pre-filtered EBD.

Let me know if this works!
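The "Output directory doesn't exist" error relates to the folder that will hold the output files, so it can help to create that folder, including any missing parents, before filtering. This is a minimal base R sketch. It uses tempdir() as a stand-in for the real data folder (an assumption so the example runs anywhere) and f_sampling as the variable name, matching the auk_filter() call quoted in the error message.

```r
# Use a temporary directory as a stand-in for the real data folder.
base_dir <- file.path(tempdir(), "eBird_abundance", "data")
data_dir <- file.path(base_dir, "ebd_US_wesmea_relJan-2021")

# recursive = TRUE creates any missing parent directories as well;
# showWarnings = FALSE keeps reruns quiet when the folder already exists.
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

f_ebd <- file.path(data_dir, "ebd_US_wesmea_relJan-2021.txt")
f_sampling <- file.path(data_dir, "ebd_sampling_relJan-2021.txt")

# The folder that will hold each output file must exist before filtering.
stopifnot(dir.exists(dirname(f_ebd)), dir.exists(dirname(f_sampling)))
```

Checking dirname() of each output path is a quick way to confirm that the folder exists, rather than only its parent.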

kaeli-mueller commented 3 years ago

Thanks for the suggestion, I knew it had to be something with the filepath! So I made those changes and ran the code, and it did give me some errors, but some new files were also created. A new subfolder called "ebd_US_wesmea_relJan-2021" was created in my data folder. There are now two new files in it, but they have the same names as the initial large files (ebd_US_wesmea_relJan-2021.txt and ebd_sampling_relJan-2021.txt), so I don't know if these new files have the correct filters applied. I think the observation dates are in the range I put in the filters, so that is a good sign!

I guess I thought that this step would create two new R data frames called f_ebd and f_sampling. But for me it produced character strings called f_ebd and f_sampling, which are just the paths to the .txt files.

I'm doing the next few steps and this seems to be working but let me know if this is the right output for this step in the process!

mstrimas commented 3 years ago

This is correct. The filtering is done outside of R and it creates two text files. They have the same names as the input files because you gave them those names. To give them different names, you could do something like

f_ebd <- file.path(data_dir, "ebd_US_wesmea_relJan-2021_filtered.txt")
f_sampling <- file.path(data_dir, "ebd_sampling_relJan-2021_filtered.txt")

You can then read these files in with read_ebd() and read_sampling() to get a data frame.
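To illustrate the difference between a file path and the data it points to, here is a small base R sketch. The file contents and column names are made up for illustration, and read.delim() is only a stand-in here: with auk, the reading step would use read_ebd() and read_sampling(), which also do additional cleanup.

```r
# f_demo is a path (a character string), not the data itself.
f_demo <- file.path(tempdir(), "demo_filtered.txt")

# Write a tiny made-up tab-separated file as a stand-in for filtered output.
demo <- data.frame(
  common_name = c("Western Meadowlark", "Western Meadowlark"),
  observation_count = c(2L, 5L)
)
write.table(demo, f_demo, sep = "\t", row.names = FALSE, quote = FALSE)

class(f_demo)  # a character vector holding the file path

# Reading the file is what produces a data frame.
# With auk, the equivalent step would be read_ebd(f_ebd) or
# read_sampling(f_sampling).
demo_df <- read.delim(f_demo, stringsAsFactors = FALSE)
class(demo_df)
```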