International-Soil-Radiocarbon-Database / ISRaD

Repository for the development and release of ISRaD data and tools
https://international-soil-radiocarbon-database.github.io/ISRaD/

Error with ISRaD.extra.geospatial.Zheng #200

Closed alkalifly closed 4 years ago

alkalifly commented 4 years ago

I'm getting an error from ISRaD.extra.geospatial.Zheng. It is the same error whether I run ISRaD.extra() on its own or as part of the full ISRaD.build().

I've pasted the output from my terminal below, including the full backtrace. Is anyone else getting this error when building/running ISRaD.extra?


     filling 0.5 degree geospatial climate and soil data from Zheng Shi 
Error: `by` required, because the data sources have no common variables
Call `rlang::last_error()` to see a backtrace
> rlang::last_error()
<error>
message: `by` required, because the data sources have no common variables
class:   `rlang_error`
backtrace:
  1. ISRaD::ISRaD.build(...)
  2. ISRaD::ISRaD.extra(...)
  3. ISRaD::ISRaD.extra.geospatial.Zheng(database, geodata_soil_directory = geodata_soil_directory)
  5. dplyr:::left_join.data.frame(database$profile, USDA_0.5_key)
  8. dplyr:::left_join.tbl_df(...)
 10. dplyr:::common_by.NULL(by, x, y)
 11. dplyr:::bad_args("by", "required, because the data sources have no common variables")
 12. dplyr:::glubort(fmt_args(args), ..., .envir = .envir)
Call `rlang::last_trace()` to see the full backtrace
> rlang::last_trace()
     █
  1. └─ISRaD::ISRaD.build(...)
  2.   └─ISRaD::ISRaD.extra(...)
  3.     └─ISRaD::ISRaD.extra.geospatial.Zheng(database, geodata_soil_directory = geodata_soil_directory)
  4.       ├─dplyr::left_join(database$profile, USDA_0.5_key)
  5.       └─dplyr:::left_join.data.frame(database$profile, USDA_0.5_key)
  6.         ├─base::as.data.frame(...)
  7.         ├─dplyr::left_join(tbl_df(x), y, by = by, copy = copy, ...)
  8.         └─dplyr:::left_join.tbl_df(...)
  9.           ├─dplyr::common_by(by, x, y)
 10.           └─dplyr:::common_by.NULL(by, x, y)
 11.             └─dplyr:::bad_args("by", "required, because the data sources have no common variables")
 12.               └─dplyr:::glubort(fmt_args(args), ..., .envir = .envir)
> 
ShaneStoner commented 4 years ago

Hey Paul, can you tell me which folder you're using for your spatial data? I just talked to Marcus and it looks like we were linking to two different folders, one of which didn't have any of the "_Zheng.tif" files. I'm guessing this is the problem, since it should generate a "pro_soilorder" column to join on. I'm currently updating the shared folder so that it works for Marcus; maybe that will fix it for you too. Thanks! And sorry for the confusion.
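To illustrate the failure mode described above: when the two tables share no column names, a join with no explicit `by` has nothing to match on. The following is a minimal base R sketch, not the package's code. The helper `check_join_key` and the toy tables are hypothetical stand-ins for `database$profile` and `USDA_0.5_key`; only the column name `pro_soilorder` comes from this thread.

```r
# Hypothetical helper (not part of ISRaD): fail early, with a clear message,
# if the join column is absent from either table.
check_join_key <- function(x, y, key) {
  missing_in <- c(
    if (!key %in% names(x)) "x",
    if (!key %in% names(y)) "y"
  )
  if (length(missing_in) > 0) {
    stop(sprintf("join column '%s' is missing from: %s",
                 key, paste(missing_in, collapse = ", ")),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Toy stand-ins with made-up values. The profile table lacks "pro_soilorder",
# as it would if the step that creates that column found no input files.
profile  <- data.frame(pro_name = c("p1", "p2"))
soil_key <- data.frame(pro_soilorder = c(1, 2),
                       soil_order_label = c("A", "B"))

# No shared column names: the situation behind the "`by` required" error.
shared <- intersect(names(profile), names(soil_key))

# The guard turns that into a message naming the missing column.
result <- tryCatch(
  check_join_key(profile, soil_key, "pro_soilorder"),
  error = function(e) conditionMessage(e)
)
print(result)
```

Running a check like this before the join, and passing the join column explicitly through `by`, would likely make a missing-input problem easier to spot than the generic dplyr message.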

jb388 commented 4 years ago

Hi Paul, I'm not sure this will help, but I just tweaked the code on that function yesterday. The dir argument for that function was set to geospatial_soil_dir_Zheng (or something like that), and I changed it to the same as the others. Additionally, I think some of the geospatial_soil files used to be in the geospatial_soil_tax directory, so maybe that's part of the issue? For what it's worth, I rebuilt ISRaD yesterday without seeing this issue.
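Since both suggestions point at the directory contents, a quick way to check a local folder is to list the files whose names end in "_Zheng.tif". This is a minimal sketch under stated assumptions: `find_zheng_files` is a hypothetical helper, not an ISRaD function, and the demo directory and file names are made up for illustration.

```r
# Hypothetical helper (not part of ISRaD): return the paths of files in `dir`
# whose names end in "_Zheng.tif".
find_zheng_files <- function(dir) {
  if (!dir.exists(dir)) {
    stop("directory does not exist: ", dir, call. = FALSE)
  }
  list.files(dir, pattern = "_Zheng\\.tif$", full.names = TRUE)
}

# Demo on a throwaway directory with made-up file names.
demo_dir <- file.path(tempdir(), "soil_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
                                c("example_Zheng.tif", "other_layer.tif"))))

zheng_files <- find_zheng_files(demo_dir)
print(basename(zheng_files))

# An empty result would suggest the directory is not the one the function expects.
if (length(zheng_files) == 0) {
  message("No '_Zheng.tif' files found in ", demo_dir)
}
```

Pointing the helper at whatever path is passed as `geodata_soil_directory` should show whether the expected files are actually there.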

jb388 commented 4 years ago

@alkalifly Is this fixed now?

alkalifly commented 4 years ago

@ShaneStoner @jb388 Thanks for the suggestions. I was definitely using the latest commit as of the time I posted the issue yesterday. So I think it may have had to do with the directory structure that Shane mentioned. I'm going to try again, with the latest updates, but it looks like there are now a bunch more required files (bulk density and C content), so it will take a bit of time before they finish downloading and I can try again. I'll keep you updated.

Shane, I'm assuming I should have the directory structure just like it is on keeper, and set geodata_clim_directory to wc2-5, geodata_pet_directory to pet, and geodata_soil_directory to ISRaD_extra_soiltax?

ShaneStoner commented 4 years ago

Hey gang,

Yeah, I didn't realize that there was a parallel ISRaD keeper that didn't require a password, and this was the one that everyone (and the main build function) would be using. I rearranged the data and everything should be good now.

@Paul: yes, that structure is right. All of the files related to soil are in the "soiltax" folder now, which is about 70 GB... We can remove the old soil grids organic C layers, since they are inaccurate. The new versions of those files are included in the current function, which I renamed from "geospatial_clay" to just "geospatial_soil"; I also removed the old function.

Cheers,

Shane


alkalifly commented 4 years ago

@ShaneStoner Thanks for clearing that up. Does that mean that there are files that are currently in there that are not necessary? Will you be removing the superfluous ones and/or could you provide a list of which ones actually are necessary? Thanks!

alkalifly commented 4 years ago

Okay, I got all of the new data downloaded, and I think everything is in the right place, but now I'm getting a different error:

     filling 0.5 degree geospatial climate and soil data from Zheng Shi 
Error: Column names `pro_SG_clay_00cm`, `pro_SG_clay_0cm`, `pro_SG_clay_0cm` must not be duplicated.
Use .name_repair to specify repair.
Call `rlang::last_error()` to see a backtrace
> rlang::last_error()
<error>
message: Column names `pro_SG_clay_00cm`, `pro_SG_clay_0cm`, `pro_SG_clay_0cm` must not be duplicated.
Use .name_repair to specify repair.
class:   `rlang_error`
backtrace:
  1. ISRaD::ISRaD.extra(...)
  2. ISRaD::ISRaD.extra.geospatial.Zheng(database, geodata_soil_directory = geodata_soil_directory)
  4. dplyr:::left_join.data.frame(database$profile, USDA_0.5_key)
  7. dplyr::tbl_df(x)
  9. tibble:::as_tibble.data.frame(data, .name_repair = "check_unique")
 10. tibble:::as_tibble.list(unclass(x), ..., .rows = .rows, .name_repair = .name_repair)
 11. tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
 12. tibble:::set_repaired_names(x, .name_repair)
 17. tibble:::repaired_names(names(x), .name_repair = .name_repair)
 18. tibble:::check_unique(new_name)
Call `rlang::last_trace()` to see the full backtrace
> rlang::last_trace()
     █
  1. └─ISRaD::ISRaD.extra(...)
  2.   └─ISRaD::ISRaD.extra.geospatial.Zheng(database, geodata_soil_directory = geodata_soil_directory)
  3.     ├─dplyr::left_join(database$profile, USDA_0.5_key)
  4.     └─dplyr:::left_join.data.frame(database$profile, USDA_0.5_key)
  5.       ├─base::as.data.frame(...)
  6.       ├─dplyr::left_join(tbl_df(x), y, by = by, copy = copy, ...)
  7.       └─dplyr::tbl_df(x)
  8.         ├─tibble::as_tibble(data, .name_repair = "check_unique")
  9.         └─tibble:::as_tibble.data.frame(data, .name_repair = "check_unique")
 10.           └─tibble:::as_tibble.list(unclass(x), ..., .rows = .rows, .name_repair = .name_repair)
 11.             └─tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
 12.               └─tibble:::set_repaired_names(x, .name_repair)
 13.                 ├─rlang::set_names(x, repaired_names(names(x), .name_repair = .name_repair))
 14.                 │ └─rlang:::set_names_impl(x, x, nm, ...)
 15.                 │   └─rlang::is_function(nm)
 16.                 │     └─rlang::is_closure(x)
 17.                 └─tibble:::repaired_names(names(x), .name_repair = .name_repair)
 18.                   └─tibble:::check_unique(new_name)
> 

Any ideas or suggestions?

ShaneStoner commented 4 years ago

Hey Paul, this has occurred for me too. If you don't use a fresh version of the ISRaD data object (that is, if you've already run one of the functions that adds these columns on the "database" argument), then you get the duplication error. Let me know if that isn't it. I'll see if I can't write something to prevent it.
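The duplication can be reproduced and guarded against in a few lines of base R. This is only a sketch of one possible guard, not the fix that went into the package. The helper `drop_existing` and the toy values are hypothetical; the column name `pro_SG_clay_0cm` is taken from the error message above.

```r
# Toy profile table that has already been through a fill step once
# (made-up values), plus the columns a second run would add again.
profile  <- data.frame(pro_name = c("p1", "p2"),
                       pro_SG_clay_0cm = c(20, 35))
new_cols <- data.frame(pro_SG_clay_0cm = c(21, 36))

# cbind() on data frames keeps repeated names, which a later tibble
# conversion with name checking would reject.
combined  <- cbind(profile, new_cols)
dup_names <- unique(names(combined)[duplicated(names(combined))])
print(dup_names)

# Hypothetical guard: drop any columns that are about to be re-added.
drop_existing <- function(x, incoming) {
  x[, setdiff(names(x), incoming), drop = FALSE]
}

clean <- cbind(drop_existing(profile, names(new_cols)), new_cols)
print(names(clean))
```

With a guard along these lines, rerunning a fill step on an already-filled object would overwrite the earlier columns rather than duplicate them. Whether overwriting is the desired behavior is a design choice for the maintainers.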

jb388 commented 4 years ago

Sounds like this issue has been fixed? Pending further comment, I'll close this in the next day or two.

ShaneStoner commented 4 years ago

This issue arose from inconsistent geospatial data directories, which contained different files and file names. These have been consolidated, and for posterity I will include a link to the folder below. In the future, send an email request for that folder link. It's hosted by the MPI's digital library (Keeper).

https://keeper.mpdl.mpg.de/d/b0558de14c184ef08feb/