PecanProject / bety

Web-interface to the Biofuel Ecophysiological Traits and Yields Database (used by PEcAn and TERRA REF)
https://www.betydb.org
BSD 3-Clause "New" or "Revised" License

Working with BETYdb from PEcAn virtual machine #487

Closed samsrabin closed 7 years ago

samsrabin commented 7 years ago

I'm new to all this and had a devil of a time figuring out how to access BETYdb using the PEcAn virtual machine. @dlebauer helped me out with the info I needed. It turns out there's no one place where this is all explained, so I'll do that in a reply to this issue.

samsrabin commented 7 years ago

I have only done this on a Mac. Here were my steps:

  1. Setting up the virtual machine:
    1. Download PEcAn virtual machine (.ova file) from here.
    2. Download and install VirtualBox.
    3. Open VirtualBox. If you want, go to the preferences and change your default machine folder. The VM takes up about 14–15 GB, so make sure you select a drive with plenty of space.
    4. Go to File > Import Appliance, and select the .ova file you downloaded. Continue.
    5. Check "Reinitialize the MAC address of all network cards" and press Import.
  2. Using the virtual machine:
    1. In the VirtualBox Manager window, select the PEcAn VM and press Start.
    2. A new window will pop up and boot Linux, eventually prompting you for a username. Enter "carya" (no quotes) and then "illinois" (again no quotes) for the password. You're now at the Linux command line.
    3. Open up Safari and go to localhost:6480/rstudio. Log in with the same username and password as above.
    4. You're in RStudio! Now you can start accessing BETYdb from here, for example by using dplyr.
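
A minimal sketch of that last step, run from RStudio on the VM. The connection settings (database `bety`, user `bety`, password `bety`, host `localhost`) and the use of the DBI, RPostgres, and dbplyr packages are assumptions, not something stated in this thread, so check them against your own VM before relying on them.

```r
# Assumed connection settings for the BETYdb copy running inside the VM.
bety_settings <- list(
  dbname   = "bety",
  host     = "localhost",
  user     = "bety",
  password = "bety"
)

# Open a DBI connection using the settings above.
# Requires the DBI and RPostgres packages to be installed.
connect_bety <- function(settings = bety_settings) {
  DBI::dbConnect(
    RPostgres::Postgres(),
    dbname   = settings$dbname,
    host     = settings$host,
    user     = settings$user,
    password = settings$password
  )
}

# Example use (only works where the database is reachable):
# con <- connect_bety()
# traits_tbl <- dplyr::tbl(con, "traits")     # lazy reference, needs dbplyr
# dplyr::collect(head(traits_tbl, 10))        # pull the first 10 rows into R
# DBI::dbDisconnect(con)
```

Keeping the settings in one list makes it easy to point the same code at a different BETYdb instance later.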

dlebauer commented 7 years ago

@samsrabin thanks! Is the VM really 14-15 GB?

I'll note that the VM is a good place to start when using PEcAn. For just reading data, the R traits package can be used to connect to the live database (URL betydb.org) via the API; see this tutorial: https://github.com/terraref/tutorials/blob/master/traits/04-agronomic-metadata.Rmd

But I'll reopen until that is moved to the BETYdb data access documentation.

samsrabin commented 7 years ago

Yep, I have the .vmdk file at 14.42 GB. Great point about the traits package, it's also good to have that here.

samsrabin commented 7 years ago

Unfortunately it doesn't seem like the traits package can do what that tutorial suggests anymore. betydb_query() appears to have been removed. The closest it gets now is betydb_search(), which appears to basically just give the same output as searching via the web interface (without unchecked records).

dlebauer commented 7 years ago

@infotroph can you please check that https://github.com/terraref/tutorials/blob/master/traits/04-agronomic-metadata.Rmd is up to date?

infotroph commented 7 years ago

@samsrabin betydb_query still exists—in fact betydb_search is just an alias for betydb_query(search=query, table="search", ...)—but it's hard to discover that because betydb_query is documented on a different page from the rest of the bety functions. Can you confirm that ?betydb_query exists on your system and that the examples shown there run correctly for you?

Note that if you want unchecked records, you can pass include_unchecked=TRUE to either betydb_search or betydb_query, but the latter only makes sense for tables that contain a checked flag.
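
A hedged sketch of the two calls described above. The wrapper name `search_bety` and the example query string are illustrative; `betydb_search`, `betydb_query`, and `include_unchecked` are as named in this comment, and the sketch assumes a development version of traits that has them.

```r
# Illustrative wrapper: search BETYdb, optionally including unchecked records.
search_bety <- function(query, include_unchecked = FALSE) {
  if (!requireNamespace("traits", quietly = TRUE)) {
    stop("Install the traits package first.")
  }
  traits::betydb_search(query = query, include_unchecked = include_unchecked)
}

# Example calls (need network access and possibly an API key):
# res <- search_bety("Switchgrass Yield", include_unchecked = TRUE)
#
# Per the comment above, the same search expressed through betydb_query:
# res <- traits::betydb_query(search = "Switchgrass Yield", table = "search",
#                             include_unchecked = TRUE)
```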

@dlebauer The tutorial is up to date with respect to the traits package, but the data access issues in https://github.com/terraref/tutorials/pull/7#issuecomment-278464313 still apply.

samsrabin commented 7 years ago

@infotroph Unfortunately I'm not getting anything for ?betydb_query. I have traits 0.2.0 and R 3.3.3 installed, in case that matters.

dlebauer commented 7 years ago

@samsrabin thanks for reporting the version. I think the issue is that recent changes haven't been released to CRAN (and it's been almost a year since the 0.2.0 release). Could you try

library(devtools)
install_packages('ropensci/traits')

to get the most recent changes?

dlebauer commented 7 years ago

@sckott do you have a timetable for creating a new release of the traits package? I think @infotroph is done with his work and it would be great to have the version on CRAN using our new API.

But to be clear, the ease of the install_github function makes this a 'nice but not necessary to have'

samsrabin commented 7 years ago

When I do that (using install.packages because install_packages isn't found), I get the following:

Warning in install.packages :
  package ‘ropensci/traits’ is not available (for R version 3.3.3)

samsrabin commented 7 years ago

Ah, I found the code to get the dev version here:

if (!require("devtools")) install.packages("devtools")
devtools::install_github("ropensci/traits")
library("traits")

The version installed is 0.2.0.9410, which has betydb_query. Thanks!
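
A small sketch for confirming that the installed copy is recent enough. The helper name `has_betydb_query` is illustrative; the version threshold is the one reported above.

```r
# Returns TRUE only if the package is installed, is at least min_version,
# and actually contains a betydb_query object in its namespace.
has_betydb_query <- function(pkg = "traits", min_version = "0.2.0.9410") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= min_version &&
    exists("betydb_query", envir = asNamespace(pkg), inherits = FALSE)
}

# The CRAN release mentioned in this thread sorts below the dev version:
package_version("0.2.0") < "0.2.0.9410"
```

If `has_betydb_query()` returns FALSE, reinstalling with the install_github call above is the likely fix.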

samsrabin commented 7 years ago

So the examples in ?betydb_query all seem to work...

But it seems that there is an issue with the tutorial. Everything works great until the attempt to create the traits table, which gives me:

Error in UseMethod("select_") : 
  no applicable method for 'select_' applied to an object of class "NULL"

This actually shouldn't matter though, because the traits table isn't used for the rest of the tutorial.

What really matters is that the yields table is never created, giving an error when you attempt to create grass_yields.

There's some other weirdness, too. For example, the species has only one observation; managements has only five observations and the management_id and treatment_id variables are all 6e+09; planting has only two observations and the same problem with treatment_id; etc.

sckott commented 7 years ago

@dlebauer can push soon I think - i'll check over the pkg and the recent betydb work and get back to you.

sckott commented 7 years ago

@samsrabin and others:

is https://github.com/PecanProject/bety/issues/487#issuecomment-286785067 a problem with traits pkg?

dlebauer commented 7 years ago

@samsrabin thanks

Some of the weirdness is due to the fact that the examples are from a different instance of BETYdb than betydb.org. The instance at terraref.ncsa.illinois.edu/bety currently only contains data for Sorghum breeding trials. But it is the only example where we've joined tables that had been queried using the traits package.

So there should be one species (Sorghum bicolor), two planting dates for two field seasons (as opposed to betydb.org where we've collated data from hundreds of experiments). The ids are all in the 6e+09 range because each betydb server gets its own set of one billion unique identifiers ... that way the different databases can share data without overwriting each other. (see docs https://pecan.gitbooks.io/betydb-documentation/content/distributed_betydb.html). (also note that for the context of breeding trials we are using the sites table to define plots as well as sites, and the cultivars table to define genotypes).
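
To make the ID ranges concrete, here is a minimal sketch assuming blocks of exactly one billion IDs, as described above. The helper names and the sample IDs are made up for illustration. It also shows why the IDs looked identical: R's default number formatting abbreviates values this large, so distinct IDs can all display as 6e+09 until scientific notation is turned off.

```r
ids_per_server <- 1e9  # block size described above

# Index of the one-billion-wide block an ID falls in.
id_block <- function(id) {
  id %/% ids_per_server
}

# Show IDs in full rather than in scientific notation.
format_id <- function(id) {
  format(id, scientific = FALSE, trim = TRUE)
}

example_ids <- c(6000000004, 6000000017)  # made-up IDs in the 6e+09 range
id_block(example_ids)    # both fall in block 6
format_id(example_ids)   # the full, distinct IDs
```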

> What really matters is that the yields table is never created, giving an error when you attempt to create grass_yields.

That's a bug in the tutorial that I've reported here: https://github.com/terraref/tutorials/issues/12

Thanks!

dlebauer commented 7 years ago

@sckott the issue isn't with the traits package: @samsrabin was using version 0.2.0 from CRAN rather than the most recent changes from @infotroph on GitHub, so he ran into errors with our tutorials.

dlebauer commented 7 years ago

I've updated the documentation with instructions for using the VM: https://github.com/PecanProject/betydb-data-access/commit/0ddd0541bd660f09eceff7343a156604530bac7f

Thanks for your contributions @samsrabin