Closed serbinsh closed 4 years ago
On first inspection, this looks like a bug as the dataset appears to be joined correctly. Will investigate.
@jrmerz ok thanks for the update
Actually, I might take that back. Looking at two different leaf spectra, 2018-04-03_79 and 2018-03-21_65:
2018-04-03_79 can be accessed at https://ecosis.org/api/spectra/search/5090905b-176c-4d17-bf60-59a69939eea6?text=&filters=%5B%7B%22Latin%20Species%22%3A%22%2F%5Esativus%24%2F%22%7D%5D&start=86&stop=87 and appears to have leaf traits attached.
2018-03-21_65 can be accessed at https://ecosis.org/api/spectra/search/5090905b-176c-4d17-bf60-59a69939eea6?text=&filters=%5B%7B%22Latin%20Species%22%3A%22%2F%5Esativus%24%2F%22%7D%5D&start=0&stop=1 and does not have leaf traits. However, upon downloading leaf_traits.csv, 2018-03-21_65 does not exist in the traits spreadsheet, so there is nothing to join.
Please let me know if I am missing something here.
Oh boy ok let us look into this. Could be a data join issue, as you state.
@jrmerz After reviewing your comment, this is the expected behavior and matches some other datasets of ours. That is, we upload spec and traits and there isn't always 1-to-1 matching; sometimes a trait doesn't have a spec, or a spec doesn't have a trait to match with, post QA/QC or due to other issues. However, this generally doesn't cause us a problem; we just get empty cells when using the data, which is the correct behavior. The larger issue is that we have 2 different associated "trait" datasets to connect with the spec, and only 1 is coming through with the download or via the API. However, if you view the data on the website you can see the cases where they both link, so it's unclear why not all of the trait data is linked when downloading.
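To make the expected join behavior concrete, here is a minimal sketch in base R using made-up tables. The sample IDs and values are hypothetical; only the column names uniquefield, LMA, and Asat come from this dataset. It is meant to illustrate the behavior we expect from the linked download, not how EcoSIS performs the join internally.

```r
# Hypothetical toy tables: one spectra table and two separate trait tables,
# all keyed on the same uniquefield column.
spectra <- data.frame(
  uniquefield = c("A_01", "A_02", "A_03"),
  Wave_500 = c(0.051, 0.048, 0.050)
)
leaf_traits <- data.frame(
  uniquefield = c("A_01", "A_03"),
  LMA = c(62.1, 58.4)
)
gas_exchange <- data.frame(
  uniquefield = c("A_02", "A_03"),
  Asat = c(21.3, 18.7)
)

# Left-join both trait tables onto the spectra. all.x = TRUE keeps every
# spectrum and fills NA where a trait table has no matching uniquefield.
joined <- merge(spectra, leaf_traits, by = "uniquefield", all.x = TRUE)
joined <- merge(joined, gas_exchange, by = "uniquefield", all.x = TRUE)
print(joined)
```

In this sketch every spectrum is kept, A_03 carries values from both trait tables, and the other two rows have an NA in whichever trait column has no match. That is the "empty cells" outcome we consider correct; the problem reported here is that one of the two trait tables is dropped entirely.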
Can you provide me with an example uniquefield field value that has the issue so I can inspect?
Here are some examples of different combinations. 2018-03-26_65 has a leaf_spectra and leaf_gas_exchange trait. 2018-03-29_17 has a leaf_spectra, and both leaf_traits and leaf_gas_exchange.
@jrmerz any thoughts or updates on this? Let us know if we should re-structure or modify to make the data better match expectations. We are working on getting the data published, so we do expect to need a DOI and a finalized version at some point in the future. It's just not clear at the moment how we move forward.
...we will also have another similar dataset to upload soon so if there is anything we can learn from this to make that one go more smoothly, that would be good to know. thanks!
Sorry, this slipped my plate. What's the title of your dataset? It looks like you removed it.
Nm, found it, just top link was wrong
That's weird, because I just looked again and the ID comes up as: 5090905b-176c-4d17-bf60-59a69939eea6. Is that what worked for you as well? I think this matches the ID above? Oh, maybe it was public and we pulled it back to private, and the older public link is broken?
Let's get on the same page. Please provide the link to the dataset that you wish me to look at and that is described in the issue above.
@jrmerz It's the same ID I just noted
https://data.ecosis.org/dataset/hyperspectral-leaf-reflectance--biochemistry--and-physiology-of-droughted-and-watered-crops
https://data.ecosis.org/import/?id=5090905b-176c-4d17-bf60-59a69939eea6
ID: 5090905b-176c-4d17-bf60-59a69939eea6
Does that help to clarify?
Thanks @serbinsh it's a bug with EcoSIS. Has to do with the mapreduce query when a dataset is pushed. Give me a day or two to test and verify fix. I'll let you know when things are good to go.
On your end, once the fix is added to production, you will just need to re-publish the dataset.
Awesome, thank you @jrmerz! A few days is no problem. If you need longer let me know, as the manuscript is still under review, so at the moment this isn't super urgent.
@serbinsh I have pushed a fix to the dev server. Would you mind giving it a test with your dataset and making sure everything looks correct? Afterward you are free to delete the dataset from the dev server
https://dev-data.ecosis.org/ Test should show up here after push: https://dev-search.ecosis.org/
@jrmerz Working on testing this out. One issue: I can't seem to remember my password for the dev site, and the recovery email isn't coming through. No matter, but when I created another user I didn't see a way to add myself to an organization or create one. Thus I can't upload and test at the moment....
@serbinsh did you check your SPAM folder? That is where my password recovery emails always end up from ecosis. Other alternatives are 1) I manually add you to org or 2) I generate a temp password for you and send to you offline. Let me know which you prefer
@jrmerz Yeah, I have scoured all of my spam folders; I wasn't sure if my original serbinsh username was under serbinsh@gmail.com, serbin@wisc.edu, or sserbin@bnl.gov. Could you please generate a temp password for serbinsh (which is part of an org) and send it my way, if not too much trouble?
Thanks!
OK, if I share it with serbinsh@gmail.com, will that work?
OK, so far it looks like it's working:
https://dev-search.ecosis.org/package/09479e5a-22f8-4924-ad20-1562cc900459
For example if you scroll to observation "Spectra 586 of 2462" you will see all the trait data including gasex listed.
Next let me try pulling via API to see if all the data comes into R properly
This is promising
> message("Download complete!")
Download complete!
> names(dat_raw)[1:40]
[1] "ABA" "Amino_Acids" "Asat" "CO2s" "Ci" "Days_Into_Treatment"
[7] "Elemental_C" "Elemental_N" "Fructose" "Glucose" "H2Os" "HDP_Fructan"
[13] "Instrument" "LDP_Fructan" "LMA" "Location" "Measurement_Date" "Paired_Spectra"
[19] "Plant" "Plant_Age" "Plot" "Pre_or_Post_Treatment" "Proline" "Protein"
[25] "Qin" "RHs" "RWC" "Rep" "Species" "Starch"
[31] "Sucrose" "Tleaf" "Tr" "Treatment" "VPDleaf" "flow"
[37] "gs" "uniquefield" "350" "351"
Also confirmed that all 2462 obs are in the R object via API
@regnans Looks like it's working for me
here is how I tested the API
#---------------- Close all devices and delete all variables. -------------------------------------#
rm(list=ls(all=TRUE)) # clear workspace
graphics.off() # close any open graphics
closeAllConnections() # close any open connections to files
list.of.packages <- c("readr","httr","dplyr","ggplot2") # packages needed for script
# check for dependencies and install if needed
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
# load libraries needed for script
library(readr) # readr - read_csv function to pull data from EcoSIS
library(httr) # httr - GET used by source_GitHubData below
library(dplyr)
library(ggplot2)
# define function to grab PLSR model from GitHub
#devtools::source_gist("gist.github.com/christophergandrud/4466237")
source_GitHubData <-function(url, sep = ",", header = TRUE) {
require(httr)
request <- GET(url)
stop_for_status(request)
handle <- textConnection(content(request, as = 'text'))
on.exit(close(handle))
read.table(handle, sep = sep, header = header)
}
# not in
`%notin%` <- Negate(`%in%`)
#--------------------------------------------------------------------------------------------------#
#--------------------------------------------------------------------------------------------------#
### Set working directory (scratch space)
outdir <- tempdir()
setwd(outdir) # set working directory
getwd() # check wd
print(getwd())
#--------------------------------------------------------------------------------------------------#
#--------------------------------------------------------------------------------------------------#
### Grab data
print("**** Downloading Ecosis data ****")
ecosis_id <- "09479e5a-22f8-4924-ad20-1562cc900459" # dev-server test copy of the droughted and watered crops dataset
ecosis_file <- sprintf(
"https://dev-search.ecosis.org/api/package/%s/export?metadata=true",
ecosis_id
)
message("Downloading data...")
dat_raw <- read_csv(ecosis_file)
message("Download complete!")
names(dat_raw)[1:40]
head(dat_raw)
#--------------------------------------------------------------------------------------------------#
#--------------------------------------------------------------------------------------------------#
### Prepare data
Start.wave <- 500
End.wave <- 2400
wv <- seq(Start.wave,End.wave,1)
spectra <- data.frame(dat_raw[,names(dat_raw) %in% wv])
names(spectra) <- c(paste0("Wave_",wv))
head(spectra)[,1:5]
sample_info <- dat_raw[,names(dat_raw) %notin% seq(350,2500,1)]
head(sample_info)
names(sample_info)
ggplot(sample_info, aes(x=Asat)) + geom_histogram()
ggplot(sample_info, aes(x=Starch)) + geom_histogram()
ggplot(sample_info, aes(x=RWC)) + geom_histogram()
#--------------------------------------------------------------------------------------------------#
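A quick check along the following lines could confirm that both trait tables made it into the downloaded object. This is only a sketch: check_linked is a hypothetical helper written for illustration, not part of any EcoSIS client, and the toy data frame stands in for dat_raw so the example runs without network access. The column names LMA, Asat, and gs are taken from the names(dat_raw) output above.

```r
# Hypothetical helper: report which expected columns are missing, whether the
# row count matches, and how many non-NA values each present column holds.
check_linked <- function(dat, expected_cols, expected_rows) {
  present <- intersect(expected_cols, names(dat))
  list(
    missing_cols = setdiff(expected_cols, names(dat)),
    rows_ok = nrow(dat) == expected_rows,
    non_missing = colSums(!is.na(dat[, present, drop = FALSE]))
  )
}

# Toy stand-in for dat_raw (values are made up).
toy <- data.frame(
  uniquefield = c("A_01", "A_02", "A_03"),
  LMA = c(62.1, NA, 58.4),
  Asat = c(NA, 21.3, 18.7)
)

# "gs" is deliberately absent from the toy data to show a missing column.
res <- check_linked(toy, expected_cols = c("LMA", "Asat", "gs"), expected_rows = 3)
print(res)
```

With the real download, the call would be something like check_linked(dat_raw, c("LMA", "Asat", "gs"), 2462); an empty missing_cols and non-zero counts for the gas exchange columns would indicate that both trait files were linked.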
Shawn, this all looks great. I have deployed the fix to production. Please test and you or I can close issue if everything looks good on your end.
@jrmerz @regnans OK uploaded the dataset again to the main ecosis site and it seems to be parsing correctly: https://ecosis.org/package/5090905b-176c-4d17-bf60-59a69939eea6
Tested the API and we look to be good now, thanks!
Great! Closing issue
I may have raised this in an earlier thread, but I am having an issue with a new dataset on EcoSIS. We are merging both leaf and canopy spectra files together with a metadata file and two separate trait/gasex files. We are loading the spec files as data (green) and the main metadata file as well as the leaf trait and gasex files as metadata (blue). All files have the same unique ID to link all of the data together. In the viewer on the web page, when you flick through the spec you can find all of the data linked together. However, when I pull the data via the API in R or download the "linked" data, only the leaf trait data is connected and the gasex observations are ignored. How do we load the data to make sure all associated data/metadata are connected together?
The new record in question is: https://ecosis.org/package/5090905b-176c-4d17-bf60-59a69939eea6