DataONEorg / rdataone

R package for reading and writing data at DataONE data repositories
http://doi.org/10.5063/F1M61H5X
How to resolve Object Name from identifier. #222

Closed LiamBurke24 closed 4 years ago

LiamBurke24 commented 6 years ago

Hi rdataone team!

Is there a way (or a function) in rdataone that can resolve the Object Name for a data object? I can't find any functions in your package that explicitly or implicitly pull the Object Name.

Here is an example:

id <- "doi:10.6073/pasta/63ad7159306bc031520f09b2faefcf87"
filepath <- "~/datafile/"
CNode <- "PROD"
lazyLoad <- FALSE
quiet <- FALSE

# Retrieve the member node ID (mnId)
cn <- dataone::CNode(CNode)
locations <- dataone::resolve(cn, pid = id)
mnId <- locations$data[1, "nodeIdentifier"]

# Begin the D1 download process
d1c <- dataone::D1Client("PROD", mnId)
pkg <- dataone::getDataPackage(d1c, id = id, lazyLoad = lazyLoad, quiet = quiet, limit = "1GB")
files <- datapack::getValue(pkg, name = "sysmeta@formatId")
n <- length(files) # number of files

# Make a new directory within filepath
newdir <- file.path(filepath, paste0("DataOne_", gsub("/", "-", id)))
dir.create(newdir)

# Get the filename
d1obj <- dataone::getDataObject(d1c, identifier = id)
objname <- dataone::getIdentifier(d1obj)

This just returns the id (the DOI) that I started with. I would like to be able to resolve the filename and extension that appear in the Object Name field. For an example, open https://search.dataone.org/#view/doi:10.6073/pasta/63ad7159306bc031520f09b2faefcf87 and scroll down to the Data Table.

I would greatly appreciate your help with this.

LiamBurke24 commented 6 years ago

Apologies. I accidentally closed the issue.

gothub commented 6 years ago

@LiamBurke24 The object name is stored in the package's metadata, which for this package is the file with 'fileType' "EML v2.1.0". To get the 'objectName' value for each package member, you would need to download the EML file and parse it.

Note that for some packages, a filename is specified for each package member. However, this is a newer, optional feature, so it is not widely available yet. If the filename is specified for an object, it is available from either the Solr index or the object's system metadata. To get it from the Solr index (if it is defined for a particular id):

library(dataone)

cn <- CNode("PROD")
id <- "urn:uuid:6667c1ab-ddc0-4e5c-86b6-a36693179d18"
# Query the Solr index for this id, returning only the id and fileName fields
qTerm <- sprintf("id:\"%s\"", id)
queryParams <- list(q=qTerm, fl="id,fileName")
result <- query(cn, queryParams, as="data.frame", parse=FALSE)
result

To get the fileName from the System Metadata (if defined for an id):

# Reuses cn and id from the Solr example
sysmeta <- getSystemMetadata(cn, id)
sysmeta@fileName

It looks like members of the package that you mentioned in this issue do not have the fileName specified, so that would leave the option of parsing the metadata file.
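As a rough illustration of that last option, here is a minimal sketch of pulling the 'objectName' values out of an EML document once you have its contents. It assumes the xml2 package is installed; the helper name `object_names` and the inline EML fragment (including its file names and URLs) are made up for the example and are not part of rdataone. It also assumes the usual EML layout in which each entity has a `physical` element containing `objectName` and, optionally, `distribution/online/url`.

```r
library(xml2)

# Hypothetical, heavily trimmed EML fragment standing in for the real metadata file.
eml_text <- '<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.0" packageId="example.1.1" system="example">
  <dataset>
    <title>Example dataset</title>
    <dataTable>
      <entityName>Example table</entityName>
      <physical>
        <objectName>example_table.csv</objectName>
        <distribution><online><url>https://example.org/data/1</url></online></distribution>
      </physical>
    </dataTable>
    <otherEntity>
      <entityName>Example photo</entityName>
      <physical>
        <objectName>site_photo.jpg</objectName>
      </physical>
    </otherEntity>
  </dataset>
</eml:eml>'

# Hypothetical helper: returns one row per <physical> element that has an objectName.
# `eml` may be literal XML, a file path, or a raw vector (anything xml2::read_xml accepts).
object_names <- function(eml) {
  doc <- xml2::read_xml(eml)
  physical <- xml2::xml_find_all(doc, "//physical[objectName]")
  data.frame(
    objectName = xml2::xml_text(xml2::xml_find_first(physical, "./objectName")),
    # The url is NA when the entity has no online distribution
    url = xml2::xml_text(xml2::xml_find_first(physical, "./distribution/online/url")),
    stringsAsFactors = FALSE
  )
}

res <- object_names(eml_text)
print(res)
```

For a real package, the EML contents would come from the repository instead of an inline string, for example by passing the raw bytes of the downloaded metadata object to `object_names()`. Matching each `objectName` back to a package member's identifier depends on how the repository fills in the `url` field, so treat that part as something to check against the actual metadata.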