AnnieHow / dod

dod package development
https://anniehow.github.io/dod/

perhaps add to dod.ctd() #18

Closed dankelley closed 1 year ago

dankelley commented 2 years ago

@AnnieHow -- you might want to look into this...

I ran across https://www.ncei.noaa.gov/products/global-temperature-and-salinity-profile-programme and think it's worth checking into. If we can find a reasonable URL structure such that folks can look up data, then we ought to support this. But I've not clicked through enough links to see if they are set up that way. (Some websites don't have URLs ... the only way to get data is to go to a URL and then click a bunch of steps, but the URL displayed in the browser doesn't change, so we cannot access them using URL alone.)

PS. I found this link from https://meds-sdmm.dfo-mpo.gc.ca/isdm-gdsi/index-eng.html and at first I was thinking "let's use MEDS to find data", until I realized that some of the data types require you to fill out forms to request access, which is useless for accessing within a program.

AnnieHow commented 2 years ago

For MEDS,

GTSPP

AnnieHow commented 2 years ago

Started this in Branch Issue 18 on Z with JH. Changes made so far:

As an example, our server takes us to https://www.ncei.noaa.gov/data/oceans/gtspp/bestcopy/indian/2020/01/. Our problem is that we can't find documentation for the structure of the file name to create an ID and successfully pull a file.

dankelley commented 2 years ago

Whenever diligent efforts to find documentation on file structures reveal nothing, we ought to state so.

I think it will come up a lot, and we may as well use uniform phrasing, which means that this should go into a file like man-roxygen/filestructure_guessed.R.

Perhaps it could contain something along the lines of the following. (We will want to make sure the writing is clear, because this paragraph will likely appear in almost all of the functions in dod.)

#' Lacking documentation on the form of the URL to be used to download
#' files, this function is based on reverse engineering of some
#' queries that worked at the time of writing. Whether these guesses will
#' be suitable more generally is unknown. The authors' intention is to rely
#' on user feedback, to discover changes in the server URL pattern.

AnnieHow commented 2 years ago

GTSPP is now working in dod.ctd in the most up-to-date version of branch Issue18.

If you try this code, let me know if it works on your end and I will push it up to the main branch! The ID argument is the ocean basin abbreviation (at, in, or pa) followed by the four-digit year and the two-digit month.
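As a rough illustration of that ID convention, the sketch below splits an ID into its basin, year, and month parts and assembles a directory URL of the form quoted earlier in this thread. The function name `gtsppDirectoryURL` is hypothetical and is not part of dod. The "atlantic" and "indian" directory names come from URLs in this thread, while "pacific" is an assumption.

```r
# Hypothetical helper (not part of dod): translate an ID such as "at198501"
# into a GTSPP "bestcopy" directory URL.
gtsppDirectoryURL <- function(ID,
    server = "https://www.ncei.noaa.gov/data/oceans/gtspp/bestcopy")
{
    # Expect a two-letter basin code, then a 4-digit year and a 2-digit month
    if (!is.character(ID) || length(ID) != 1L ||
        !grepl("^(at|in|pa)[0-9]{6}$", ID)) {
        stop("ID must be 'at', 'in' or 'pa' followed by year and month, e.g. \"at198501\"")
    }
    # "atlantic" and "indian" appear in URLs quoted in this thread;
    # "pacific" is assumed by analogy
    basins <- c("at" = "atlantic", "in" = "indian", "pa" = "pacific")
    basin <- basins[[substr(ID, 1, 2)]]
    year <- substr(ID, 3, 6)
    month <- substr(ID, 7, 8)
    if (!(as.integer(month) %in% 1:12)) {
        stop("month must be between 01 and 12")
    }
    paste0(server, "/", basin, "/", year, "/", month, "/")
}

url <- gtsppDirectoryURL("in202001")
```

This only builds the directory URL. It does not guess the names of the files inside that directory, since no documentation of the file-name structure was found.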

dod.ctd("GTSPP", ID = "at198501")

AnnieHow commented 2 years ago

GTSPP pushed to main in commit 127b19bdbc3c11b8afe26db37e715b7109dd6180.

j-harbin commented 1 year ago

Reopening this issue as I believe some more work needs to be done on it. Major kudos to @AnnieHow for the great documentation that told me exactly what I needed for an ID.

I say some more work needs to be done because of the following reprex, which tells me the proper URL (i.e., https://www.ncei.noaa.gov/data/oceans/gtspp/bestcopy/atlantic/1985/02/) is not being downloaded. I've assigned myself to this issue.

library(dod)
t <- dod.ctd.gtspp(ID="at200404", debug=1)
#> server: https://www.ncei.noaa.gov/data/oceans/gtspp/bestcopy/meds_ascii/
#> destdir: .
#> downloading "https://www.ncei.noaa.gov/data/oceans/gtspp/bestcopy/meds_ascii/" to "./at200404.gz"

Created on 2022-12-05 by the reprex package (v2.0.1)

dankelley commented 1 year ago

We ought to have such things in the test suite, so we'll know they work even if the codebase shifts. It's a bit tricky making a test suite be acceptable to CRAN, but I know how and J and I could talk about that over Z sometime if J wants.

j-harbin commented 1 year ago

This has been done in commit eab290e232759eb3f24b288b841ef2a8c686e99c of main.

The cases are as follows:

index = TRUE and read = FALSE: returns a gz file

index = TRUE and read = TRUE: returns a data frame with a list of NetCDF files that can be downloaded

index = FALSE and read = FALSE: returns a NetCDF file of the specified index

index = FALSE and read = TRUE: returns a NetCDF file read in using ncdf4
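
The four cases can be summarized as a small lookup. This is only a sketch of the behaviour described above, not the code from the commit. The function name `gtsppReturnKind` and the description strings are hypothetical.

```r
# Hypothetical summary of the four index/read cases; not the dod implementation.
gtsppReturnKind <- function(index, read)
{
    stopifnot(is.logical(index), length(index) == 1L, !is.na(index))
    stopifnot(is.logical(read), length(read) == 1L, !is.na(read))
    if (index) {
        # The index is a gz file that lists the available NetCDF files
        if (read) "data frame listing downloadable NetCDF files" else "gz file"
    } else {
        # A single NetCDF file, either downloaded or downloaded and read
        if (read) "NetCDF file read in with ncdf4" else "NetCDF file"
    }
}

# Tabulate all four combinations
cases <- expand.grid(index = c(TRUE, FALSE), read = c(FALSE, TRUE))
cases$returns <- mapply(gtsppReturnKind, cases$index, cases$read)
print(cases)
```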