Closed richardsc closed 2 years ago
We don't want to break existing code, and we don't want to get into guessing games. Therefore, we cannot code this by checking whether file
is a string naming a local file. Just as an example, the default value is file="core"
, and a user might have a file of that name that does anything, maybe something to do with the geophysics of the deep earth :-)
I think the best would be to add a new arg called "localFile" or something. If that is set, then we read that file, and ignore the args "file", "server" and "age arguments. This would be pretty easy to code, and document. The former means we aren't likely to break existing functionality, and the latter means we won't confuse users.
Any thoughts on this, @j-harbin? I can code it tomorrow morning, or you could take a stab (in a branch named for this issue).
Having turned this offer in my mind, I think the solution is to have a new logical parameter, local
. I always like to start by writing docs, and so I hope to get comments on the following.
1. Docs for new arg. I propose to add
#' @param local a logical value that indicates whether `file` is the name of a local file that
#' holds a local file, as opposed to an indication of a non-local file to be downloaded from a `server`.
2. Change of docs for filename
arg. I propose to change
#' @param filename character value that indicates the file name on the server, as in...
to something like
#' @param filename character value that indicates either the name of an index file. If `local`
#' is TRUE, then this is the name of a local file, as might have been downloaded previousl
#' from an argo server. Otherwise, `filename` indicates the file name on the server, as in...
I am going to code this today
Oh my. I've been working on this for a few hours and am near the end but now I see the problem with the idea. I discovered before that the tar.gz
file does not contain sufficient information to reliably get profiles. I don't have full notes in front of me, but here are the comments I see in the code for getIndex()
:
## The Construction of the remote filename is tricky was changed on 2020-04-30. Previously,
## we used "ftpRoot", inferred from the "# FTP" line in the header in the index file.
## However, as discussed at https://github.com/ArgoCanada/argoFloats/issues/82,
## this decision was faulty, in the sense that it sometimes worked and sometimes
## failed, for the *same* index object -- that is, the file structure on the server
## must be changeable. So, as a test (which will need to be reevaluated over time),
## we switched to using "server" instead of "ftpRoot". That means that we seek
## the NetCDF files from the same source as the index, and *not* from the source listed
## in the "# FTP" line in the header within that index.
So, adding this local
feature is not straightforward.
Anyway, it is trivial to use standard R methods to save the results of a call to getIndex()
. Just do something like
library(argoFloats)
ai <- getIndex()
save(ai, file="argoIndex_20220625.rda")
and later use
load("argoIndex_20220625.rda")
# now use `ai`...
instead of
ai <- getIndex()
# now use `ai` ...
I am pushing what I have to issue577
, but note that it is not working, and my changes will be abandoned.
@j-harbin you might want to read through this next week. I will close this issue at noon on Monday.
I think this seems sensible. When I go over to issue577
branch, I have a few suggestions.
library(argoFloats)
a <- getIndex(filename="/Users/jaimiekeeping/data/argo/D4901828_034.nc", local=TRUE)
#> Warning in readLines(destfileTemp, 100): line 1 appears to contain an embedded
#> nul
#> Warning in readLines(destfileTemp, 100): line 2 appears to contain an embedded
#> nul
#> Warning in readLines(destfileTemp, 100): line 3 appears to contain an embedded
#> nul
#> Warning in readLines(destfileTemp, 100): line 4 appears to contain an embedded
#> nul
#> Warning in readLines(destfileTemp, 100): line 5 appears to contain an embedded
#> nul
#> Error in strsplit(first[1 + lastHash], ",")[[1]]: subscript out of bounds
Created on 2022-06-27 by the reprex package (v2.0.1)
getIndex()
, where the local
argument is described, it says "a logical value, FALSE by default, indicating whether file is the name of a local file that was previously downloaded from a server.", it should say filename
and not file
local=TRUE
, I received the following error. I recommend mentioning something such as "or set local=TRUE if you are trying to read a local file".
library(argoFloats)
a <- getIndex(filename="/Users/jaimiekeeping/data/argo/D4901828_034.nc")
#> Error in getIndex(filename = "/Users/jaimiekeeping/data/argo/D4901828_034.nc"): filename="/Users/jaimiekeeping/data/argo/D4901828_034.nc" doesn't exist. Try one of these: "core", "bgc", "bgcargo", "traj", "bio-traj", "synthetic","ar_index_global_prof.txt.gz", "argo_bio-profile_index.txt.gz", "ar_index_global_traj.txt.gz", "argo_bio-traj_index.txt.gz", or "argo_synthetic-profile_index.txt.gz".
Created on 2022-06-27 by the reprex package (v2.0.1)
Created on 2022-06-27 by the reprex package (v2.0.1)
getIndex
would be an argos
type and not a index
type, if so I think the docs need to be updated for this (I could be misunderstanding this).I'm happy to Z about this if you think that's useful.
Is 0800h ok for a Z?
You bet!
Not going to be able to join your zoom (you folks are too early for me!), but at the very least Jaimie I see that you are trying to read a profile netcdf not the index CSV file that was proposed here.
(IIRC readProfiles()
should be able to read a bunch of already-downloaded netcdf files (which is the point), but there is also read.argo()
in oce.)
When I try branch issue577
, after downloading an index file, I get:
> library(argoFloats)
> i <- getIndex('~/Downloads/ar_index_global_prof.txt.gz', local=TRUE)
Error in save(argoFloatsIndex, file = destfileRda) :
object 'destfileRda' not found
Here's the upshot from the meeting that @j-harbin and I discussed ...
I think this is all moot, because, as I explained in https://github.com/ArgoCanada/argoFloats/issues/577#issuecomment-1166258368, users will need to keep a record of the server from which they got the index. That information is required for getProfiles()
to recover files.
Long story short, a tar.gz file from the server is not sufficient to later download profile files, simply because those .tar.gz files do not state the server. (Different servers hold different indices, and different profile files.
We looked into this in detail a year ago, and I think the comment to which I refer above has some pointers.
I propose to simply drop this planned change. It won't work for users. And the alternative is trivial: just do
i <- getIndex()
save(i, file="my_nice_index.rda")
and later on, just do
load("my_nice_index.rda")
and then proceed to use i
as though it had come from a getIndex()
call done today.
@j-harbin, please add comments if you wish to do so.
@richardsc, please consider closing this.
I think I understand now. Sorry for the confusion! (and wasted time). Too bad the indices aren't more consistently formatted 😞
See #576 . It seems reasonable that
getIndex()
should be able to read a locally downloaded index file (and create the correct .rda in theargoDefaultDestDir()
accordingly).An additional advantage to this is that it makes for a completely reproducible workflow, because the index won't update in the middle of an analysis.