SATVILab / projr

Streamline R projects
https://satvilab.github.io/projr/

Consider best strategies for handling versioning of "dumb" remotes #515

Open MiguelRodo opened 5 months ago

MiguelRodo commented 5 months ago

So, let's say that we have a local remote (just a folder), and we want to determine whether we should copy anything there.

So, we have in our local manifest a record of what was done in the latest version (which we know is correct).

Then, suppose that, for that remote, we had just copied those contents across to it.

Okay, so, what IS the remote? Is it what's at the path? Yes, right? That's basically the ID. Or is the path the sub-part of the remote? Or does it include the label? Maybe let's look at what the old _proj.yml had, before removing the local archiving.

Okay, this is what that looked like:

build:
  local:
    "archive":
      path: _archive
      content: [data-raw, output, docs]
      structure: version

Okay, yes, so the path was _archive. Then, there was a label, and then there was the version.
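As a rough sketch of how those three pieces could combine into a directory for a local remote: the helper name `remote_local_path` and the nesting order (path, then label, then version) are assumptions for illustration, not something confirmed by the old config.

```r
# Hypothetical helper (not part of projr): build the directory of a
# local remote from the configured path, a content label and, for
# the "version" structure, a version string.
remote_local_path <- function(path,
                              label,
                              structure = c("latest", "version"),
                              version = NULL) {
  structure <- match.arg(structure)
  if (structure == "latest") {
    # assumption: "latest" keeps a single copy directly under the label
    return(file.path(path, label))
  }
  if (is.null(version)) {
    stop("version must be supplied when structure is 'version'")
  }
  # assumption: versioned remotes nest one directory per version
  file.path(path, label, version)
}

remote_local_path("_archive", "output", "version", "v0.1.0")
```

If the real layout nests version before label, only the last `file.path` call would need to change.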

So, where should the manifest be kept? Well, clearly, it should be kept at the top directory. Then, we're basically assuming that each directory only has the data-raw thing kept once (maybe an issue for OSF, where we might put archive in there as well?). Hmmm, I don't know... I guess we should specify what the path is to it as well. So, for each, we should keep the manifest at the top level, but then specify within it:

Okay, so if there is no manifest.csv file there or it does not have that, then:

I think, at this stage, we don't keep a record of what was previously at that remote. It's just not relevant. We only keep the active folders, so for latest that's just whatever's at the path. For versioned, it's whatever's been uploaded. So, it's not a record but a live value.
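A minimal sketch of what reading that "live value" could look like for a local remote, using only base R. The function name `remote_local_ls` is made up, and treating each sub-directory as one uploaded version is an assumption.

```r
# Hypothetical helper (not part of projr): report what is currently
# at a local remote, rather than what a record says was once there.
remote_local_ls <- function(path, structure = c("latest", "version")) {
  structure <- match.arg(structure)
  if (!dir.exists(path)) {
    # nothing uploaded yet
    return(character())
  }
  switch(structure,
    # latest: whatever files are at the path right now
    latest = list.files(path, recursive = TRUE),
    # version: assumes one sub-directory per uploaded version
    version = list.dirs(path, full.names = FALSE, recursive = FALSE)
  )
}
```

Because this is read fresh each time, there is nothing to keep in sync: deleting a version directory by hand just makes it disappear from the result.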

Then, this impacts how we compare what's there. Once we have that, the upload proceeds as normal.

So, what needs to happen?

MiguelRodo commented 5 months ago

So, the plan (exactly what to send) is only determined for each label within a title - thank goodness! That means we already do the planning as often as we need. We've also got all the info we need to get the remote (and not just remote_final). This is very good news.

So, this is what happens in projr_dest_send_label:

So, I need to add:

Now, within .projr_dest_send_get_plan_detail, we have, for example, this function:

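.projr_dest_send_get_plan_detail_add_missing <- function(path_dir_local,
                                                         remote,
                                                         type) {
  # note: this temporary directory is created and removed without being used
  path_dir_local_remote <- .dir_create_tmp_random()
  # files currently on the remote and in the local directory
  fn_vec_remote <- .projr_remote_file_ls(type, remote)
  fn_vec_local <- .file_ls(path_dir_local)
  # only add files that the remote is missing; never remove anything
  fn_vec_add <- setdiff(fn_vec_local, fn_vec_remote)
  .dir_rm(path_dir_local_remote)
  list("add" = fn_vec_add, "rm" = character())
}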

Let's rather look at where we do use version_source:

.projr_dest_send_get_plan_detail_change <- function(remote,
                                                    type,
                                                    label,
                                                    version_source,
                                                    path_dir_local) {
  change_list <- .projr_change_get(
    label = label,
    path_dir_local = path_dir_local,
    version_source = version_source,
    type = type,
    remote = remote
  )
  list(
    "add" = c(
      change_list[["kept_changed"]][["fn"]] %@@% character(),
      change_list[["added"]][["fn"]] %@@% character()
    ) |>
      as.character(),
    "rm" = change_list[["removed"]][["fn"]] %@@% character() |> as.character()
  )
}

At the moment, we're only passing label to it, because it's just comparing the latest two versions.

Here are the contents:


.projr_change_get_manifest <- function(version_post = NULL,
                                       version_pre = NULL,
                                       label = NULL) {
  # this differs from .projr_change_get_hash
  # as it will filter on version and does
  # not assume there is only one label
  # get manifests from previous version and current version
  manifest <- .projr_manifest_read(.dir_proj_get("manifest.csv"))

  if (nrow(manifest) == 0L) {
    return(.projr_zero_list_manifest_get())
  }

  # get version to compare
  version_vec <- .projr_change_get_manifest_version_to_compare(
    version_post = version_post,
    version_pre = version_pre,
    manifest = manifest
  )

  # choose current label only,
  # done after comparing to ensure we get the right comparison
  if (!is.null(label)) {
    manifest <- manifest[manifest[["label"]] == label, ]
  }

  # use zero table if version_pre not found
  manifest_pre <- manifest[manifest[["version"]] == version_vec[["pre"]], ] %@@%
    .projr_zero_tbl_get_manifest()

  manifest_post <- manifest[manifest[["version"]] == version_vec[["post"]], ]

  # compare
  # -----------------

  # can't assume there's only one label
  .projr_change_get_hash(hash_pre = manifest_pre, hash_post = manifest_post)
}

From the version_vec, we'd only need the latest version locally. Then, we'd need to get that manifest off the remote (maybe we should download that earlier?) and just get the latest version on there.

And then we compare, as before.

Simple!
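
A rough sketch of that comparison in base R, not the projr implementation. It assumes both manifests are data frames with `label`, `fn`, `version` and `hash` columns, that versions look like `v0.0.1`, and that `fn` is unique within a label and version. The names `manifest_latest` and `manifest_compare` are made up for illustration.

```r
# Hypothetical helper: rows of a manifest for one label at its
# highest version.
manifest_latest <- function(manifest, label) {
  manifest <- manifest[manifest[["label"]] == label, , drop = FALSE]
  if (nrow(manifest) == 0L) {
    return(manifest)
  }
  # assumes versions look like "v0.0.1"; drop the "v" so they can be ordered
  version_vec <- package_version(sub("^v", "", manifest[["version"]]))
  manifest[version_vec == max(version_vec), , drop = FALSE]
}

# Hypothetical helper: compare the latest local version against the
# latest version recorded in the manifest fetched from the remote.
manifest_compare <- function(manifest_local, manifest_remote, label) {
  local <- manifest_latest(manifest_local, label)
  remote <- manifest_latest(manifest_remote, label)
  fn_local <- local[["fn"]]
  fn_remote <- remote[["fn"]]
  # files on both sides whose hashes differ need to be re-sent
  fn_both <- intersect(fn_local, fn_remote)
  hash_local <- setNames(local[["hash"]], fn_local)
  hash_remote <- setNames(remote[["hash"]], fn_remote)
  fn_changed <- fn_both[hash_local[fn_both] != hash_remote[fn_both]]
  list(
    "add" = c(fn_changed, setdiff(fn_local, fn_remote)),
    "rm" = setdiff(fn_remote, fn_local)
  )
}
```

The result has the same `add`/`rm` shape as the plan functions above, so the rest of the upload would not need to know which comparison produced it.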