Setting the CRAN mirror via ~/.Rprofile does not seem to work

Anirban166 commented 7 months ago

Tried echo "options(repos = c(CRAN = 'https://cloud.r-project.org'))" >> ~/.Rprofile at first, but it failed to detect the CRAN mirror.

Found out that the dotfile was located elsewhere:

print(normalizePath('~/.Rprofile'))
/github/home/.Rprofile

Corrected the path to it and printed out the contents after inserting to check:

echo "options(repos = c(CRAN = 'https://cloud.r-project.org'))" >> /github/home/.Rprofile
cat /github/home/.Rprofile

Sys.setenv("PKGCACHE_HTTP_VERSION" = "2")
options(
  repos = c(
    RSPM = 'https://packagemanager.posit.co/cran/__linux__/focal/latest',
    CRAN = 'https://cran.rstudio.com'
  ),
  Ncpus = 1,
  HTTPUserAgent = sprintf("R/%s R (%s) on GitHub Actions", getRversion(), paste(getRversion(), R.version$platform, R.version$arch, R.version$os))
)
options(repos = c(CRAN = 'https://cloud.r-project.org'))

Still fails:

Error in contrib.url(repos, type) : 
  trying to use CRAN without setting a mirror
Calls: install.packages -> startsWith -> contrib.url
Execution halted
Error: Process completed with exit code 1.

tdhock commented 7 months ago

it could be that R is being run with R --vanilla, is that it? (in that case .Rprofile is not read)

Anirban166 commented 7 months ago

it could be that R is being run with R --vanilla, is that it? (in that case .Rprofile is not read)

Just checked, it doesn't seem to be the case:

Rscript -e "cat('R running with --vanilla:', '--vanilla' %in% commandArgs(trailingOnly = FALSE), '\n')"

R running with --vanilla: FALSE

The only flags in the argument list are --no-echo and --no-restore.
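
Checking for --vanilla alone can miss --no-init-file, which also skips the user profile according to the R startup documentation. The following is a minimal sketch of a broader check; profile_skipping_flags is a hypothetical helper name, and it only inspects whatever argument list is passed to it.

```shell
# Hypothetical helper: print every argument that would make R skip the
# user profile (.Rprofile). Both flags below are documented R options.
profile_skipping_flags() {
  for arg in "$@"; do
    case "$arg" in
      --vanilla|--no-init-file) printf '%s\n' "$arg" ;;
    esac
  done
}
```

With the two flags reported above, profile_skipping_flags --no-echo --no-restore prints nothing, which is consistent with the profile not being disabled from the command line.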

tdhock commented 7 months ago

full docs are in ?.Rprofile

Initialization at Start of an R Session

Description:

     In R, the startup mechanism is as follows.

     Unless '--no-environ' was given on the command line, R searches
     for site and user files to process for setting environment
     variables.  The name of the site file is the one pointed to by the
     environment variable 'R_ENVIRON'; if this is unset,
     '<R_HOME>/etc/Renviron.site' is used (if it exists, which it does
     not in a 'factory-fresh' installation).  The name of the user file
     can be specified by the 'R_ENVIRON_USER' environment variable; if
     this is unset, the files searched for are '.Renviron' in the
     current or in the user's home directory (in that order).  See
     'Details' for how the files are read.

     Then R searches for the site-wide startup profile file of R code
     unless the command line option '--no-site-file' was given.  The
     path of this file is taken from the value of the 'R_PROFILE'
     environment variable (after tilde expansion).  If this variable is
     unset, the default is '<R_HOME>/etc/Rprofile.site', which is used
     if it exists (it contains settings from the installer in a
     'factory-fresh' installation).  This code is sourced into the
     workspace (global environment).  Users need to be careful not to
     unintentionally create objects in the workspace, and it is
     normally advisable to use 'local' if code needs to be executed:
     see the examples.  '.Library.site' may be assigned to and the
     assignment will effectively modify the value of the variable in
     the base namespace where '.libPaths()' finds it.  One may also
     assign to '.First' and '.Last', but assigning to other variables
     in the execution environment is not recommended and does not work
     in some older versions of R.

     Then, unless '--no-init-file' was given, R searches for a user
     profile, a file of R code.  The path of this file can be specified
     by the 'R_PROFILE_USER' environment variable (and tilde expansion
     will be performed).  If this is unset, a file called '.Rprofile'
     is searched for in the current directory or in the user's home
     directory (in that order).  The user profile file is sourced into
     the workspace.

     Note that when the site and user profile files are sourced only
     the 'base' package is loaded, so objects in other packages need to
     be referred to by e.g. 'utils::dump.frames' or after explicitly
     loading the package concerned.

Anirban166 commented 7 months ago

Then, unless '--no-init-file' was given, R searches for a user profile, a file of R code. The path of this file can be specified by the 'R_PROFILE_USER' environment variable (and tilde expansion will be performed). If this is unset, a file called '.Rprofile' is searched for in the current directory or in the user's home directory (in that order). The user profile file is sourced into the workspace.

^This was helpful! Since my .Rprofile wasn't being picked up from /github/home/ (unless I explicitly ran that step with env setting HOME: /github/home), I copied it to the current directory, since the description says the current directory is searched first. I tested that it works by setting an environment variable inside the copied .Rprofile and then reading it back:

echo $R_PROFILE_USER
cp /github/home/.Rprofile .
echo "Sys.setenv(R_PROFILE_LOADED = 'TRUE')" >> .Rprofile
echo "options(repos = c(CRAN = 'https://cloud.r-project.org/'))" >> .Rprofile
Rscript -e 'Sys.getenv("R_PROFILE_LOADED")'

[1] "TRUE"

Looks like it is working now, so I made the change in my workflow to avoid repeatedly setting the CRAN mirror.
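
The search order in the quoted documentation can be approximated in shell to see which file R would be expected to pick up in a given step. This is a minimal sketch, not R's own logic: resolve_rprofile is a hypothetical helper, it treats an empty R_PROFILE_USER as unset, and it does not perform tilde expansion.

```shell
# Hypothetical helper approximating the documented user-profile lookup:
# 1. R_PROFILE_USER if set, 2. ./.Rprofile, 3. $HOME/.Rprofile.
resolve_rprofile() {
  if [ -n "${R_PROFILE_USER:-}" ]; then
    # An explicit path wins; no fallback is attempted.
    printf '%s\n' "$R_PROFILE_USER"
  elif [ -f "./.Rprofile" ]; then
    printf '%s\n' "$PWD/.Rprofile"
  elif [ -f "${HOME:-}/.Rprofile" ]; then
    printf '%s\n' "$HOME/.Rprofile"
  else
    # No candidate found.
    return 1
  fi
}
```

Calling resolve_rprofile in the failing step and again in a step that sets HOME: /github/home should show whether the two steps resolve to different files.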

tdhock commented 7 months ago

great now can you please remove the git switch? (or document why it is not possible)

Anirban166 commented 7 months ago

great now can you please remove the git switch? (or document why it is not possible)

Did so, and as you can see, none of the branches (merge-base, base, master) are detected by atime without it (like I mentioned to you before):

As opposed to having them:

I wrote a condensed comment before I deleted it, saying that it's required for 'checks between branches', but I'll write a more detailed one.

You were asking before why the checkout step wasn't sufficient alone, and that's because checkout by default does not create local branch references for each branch in the repository. To elaborate, this means that the local environment might not have branch names available for use in commands that expect them, like some git operations that you are using under the hood with atime.

More specifically, atime uses git2r::revparse_single, which finds a specific revision (such as a commit, branch, or tag). For this function to work correctly, especially when analyzing pull requests as in our case, the local repository needs to have references to the branches. This is where the git switch commands come into play: they ensure these local branch references exist.

As you can see in the first plot, when I deleted that step the workflow did not switch to the branches explicitly, and git2r was not able to retrieve the branch references it needed.

tdhock commented 7 months ago

ok thanks for the explanation. that is helpful to see the figure showing what it looks like when it is omitted.

however your explanation seems to contradict https://docs.github.com/en/actions/learn-github-actions/variables#default-environment-variables which says that GITHUB_BASE_REF is a default environment variable that is available at every step of the workflow, "The name of the base ref or target branch of the pull request..." which implies that the "base=" line on the plot should be available (even without git switch). But in your plot it only has HEAD= (not base=). I don't understand why, do you?

can you please add a small comment with a simpler version of this explanation in the action yaml file with a link to this comment?

I do not see any explanation about why git switch is needed twice, can you please explain? (it seems to me like once should be sufficient?)

Anirban166 commented 7 months ago

however your explanation seems to contradict https://docs.github.com/en/actions/learn-github-actions/variables#default-environment-variables which says that GITHUB_BASE_REF is a default environment variable that is available at every step of the workflow, "The name of the base ref or target branch of the pull request..." which implies that the "base=" line on the plot should be available (even without git switch). But in your plot it only has HEAD= (not base=). I don't understand why, do you?

You're right that GITHUB_BASE_REF (and also GITHUB_HEAD_REF) are available as environment variables in GHA workflows, and I too reckon that these variables should provide the names of both the base and head branches associated with the pull request, so nope, I do not understand either why the base= label does not pop up in that plot :(

But either way, having just these names available as environment variables does not automatically mean that local branch references are created in the git repository checked out by the GitHub Actions runner.
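
The distinction between a branch name held in an environment variable and a local branch reference can be checked directly. Below is a minimal sketch using git show-ref; has_local_branch and report_pr_branches are hypothetical helper names, and the sketch only reports what exists without creating anything.

```shell
# Hypothetical helper: succeed only if refs/heads/<name> exists locally.
has_local_branch() {
  git show-ref --verify --quiet "refs/heads/$1"
}

# Hypothetical helper: report, for the PR's base and head names, whether
# a local branch reference exists in the current repository.
report_pr_branches() {
  for ref in "${GITHUB_BASE_REF:-}" "${GITHUB_HEAD_REF:-}"; do
    # Skip variables that are empty or unset (e.g. outside a pull request).
    [ -n "$ref" ] || continue
    if has_local_branch "$ref"; then
      echo "local branch present: $ref"
    else
      echo "no local branch for: $ref"
    fi
  done
}
```

Running report_pr_branches before and after the git switch step would be one way to confirm that the variables are set in both cases while the local references appear only afterwards.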

can you please add a small comment with a simpler version of this explanation in the action yaml file with a link to this comment?

Sure, and I actually added that before you commented - can you please check if this is good enough? (and yes I'll add the link once I hear back on this)

I do not see any explanation about why git switch is needed twice, can you please explain? (it seems to me like once should be sufficient?)

It's not the same thing twice, it's for two different refs: (base and head)

git switch "${GITHUB_BASE_REF}"
git switch "${GITHUB_HEAD_REF}"

The first git switch is used to check out the base branch, and the second switch checks out the head branch. This is to ensure that both branches are checked out at least once during my workflow to create the local references needed for BASE and HEAD (in other words, switching to each once ensures these references are available locally since we require explicit references to both branches).

And if I were to use git checkout instead of git switch (we discussed the difference between them much earlier, and why not to use git checkout), an equivalent but more verbose way to do that would be:

...

    steps:
    - name: Checkout
      uses: actions/checkout@v4
      with:
        fetch-depth: 0

    - name: Setup local branch for BASE_REF
      run: |
        # Fetch explicitly so the local branch exists before checkout; plain git checkout (like git switch) only creates it when a matching remote-tracking branch is already present.
        git fetch origin ${GITHUB_BASE_REF}:${GITHUB_BASE_REF}
        git checkout ${GITHUB_BASE_REF}

    - name: Setup local branch for HEAD_REF
      run: |
        git fetch origin ${GITHUB_HEAD_REF}:${GITHUB_HEAD_REF}
        git checkout ${GITHUB_HEAD_REF}

...

(Again, note that actions/checkout only clones the repository and checks out the commit that triggered the workflow, so while it does set up a working copy of the project, it does not create local branch references for the base and head branches of the pull request by default)
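
If the only goal is for the local branch references to exist, they can also be created without changing the checked-out commit. The sketch below is an alternative technique, not what the workflow in this thread does: ensure_local_branch is a hypothetical helper, it assumes the remote is named origin and that origin/<name> has already been fetched, and whether atime resolves branches identically this way is untested here.

```shell
# Hypothetical helper: create refs/heads/<name> from origin/<name> if it
# does not exist yet, leaving the working tree and HEAD untouched.
ensure_local_branch() {
  if git show-ref --verify --quiet "refs/heads/$1"; then
    # Already present; nothing to do.
    return 0
  fi
  git branch --quiet "$1" "origin/$1"
}
```

It could be called once per ref, for example ensure_local_branch "${GITHUB_BASE_REF}" followed by ensure_local_branch "${GITHUB_HEAD_REF}".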

tdhock commented 7 months ago

Thank you for the very clear explanation. Excellent work!

Anirban166 commented 7 months ago

Thank you for the very clear explanation. Excellent work!

Happy to hear!

Anirban166 / Autocomment-atime-results

Setting the CRAN mirror via ~/.Rprofile does not seem to work #33