Anirban166 closed this 7 months ago
It could be that R is being run with R --vanilla; is that it? (In that case .Rprofile is not read.)
Just checked; it doesn't seem to be the case:
Rscript -e "cat('R running with --vanilla:', '--vanilla' %in% commandArgs(trailingOnly = FALSE), '\n')"
R running with --vanilla: FALSE
The only flags in the argument list are --no-echo and --no-restore.
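The check above only looks for --vanilla. According to R's own help, --vanilla combines --no-save, --no-restore, --no-site-file, --no-init-file and --no-environ, and --no-init-file on its own is also enough to stop R from sourcing .Rprofile. A fuller check would therefore look for both flags. Below is a minimal shell sketch that works on an argument list. The helper name skips_user_profile is hypothetical; it is not part of R or of the workflow.

```shell
# Hypothetical helper: succeeds (exit status 0) if any argument would stop R
# from sourcing the user profile (.Rprofile), i.e. --vanilla or --no-init-file.
skips_user_profile() {
  for arg in "$@"; do
    case "$arg" in
      --vanilla|--no-init-file) return 0 ;;
    esac
  done
  return 1
}

# Applied to the flags reported for Rscript above.
if skips_user_profile --no-echo --no-restore; then
  echo "user profile skipped"
else
  echo "user profile read"
fi
```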
Full docs are in ?.Rprofile:
Initialization at Start of an R Session
Description:
In R, the startup mechanism is as follows.
Unless '--no-environ' was given on the command line, R searches
for site and user files to process for setting environment
variables. The name of the site file is the one pointed to by the
environment variable 'R_ENVIRON'; if this is unset,
'<R_HOME>/etc/Renviron.site' is used (if it exists, which it does
not in a 'factory-fresh' installation). The name of the user file
can be specified by the 'R_ENVIRON_USER' environment variable; if
this is unset, the files searched for are '.Renviron' in the
current or in the user's home directory (in that order). See
'Details' for how the files are read.
Then R searches for the site-wide startup profile file of R code
unless the command line option '--no-site-file' was given. The
path of this file is taken from the value of the 'R_PROFILE'
environment variable (after tilde expansion). If this variable is
unset, the default is '<R_HOME>/etc/Rprofile.site', which is used
if it exists (it contains settings from the installer in a
'factory-fresh' installation). This code is sourced into the
workspace (global environment). Users need to be careful not to
unintentionally create objects in the workspace, and it is
normally advisable to use 'local' if code needs to be executed:
see the examples. '.Library.site' may be assigned to and the
assignment will effectively modify the value of the variable in
the base namespace where '.libPaths()' finds it. One may also
assign to '.First' and '.Last', but assigning to other variables
in the execution environment is not recommended and does not work
in some older versions of R.
Then, unless '--no-init-file' was given, R searches for a user
profile, a file of R code. The path of this file can be specified
by the 'R_PROFILE_USER' environment variable (and tilde expansion
will be performed). If this is unset, a file called '.Rprofile'
is searched for in the current directory or in the user's home
directory (in that order). The user profile file is sourced into
the workspace.
Note that when the site and user profile files are sourced only
the 'base' package is loaded, so objects in other packages need to
be referred to by e.g. 'utils::dump.frames' or after explicitly
loading the package concerned.
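To see which profile a given workflow step will actually pick up, you can mirror the user-profile lookup quoted above in a few lines of shell. This is a sketch under some assumptions. The function name r_user_profile is hypothetical. It targets Unix-like runners and only emulates the plain ~ and ~/... forms of tilde expansion. It also ignores --no-init-file and --vanilla.

```shell
# Hypothetical helper mirroring the documented user-profile lookup:
# R_PROFILE_USER if set (with tilde expansion), otherwise .Rprofile in the
# current directory, otherwise .Rprofile in $HOME. Prints the file R would
# source, or nothing if no file applies.
r_user_profile() {
  if [ "${R_PROFILE_USER+set}" = set ]; then
    case "$R_PROFILE_USER" in
      # Replace the leading "~" with $HOME (only "~" and "~/..." handled).
      "~"|"~/"*) candidate="$HOME${R_PROFILE_USER#?}" ;;
      *) candidate="$R_PROFILE_USER" ;;
    esac
    # When the variable is set, only the named file is considered.
    if [ -n "$R_PROFILE_USER" ] && [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
    fi
    return 0
  fi
  # Otherwise: current directory first, then the home directory.
  for candidate in ./.Rprofile "$HOME/.Rprofile"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 0
}
```

Consider a step whose HOME is not /github/home and which has no .Rprofile in its working directory. There, this would print nothing even if /github/home/.Rprofile exists. That is consistent with the behaviour described in the reply that follows.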
Then, unless '--no-init-file' was given, R searches for a user profile, a file of R code. The path of this file can be specified by the 'R_PROFILE_USER' environment variable (and tilde expansion will be performed). If this is unset, a file called '.Rprofile' is searched for in the current directory or in the user's home directory (in that order). The user profile file is sourced into the workspace.
^This was helpful!
Since my .Rprofile wasn't being used when it was at /github/home/ (unless I explicitly ran that step with an env block setting HOME: /github/home), I copied it to the current directory. This follows the description above, which says .Rprofile is searched for in the current directory.
I tested that it works by setting an environment variable inside the copied .Rprofile and then trying to read it:
echo $R_PROFILE_USER
cp /github/home/.Rprofile .
echo "Sys.setenv(R_PROFILE_LOADED = 'TRUE')" >> .Rprofile
echo "options(repos = c(CRAN = 'https://cloud.r-project.org/'))" >> .Rprofile
Rscript -e 'Sys.getenv("R_PROFILE_LOADED")'
[1] "TRUE"
Looks like it is working now, so I made the transition in my workflow to avoid repeatedly setting the CRAN mirror.
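Copying the file works. An alternative that avoids the copy, sketched here under assumptions, is to point R_PROFILE_USER (from the docs quoted above) at the existing file. You would then persist that variable for the following steps through GitHub Actions' $GITHUB_ENV file. The function name persist_r_profile_user is hypothetical. Per GitHub's documentation, a variable written to $GITHUB_ENV is only visible to steps that run after the one writing it.

```shell
# Hypothetical helper: make later workflow steps use an existing profile by
# exporting R_PROFILE_USER through the $GITHUB_ENV file, instead of copying
# .Rprofile into the working directory.
persist_r_profile_user() {
  profile=$1
  if [ ! -f "$profile" ]; then
    echo "persist_r_profile_user: $profile not found" >&2
    return 1
  fi
  if [ -z "${GITHUB_ENV:-}" ]; then
    echo "persist_r_profile_user: GITHUB_ENV is not set (not in GitHub Actions?)" >&2
    return 1
  fi
  # Lines of the form NAME=value become environment variables in later steps.
  printf 'R_PROFILE_USER=%s\n' "$profile" >> "$GITHUB_ENV"
}

# In a workflow step, e.g.:
#   persist_r_profile_user /github/home/.Rprofile
```

With that in place, later Rscript calls would source the named file regardless of HOME or the working directory, unless --no-init-file or --vanilla is passed.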
Great, now can you please remove the git switch? (Or document why it is not possible.)
Did so, and as you can see, none of the branches (merge-base, base, master) are detected by atime without it (as I mentioned to you before):
As opposed to with the git switch commands:
Before deleting it, I had written a condensed comment saying it's required for 'checks between branches', but I'll write a more detailed one.
You were asking before why the checkout step alone wasn't sufficient. That's because the checkout step, by default, does not create local branch references for each branch in the repository. In practice, the local environment might not have the branch names available for commands that expect them, such as some of the git operations that atime uses under the hood.
More specifically, this is git2r::revparse_single, which is used to find a specific revision (such as a commit, branch, or tag). For this function to work correctly, especially when analyzing pull requests as in our case, the local repository needs references to the branches. This is where the git switch commands come into play: they ensure these local branch references exist.
As you can see in the first plot, when I deleted that step the workflow did not switch to the branches explicitly, and git2r was not able to retrieve the branch references it needed.
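To make the point about local branch references concrete, here is a self-contained shell reproduction with throwaway repositories. Everything in it is illustrative. The branches master and feature stand in for the PR's base and head, and git 2.23 or newer is assumed for git switch. The runner repository roughly mimics what actions/checkout with fetch-depth: 0 leaves behind on a pull request: remote-tracking refs under origin/ and a detached HEAD, but no local branches.

```shell
set -eu
# Commit identity for the throwaway repositories.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

base=master   # stands in for GITHUB_BASE_REF
head=feature  # stands in for GITHUB_HEAD_REF
work=$(mktemp -d)

# "Remote" repository with a base branch and a head branch.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD "refs/heads/$base"
git commit -q --allow-empty -m "base commit"
git switch -q -c "$head"
git commit -q --allow-empty -m "head commit"

# Runner-side repository: fetch all branches as origin/*, then detach HEAD.
git init -q "$work/runner"
cd "$work/runner"
git remote add origin "$work/upstream"
git fetch -q origin
git checkout -q --detach "origin/$head"

# Only remote-tracking refs exist at this point.
for ref in "$base" "$head"; do
  if git rev-parse --verify -q "refs/heads/$ref" >/dev/null; then
    echo "before switch: local $ref exists"
  else
    echo "before switch: no local $ref (only origin/$ref)"
  fi
done

# Each switch creates the local branch from the matching origin/<name>.
git switch -q "$base"
git switch -q "$head"

git for-each-ref --format='after switch: local %(refname:short)' refs/heads
```

A lookup by the bare branch name fails before the switches, which is what a name-based revision lookup would run into. Afterwards both local branches exist.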
OK, thanks for the explanation. It is helpful to see the figure showing what it looks like when the step is omitted.
However, your explanation seems to contradict https://docs.github.com/en/actions/learn-github-actions/variables#default-environment-variables, which says that GITHUB_BASE_REF is a default environment variable available at every step of the workflow ("The name of the base ref or target branch of the pull request..."). That implies the "base=" line on the plot should be available (even without git switch), but your plot only has HEAD= (not base=). I don't understand why; do you?
Can you please add a small comment with a simpler version of this explanation in the action YAML file, with a link to this comment?
I do not see any explanation of why git switch is needed twice; can you please explain? (It seems to me like once should be sufficient?)
However, your explanation seems to contradict https://docs.github.com/en/actions/learn-github-actions/variables#default-environment-variables, which says that GITHUB_BASE_REF is a default environment variable available at every step of the workflow ("The name of the base ref or target branch of the pull request..."). That implies the "base=" line on the plot should be available (even without git switch), but your plot only has HEAD= (not base=). I don't understand why; do you?
You're right that GITHUB_BASE_REF (and also GITHUB_HEAD_REF) are available as environment variables in GHA workflows. I too reckon that these variables should provide the names of both the base and head branches associated with the pull request, so nope, I do not understand either why the base= label does not pop up in that plot :(
But either way, having just these names available as environment variables does not automatically mean that local branch references are created in the git repository checked out by the GitHub Actions runner.
Can you please add a small comment with a simpler version of this explanation in the action YAML file, with a link to this comment?
Sure, and I actually added that before you commented; can you please check if it is good enough? (And yes, I'll add the link once I hear back on this.)
I do not see any explanation of why git switch is needed twice; can you please explain? (It seems to me like once should be sufficient?)
It's not the same thing twice; it's for two different refs (base and head):
git switch "${GITHUB_BASE_REF}"
git switch "${GITHUB_HEAD_REF}"
The first git switch checks out the base branch, and the second checks out the head branch. This ensures that each branch is checked out at least once during my workflow. Switching to each branch once creates the local references needed for BASE and HEAD, since we require explicit references to both branches.
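You may want that step to fail loudly, rather than only noticing missing branches later in the atime plot. In that case, the two switches could be wrapped in a small helper. The name switch_to_pr_branches is hypothetical. The sketch assumes git 2.23+ and that the checkout step used fetch-depth: 0, so origin/<base> and origin/<head> already exist. It keeps the order used above (base first, head last), so the working tree ends up on the PR's head branch.

```shell
# Hypothetical helper: run the two switches from the workflow step and stop
# with a clear message if either local branch cannot be created.
switch_to_pr_branches() {
  # GITHUB_BASE_REF/GITHUB_HEAD_REF are only set for pull_request and
  # pull_request_target events.
  if [ -z "${GITHUB_BASE_REF:-}" ] || [ -z "${GITHUB_HEAD_REF:-}" ]; then
    echo "GITHUB_BASE_REF and GITHUB_HEAD_REF must both be set" >&2
    return 1
  fi
  # Base first, then head: each switch creates refs/heads/<name> from
  # origin/<name> if needed, and the working tree ends up on the head branch.
  for ref in "$GITHUB_BASE_REF" "$GITHUB_HEAD_REF"; do
    if ! git switch -q "$ref"; then
      echo "could not switch to $ref (is origin/$ref fetched?)" >&2
      return 1
    fi
  done
}

# In the workflow step:
#   switch_to_pr_branches
```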
And if I were to use git checkout instead of git switch, an equivalent but more verbose way to do it would be the following. (We discussed the difference between them, and why not to use git checkout, much earlier.)
...
steps:
  - name: Checkout
    uses: actions/checkout@v4
    with:
      fetch-depth: 0
  - name: Setup local branch for BASE_REF
    run: |
      # Fetch the branch into a local branch of the same name explicitly, so the checkout below finds it locally instead of relying on remote-tracking branch guessing
      git fetch origin "${GITHUB_BASE_REF}:${GITHUB_BASE_REF}"
      git checkout "${GITHUB_BASE_REF}"
  - name: Setup local branch for HEAD_REF
    run: |
      git fetch origin "${GITHUB_HEAD_REF}:${GITHUB_HEAD_REF}"
      git checkout "${GITHUB_HEAD_REF}"
...
(Again, note that actions/checkout only clones the repository and checks out the commit that triggered the workflow. So while it does set up a working copy of the project, by default it does not create local branch references for the base and head branches of the pull request.)
Thank you for the very clear explanation. Excellent work!
Happy to hear!
I tried
echo "options(repos = c(CRAN = 'https://cloud.r-project.org'))" >> ~/.Rprofile
at first, but it failed to detect the CRAN mirror. Then I found out that the dotfile was located elsewhere:
I corrected the path to it and printed out the contents after inserting, to check:
Still fails: