Closed ptrebert closed 1 year ago
I created the new branch git_labels
in the repo template-snakemake
. I moved the function _collect_git_labels()
into the module pyutils.smk
and added another function _collect_git_remotes()
to collect the url of the remote in the priority order github
, git.hhu
, origin
(If no origin
is present an error message "This is not a git repository" will be printed into the "git-url" field. If we want to also extract the clear name (like "origin") of the remote I'll have to adjust the code but that information is user specific anyway.
I also included a test rule in snaketest.smk
to create a text file that contains the output of the _collect_git_labels
function.
This looks quite close to something that could be merged, but there a few remaining problems:
wd = WORKDIR
remotes = sp.check_output(["git remote -v"], shell=True, cwd=wd).decode().split()
subprocess as sp
breaks with the function _rsync
, and short hands at the top-level of a Snakefile (!) should be avoided to not clutter the namespacegit
executable is not available, which may be the case in an HPC environment. Maybe this should be checked first to decide whether or not to run the rest of the code. If git is not available, the pipeline can still run, so this should just be issued as a warning.constants
) the snakemake working directory, and not the directory of the checked out pipeline code (= the repo directory)_rsync
for reference or just use the plain stringLines 116-125 could be simplified by counting on the fact that the remote name is always the first item in the line after a .split()
.
The error message in line 127 is misleading, though, because it actually just means that the remote name is something quite unexpected (no rule against naming your remote, say, pizza).
What is also missing is adding this info (just like timestamp and username) to the config dump.
In repsonse to the points:
Initially, I had it the way you suggested but then realized that everybody can name the remote, say pizza and maybe use other locations other then git where the remote is stored and then it doesn't work properly or at all to just select the first item after a .split() The way I wrote it now is to not use the remote name but directly the url and look for "github" or "git.hhu" within the url and return the first occurrence and if that doesn't result in a match use the remote name (which by default should be "origin") and use that url. I agree that it looks messy with multiple "try-except" back-to-back but that was my best solution I could come up with.
I changed the text of the error message.
I also added the info into the config dump.
I think this can be marked as done, the template contains the git label functionality.
@svenwillger As just discussed, move this functionality to the snakemake template, and add a function to find the correct name of the remote if multiple remotes are configured (
github
takes priority overgithhu
). If only one remote exists, take that one of course.https://github.com/core-unit-bioinformatics/reference-container/blob/c1b9964c82cfc8dcdee3d999b1bc10f1cf2e850c/workflow/rules/functions.smk#L530