datalad / datalad-remake

For prospective execution make must be run after initremote #66

Closed mslw closed 13 hours ago

mslw commented 4 days ago

So far so good

When I run the example from the README, everything works fine:

datalad create remake-test-1
cd remake-test-1

> mkdir -p .datalad/make/methods
> cat > .datalad/make/methods/one-to-many <<EOF
parameters = ['first', 'second', 'output']

command = [
    "bash",
    "-c",
    "echo content: {first} > '{output}-1.txt'; echo content: {second} > '{output}-2.txt'",
]
EOF
datalad save -m 'add `one-to-many` remake method'

git annex initremote datalad-remake encryption=none type=external externaltype=datalad-remake allow-untrusted-execution=true
git config remote.datalad-remake.annex-security-allow-unverified-downloads ACKTHPPT

datalad make -p first=john -p second=susan -p output=person \
        -o person-1.txt -o person-2.txt --prospective-execution --allow-untrusted-execution one-to-many

The file is associated with the datalad-remake remote:

❱ git annex whereis person-1.txt
whereis person-1.txt (1 copy)
        59de66de-864e-453a-a0f3-aa53a29929c7 -- [datalad-remake]

  datalad-remake: datalad-remake:///?label=one-to-many&root_version=c7a0f60613d98ce35f3fcced91630e45e436ba9b&specification=c803717e90034d33adda7733dc47ff20&this=person-1.txt

and I can get it without issue:

❱ datalad get person-1.txt
get(ok): person-1.txt (file) [from datalad-remake...]

Here's where things can go wrong

However, when I swap the order of git annex initremote and make --prospective-execution (running the latter first):

datalad create remake-test-2
cd remake-test-2

> mkdir -p .datalad/make/methods
> cat > .datalad/make/methods/one-to-many <<EOF
parameters = ['first', 'second', 'output']

command = [
    "bash",
    "-c",
    "echo content: {first} > '{output}-1.txt'; echo content: {second} > '{output}-2.txt'",
]
EOF
datalad save -m 'add `one-to-many` remake method'

datalad make -p first=john -p second=susan -p output=person \
        -o person-1.txt -o person-2.txt --prospective-execution --allow-untrusted-execution one-to-many

git annex initremote datalad-remake encryption=none type=external externaltype=datalad-remake allow-untrusted-execution=true
git config remote.datalad-remake.annex-security-allow-unverified-downloads ACKTHPPT

The file remains associated with the web remote:

❱ git annex whereis person-2.txt
whereis person-2.txt (1 copy)
        00000000-0000-0000-0000-000000000001 -- web

  web: datalad-remake:///?label=one-to-many&root_version=f937fddc7e438295b67d2c9ead6b7d400f7ea81e&specification=c803717e90034d33adda7733dc47ff20&this=person-2.txt
ok

Even though the remake remote is available:

❱ git annex info
trusted repositories: 0
semitrusted repositories: 4
        00000000-0000-0000-0000-000000000001 -- web
        00000000-0000-0000-0000-000000000002 -- bittorrent
        3a14d8e6-78f1-4e87-a542-03b3d35d35a8 -- mszczepanik@juseless:~/rmk/remake-test-2 [here]
        cbd5f32b-efd2-4cfd-aaa8-16784c0f548a -- [datalad-remake]
(snip)

Consequently, get fails. Note that the error is about annex.security.allowed-url-schemes (not unverified downloads, and not remake-trust-related), which I think suggests that the datalad-remake address is being handled by the web remote.

❱ datalad get person-1.txt
get(error): person-1.txt (file) [datalad-remake:///?label=one-to-many&root_version=f937fddc7e438295b67d2c9ead6b7d400f7ea81e&specification=c803717e90034d33adda7733dc47ff20&this=person-1.txt Configuration of annex.security.allowed-url-schemes does not allow accessing datalad-remake:///?label=one-to-many&root_version=f937fddc7e438295b67d2c9ead6b7d400f7ea81e&specification=c803717e90034d33adda7733dc47ff20&this=person-1.txt
downloading from all 1 known url(s) failed]
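
A quick way to spot this state is to look for datalad-remake URLs that git annex whereis lists under web. This is a minimal sketch, assuming the plain-text whereis output format shown above; the helper name misassigned_remake_urls is hypothetical and not part of datalad-remake or git-annex:

```shell
# Hypothetical helper: read `git annex whereis` output on stdin and print
# every datalad-remake URL that is attributed to the web remote.
# Assumption: URL lines look like "  web: <url>", as in the output above.
misassigned_remake_urls() {
    sed -n 's|^[[:space:]]*web: \(datalad-remake://.*\)$|\1|p'
}

# Usage inside a dataset (prints nothing if no URL is misassigned):
#   git annex whereis | misassigned_remake_urls
```

Any URL it prints is one that git-annex will try to fetch through the web remote, which matches the allowed-url-schemes error above.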

Note: by writing "things can go wrong" in the heading, I really meant "this is how I messed up on my first attempt".

Looking back

I wonder whether this is expected behaviour. I thought the order should not matter and that git-annex would "reassign" the URL to the special remote which claims it once remake gets enabled. So to me it looks like:

Might also be worth comparing to e.g. uncurl from datalad-next.

christian-monch commented 3 days ago

@mslw Thanks for catching that and for the great writeup. I'll look into it. Did you check what happens if the allowed-url-schemes include datalad-remake?

mslw commented 3 days ago

> @mslw Thanks for catching that and for the great writeup. I'll look into it. Did you check what happens if the allowed-url-schemes include datalad-remake?

I just checked, and it doesn't work: the URL still goes through the web special remote, and the error message changes to "download failed: Unsupported url scheme datalad-remake:///?label=one-to-many(...)".

The man page section for the option contains this: "Some special remotes support their own domain-specific URL schemes; those are not affected by this configuration setting"

mslw commented 3 days ago

> Might also be worth comparing to e.g. uncurl from datalad-next.

So this seems to be the regular git-annex behavior: a remote claims the URL at the moment the URL is added / registered, so a remote that is enabled later does not take over URLs that already exist. E.g.:

❱ git annex addurl --fast --file cat.jpg "https://unsplash.com/photos/a7bdqjeG6M4/download?force=true&w=640"                                            
addurl https://unsplash.com/photos/a7bdqjeG6M4/download?force=true&w=640 (to cat.jpg) ok

❱ git annex initremote uncurl type=external externaltype=uncurl encryption=none
initremote uncurl ok

❱ git annex addurl --fast --file bat.jpg "https://unsplash.com/photos/8vO-HsnSq4E/download?force=true&w=640"
addurl https://unsplash.com/photos/8vO-HsnSq4E/download?force=true&w=640 (from uncurl) (to bat.jpg) ok

❱ git annex whereis
whereis bat.jpg (1 copy)
        13f8e4f8-648f-48c2-bfd8-bc1c836a855c -- [uncurl]

  uncurl: https://unsplash.com/photos/8vO-HsnSq4E/download?force=true&w=640
ok
whereis cat.jpg (1 copy)
        00000000-0000-0000-0000-000000000001 -- web

  web: https://unsplash.com/photos/a7bdqjeG6M4/download?force=true&w=640
ok

Which means this was my error (I did not enable the remote in time). It can probably be repaired with git annex rmurl followed by git annex registerurl (I expected git annex unregisterurl to work as well, but it didn't do what I expected). Still, given datalad make's reliance on the datalad-remake special remote, we could protect the command against inattentive users such as me.
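
To make both ideas concrete, here is a rough sketch of what the repair and such a guard could look like from the shell. It is not how datalad-remake implements anything: the helper names (remake_remote_enabled, guarded_make, reregister_url) are hypothetical, the guard assumes that git-annex records the external remote type in the git config key remote.<name>.annex-externaltype, and the repair sequence is the untested rmurl / registerurl idea from above.

```shell
# Sketch only: helper names are hypothetical, not part of datalad-remake.

# Succeed if any remote is configured as an external special remote of type
# datalad-remake. Assumption: git-annex stores the type in the local git
# config key remote.<name>.annex-externaltype once the remote is set up.
remake_remote_enabled() {
    git config --get-regexp '^remote\..*\.annex-externaltype$' 2>/dev/null |
        awk '$2 == "datalad-remake" { found = 1 } END { exit !found }'
}

# Run `datalad make` only when the special remote is in place.
guarded_make() {
    if ! remake_remote_enabled; then
        echo "datalad-remake special remote not enabled; run 'git annex initremote' first" >&2
        return 1
    fi
    datalad make "$@"
}

# Untested repair idea: remove the URL from the file, then register it again
# for the same key so that an enabled special remote gets a chance to claim it.
reregister_url() {
    file=$1
    url=$2
    key=$(git annex lookupkey "$file") || return 1
    git annex rmurl "$file" "$url" &&
        git annex registerurl "$key" "$url"
}
```

With these defined, guarded_make takes the same arguments as datalad make, and reregister_url person-1.txt '<url from git annex whereis>' would be run once per affected file after the remote has been initialized. Checking git annex whereis afterwards shows whether the URL moved away from web.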

christian-monch commented 13 hours ago

Thanks a lot @mslw for the investigation. Will close this now.