flokli / nix-casync

A more efficient way to store and substitute Nix store paths
80 stars 4 forks source link

self-reference handling #30

Closed ajs124 closed 2 years ago

ajs124 commented 2 years ago

I've run into an issue with References being empty for storepaths that I nix copyd into nix-casync.

An example can be found below. As served by eris (signature removed, for reasons):

StorePath: /nix/store/z8d9wkiyad59ikv0jwck3v2k80plfbk5-ruby-2.7.5
URL: nar/z8d9wkiyad59ikv0jwck3v2k80plfbk5.nar
Compression: none
NarHash: sha256:19ipiavzya8c78pi5jxnjqidkzrzwgfz5f24bjc5d1s8pxzyyx8g
NarSize: 19297024
References: 8jw7q260l0q5yygr2n2wiya9qfmpmihp-libffi-3.4.2 bwhz3l5gilp2i3zw6a42vvwbx5ba9gf9-readline-6.3p08 bwlh333gnyiz1rkrwlg7k1cjdrsb3h6z-libyaml-0.2.5 gkzmfpb04ddb7phzj8g9sl6saxzprssg-gcc-wrapper-10.3.0 hk73982nqs66dyi88kw6y889q4rf430f-gdbm-1.20 jjv2lgwg9a5wg88xaa195fa47r6skfiz-ncurses-6.2 ligwjs9jslf1na0mxzycjd1lvpqrr2fn-openssl-1.1.1l s0c1m63ajbd63aw3qsd3823dwffr55h7-gnugrep-3.7 v1d8l3zqnia3hccqd0701szhlx22g54z-bash-5.1-p8 vjq3q7dq8vmc13c3py97v27qwizvq7fd-glibc-2.33-59 wrw3b1mp1dc5hvzbqi6dr37yslxvjxxk-coreutils-9.0 yzhsc36b7mdp2gd9xa35hwfw7aaj366c-zlib-1.2.11 z8d9wkiyad59ikv0jwck3v2k80plfbk5-ruby-2.7.5
Deriver: wbfd85rizfckxjrfb2r8f1y1ldniqbvp-ruby-2.7.5.drv
System: x86_64-linux

As served by nix-casync (same as above):

StorePath: /nix/store/z8d9wkiyad59ikv0jwck3v2k80plfbk5-ruby-2.7.5
URL: nar/19ipiavzya8c78pi5jxnjqidkzrzwgfz5f24bjc5d1s8pxzyyx8g.nar
Compression: none
NarHash: sha256:19ipiavzya8c78pi5jxnjqidkzrzwgfz5f24bjc5d1s8pxzyyx8g
NarSize: 19297024
References: 
Deriver: wbfd85rizfckxjrfb2r8f1y1ldniqbvp-ruby-2.7.5.drv

Comparing them now, it also seems to be missing System, not sure if that's relevant.

flokli commented 2 years ago

I refactored tests a bit (see https://github.com/flokli/nix-casync/pull/31), and am not able to reproduce it there. Especially the lost System field looks very odd - we definitely don't do anything special with it (except omitting it if it's an empty string).

Can you peek at the file store contents? How does the JSON in there look like?

flokli commented 2 years ago

Also note my whole system /run/current-system, copied to disk:

nix copy --extra-experimental-features nix-command --to file:///home/flokli/somestore\?compression=none /run/current-system

Doesn't include a single .narinfo with System not being empty/omitted.

Do you have a output path to reproduce from that exists on cache.nixos.org?

ajs124 commented 2 years ago

I can look at the JSON files, sure. How are they structured on disk?

/nix/store/xhzp3qh7607c46r6hvq4y8rps58gxcxf-postfix-3.6.3 should be on cache.nixos.org. Returns this, for me:

StorePath: /nix/store/xhzp3qh7607c46r6hvq4y8rps58gxcxf-postfix-3.6.3
URL: nar/1z1azqvsgwr6yabwzsag2asrdcd0wqjfkn0w11nvh1pv63gyab7k.nar
Compression: none
NarHash: sha256:1z1azqvsgwr6yabwzsag2asrdcd0wqjfkn0w11nvh1pv63gyab7k
NarSize: 22403048
References: 
Deriver: pl0cv8bvflf448jaqzvckgc2h7j3s2r0-postfix-3.6.3.drv
Sig: cache.nixos.org-1:7CJq8f5DCU0JXsIprbcLfQ2yFclsxYABfEE1M8AjUGkcb8z7KUDm2RaK7zEiX2BWSjym22NGvUGGk6Do0Yq6BA==

empty References, no System. Idk where System's coming from, maybe that's just something that eris does?

ajs124 commented 2 years ago

Yup: https://github.com/thoughtpolice/eris/blob/master/eris.pl#L619 So the System: is irrelevant, this is really just about References:.

flokli commented 2 years ago

I can look at the JSON files, sure. How are they structured on disk?

Check your state directory, there's a narinfo subfolder with the metadata, containing narmeta and pathinfo folders. Each contain json files (in a tree structure). narmeta uses the narhash as a key, pathinfo the output hash.

Fields persisted mirror the PathInfo and NarMeta structs (from pkg/store/metadatastore/types.go).

ajs124 commented 2 years ago

Any reason why the on-disk storage uses base-16 instead of base-32 or something else?

The narmeta is this:

{
  "NarHash": "8yzl3zD7BrhtCBzY6STmoLGWtRJP6c+X8ibzpzf+Kvw=",
  "Size": 22403048,
  "References": null,
  "ReferencesStr": null
}

pathinfo is this:

{
  "OutputHash": "rrP+UNE3I0/whiYbwg4wB+JxP+w=",
  "Name": "postfix-3.6.3",
  "NarHash": "8yzl3zD7BrhtCBzY6STmoLGWtRJP6c+X8ibzpzf+Kvw=",
  "Deriver": "pl0cv8bvflf448jaqzvckgc2h7j3s2r0-postfix-3.6.3.drv",
  "System": "",
  "NarinfoSignatures": [
    {
      "KeyName": "cache.nixos.org-1",
      "Digest": "7CJq8f5DCU0JXsIprbcLfQ2yFclsxYABfEE1M8AjUGkcb8z7KUDm2RaK7zEiX2BWSjym22NGvUGGk6Do0Yq6BA=="
    }
  ],
  "CA": ""
}
flokli commented 2 years ago

Any reason why the on-disk storage uses base-16 instead of base-32 or something else?

No particular reason, this was just one serialization I picked - hex.EncodeToString was just easier to reach than nixbase32. I however agree that'd improve debuggability.

I'll send a PR that'll change the encoding.

flokli commented 2 years ago

Please see https://github.com/flokli/nix-casync/pull/32 (and the commit message) which hopefully fixes the originally reported issue.

ajs124 commented 2 years ago
--- a/pkg/server/server.go
+++ b/pkg/server/server.go
@@ -140,6 +140,12 @@ func (s *Server) handleNarinfo(w http.ResponseWriter, r *http.Request) {
                        http.Error(w, "NarMeta is conflicting", http.StatusBadRequest)
                }

+               err = s.metadataStore.PutPathInfo(r.Context(), sentPathInfo)
+               if err != nil {
+                       http.Error(w, fmt.Sprintf("Error putting PathInfo: %v", err), http.StatusInternalServerError)
+                       return
+               }
+
                // HACK: until we implement our own reference scanner on NAR upload, we
                // populate NarMeta.References[Str] on .narinfo upload,
                // if it's empty right now.
@@ -155,11 +161,6 @@ func (s *Server) handleNarinfo(w http.ResponseWriter, r *http.Request) {
                        http.Error(w, "NarMeta (References) is conflicting", http.StatusBadRequest)
                }

-               err = s.metadataStore.PutPathInfo(r.Context(), sentPathInfo)
-               if err != nil {
-                       http.Error(w, fmt.Sprintf("Error putting PathInfo: %v", err), http.StatusInternalServerError)
-                       return
-               }
                return
        }
        http.Error(w, http.StatusText(http.StatusMethodNotAllowed), http.StatusMethodNotAllowed)

This seems to fix the issue, idk if it's correct though.

As written on matrix, the error seems to occur because this check is looking for a pathinfo that hasn't been written yet. The error returned should probably be logged, e.g. like this:

--- a/pkg/server/server.go
+++ b/pkg/server/server.go
@@ -146,7 +146,10 @@ func (s *Server) handleNarinfo(w http.ResponseWriter, r *http.Request) {
                if len(narMeta.References) == 0 && len(sentNarMeta.References) != 0 {
                        narMeta.ReferencesStr = sentNarMeta.ReferencesStr
                        narMeta.References = sentNarMeta.References
-                       s.metadataStore.PutNarMeta(r.Context(), narMeta)
+                       err = s.metadataStore.PutNarMeta(r.Context(), narMeta)
+                       if err != nil {
+                               log.Error(err)
+                       }
                }

                // Do full comparison of NarMeta, including references
flokli commented 2 years ago

We figured out the PathInfo that nix-casync was looking for is the PathInfo it is currently about to write - we didn't properly handle self-references.

flokli commented 2 years ago

@ajs124 with https://github.com/flokli/nix-casync/pull/36 merged, can you confirm uploads now fail?

ajs124 commented 2 years ago

Yup, nix-casync log looks like this:

INFO[0000] Starting Server at [::]:9000                 
2022/01/17 13:59:21 "GET http://[::1]:9000/nix-cache-info HTTP/1.1" from [::1]:59050 - 200 51B in 21.32µs
2022/01/17 13:59:21 "GET http://[::1]:9000/0clx8n6zxksqjsxvnljndqw40cw0wxzq.narinfo HTTP/1.1" from [::1]:59050 - 404 132B in 110.54µs
2022/01/17 13:59:21 "GET http://[::1]:9000/2g2c3a58nnya7fmpmwx2nkv4v1109g3x.narinfo HTTP/1.1" from [::1]:59050 - 404 132B in 53.96µs
2022/01/17 13:59:21 "GET http://[::1]:9000/0k7pcp8kw30vcq2zr767f9mhscngqv7c.narinfo HTTP/1.1" from [::1]:59052 - 404 132B in 78.83µs
2022/01/17 13:59:21 "GET http://[::1]:9000/3p96qwx1bgdjmam1whg3f728kkz8svl4.narinfo HTTP/1.1" from [::1]:59054 - 404 132B in 91.42µs
2022/01/17 13:59:21 "GET http://[::1]:9000/0mf6csg7mmr79s7g9m5k8sqn5p6ya974.narinfo HTTP/1.1" from [::1]:59056 - 404 132B in 70.84µs
2022/01/17 13:59:21 "GET http://[::1]:9000/ik2f891ci0nqj1byvf2aa1m4l559r01i.narinfo HTTP/1.1" from [::1]:59058 - 404 132B in 75.21µs
2022/01/17 13:59:21 "GET http://[::1]:9000/wrw3b1mp1dc5hvzbqi6dr37yslxvjxxk.narinfo HTTP/1.1" from [::1]:59050 - 404 132B in 84.721µs
2022/01/17 13:59:21 "GET http://[::1]:9000/xif6gg595hgmqawrcarapa8j2r7fbz9w.narinfo HTTP/1.1" from [::1]:59056 - 404 132B in 87.88µs
2022/01/17 13:59:21 "GET http://[::1]:9000/v1d8l3zqnia3hccqd0701szhlx22g54z.narinfo HTTP/1.1" from [::1]:59052 - 404 132B in 129.11µs
2022/01/17 13:59:21 "GET http://[::1]:9000/xpqs5rzb3bmvjah747gb40540v527hf2.narinfo HTTP/1.1" from [::1]:59054 - 404 132B in 87.331µs
2022/01/17 13:59:21 "GET http://[::1]:9000/m756011mkf1i0ki78i8y6ac3gf8qphvi.narinfo HTTP/1.1" from [::1]:59052 - 404 132B in 98.55µs
2022/01/17 13:59:21 "GET http://[::1]:9000/96n2v3w5b1ljgk7d6bj5drx62zj0g453.narinfo HTTP/1.1" from [::1]:59060 - 404 132B in 118.41µs
2022/01/17 13:59:21 "GET http://[::1]:9000/ai52y6fax40wrma7cbxqsbkimfviwdyc.narinfo HTTP/1.1" from [::1]:59062 - 404 132B in 101.03µs
2022/01/17 13:59:21 "GET http://[::1]:9000/xdbd16gax9kiw39lxzv2946jwssf4sx8.narinfo HTTP/1.1" from [::1]:59056 - 404 132B in 106.5µs
2022/01/17 13:59:21 "GET http://[::1]:9000/xhzp3qh7607c46r6hvq4y8rps58gxcxf.narinfo HTTP/1.1" from [::1]:59050 - 404 132B in 121.56µs
2022/01/17 13:59:21 "GET http://[::1]:9000/mrkbr4jqb1jsbdvs8nsdb9g3qicmyhrq.narinfo HTTP/1.1" from [::1]:59064 - 404 132B in 155.12µs
2022/01/17 13:59:21 "GET http://[::1]:9000/scz4zbxirykss3hh5iahgl39wk9wpaps.narinfo HTTP/1.1" from [::1]:59068 - 404 132B in 822.94µs
2022/01/17 13:59:21 "GET http://[::1]:9000/s0c1m63ajbd63aw3qsd3823dwffr55h7.narinfo HTTP/1.1" from [::1]:59070 - 404 132B in 133.921µs
2022/01/17 13:59:21 "GET http://[::1]:9000/vffsfb1m3ps7bbsm6mv0ispm87saz230.narinfo HTTP/1.1" from [::1]:59066 - 404 132B in 71.56µs
2022/01/17 13:59:21 "GET http://[::1]:9000/qdvqn453xkyc29anlp1915ka5c9zqawa.narinfo HTTP/1.1" from [::1]:59072 - 404 132B in 72.27µs
2022/01/17 13:59:21 "GET http://[::1]:9000/0zmm8mh9r38wd424dliwdm3ccbkw9lqr.narinfo HTTP/1.1" from [::1]:59074 - 404 132B in 69.96µs
2022/01/17 13:59:21 "GET http://[::1]:9000/3011h67wfska27z5rp0gi112znvkyb5h.narinfo HTTP/1.1" from [::1]:59076 - 404 132B in 90.19µs
2022/01/17 13:59:21 "GET http://[::1]:9000/vjq3q7dq8vmc13c3py97v27qwizvq7fd.narinfo HTTP/1.1" from [::1]:59078 - 404 132B in 68.68µs
2022/01/17 13:59:21 "GET http://[::1]:9000/ln6lp86rcilxx2vlmk8xszan9s71iqf6.narinfo HTTP/1.1" from [::1]:59080 - 404 132B in 478.06µs
2022/01/17 13:59:21 "GET http://[::1]:9000/ligwjs9jslf1na0mxzycjd1lvpqrr2fn.narinfo HTTP/1.1" from [::1]:59082 - 404 132B in 114.68µs
2022/01/17 13:59:21 "GET http://[::1]:9000/8nxw4pxya14s93bgbcqi7xzxf27fqf09.narinfo HTTP/1.1" from [::1]:59084 - 404 132B in 81.27µs
2022/01/17 13:59:21 "GET http://[::1]:9000/hqfkxrz9dxaq8azbzdqc6va29a41jhz7.narinfo HTTP/1.1" from [::1]:59086 - 404 132B in 73.36µs
2022/01/17 13:59:21 "GET http://[::1]:9000/qbdsd82q5fyr0v31cvfxda0n0h7jh03g.narinfo HTTP/1.1" from [::1]:59088 - 404 132B in 56.73µs
2022/01/17 13:59:21 "GET http://[::1]:9000/almr0dpjglrgch7na97i95jljd5465dq.narinfo HTTP/1.1" from [::1]:59090 - 404 132B in 54.53µs
2022/01/17 13:59:21 "GET http://[::1]:9000/8x1kf2pcaw45b8yvvk49d0dnk3vvgyzj.narinfo HTTP/1.1" from [::1]:59092 - 404 132B in 68.86µs
2022/01/17 13:59:21 "GET http://[::1]:9000/6in9l3cij07lmxzl6pfis6a90hs5wlj1.narinfo HTTP/1.1" from [::1]:59094 - 404 132B in 82.92µs
2022/01/17 13:59:21 "GET http://[::1]:9000/mjhrz69mvlm6q2vi3frs44ypkbfxywpf.narinfo HTTP/1.1" from [::1]:59096 - 404 132B in 73.22µs
2022/01/17 13:59:21 "GET http://[::1]:9000/ylc5l0bk2a128xlw5p330nfc7sripw6y.narinfo HTTP/1.1" from [::1]:59098 - 404 132B in 78.35µs
2022/01/17 13:59:21 "HEAD http://[::1]:9000/nar/100129y7bv20c2vp0n11ga7sr3ziiz5paxali19030asbllf4fwd.nar HTTP/1.1" from [::1]:59054 - 404 208B in 129.9µs
2022/01/17 13:59:21 "PUT http://[::1]:9000/nar/100129y7bv20c2vp0n11ga7sr3ziiz5paxali19030asbllf4fwd.nar HTTP/1.1" from [::1]:59054 - 000 0B in 46.524845ms
ERRO[0325] Failed to update NarMeta with References from pathinfo libunistring-0.9.10: referred reference qbdsd82q5fyr0v31cvfxda0n0h7jh03g-libunistring-0.9.10 doesn't exist: open /tmp/test_cache/narinfo/pathinfo/qbds/qbdsd82q5fyr0v31cvfxda0n0h7jh03g.json: no such file or directory 
2022/01/17 13:59:21 "PUT http://[::1]:9000/qbdsd82q5fyr0v31cvfxda0n0h7jh03g.narinfo HTTP/1.1" from [::1]:59054 - 500 22B in 459.5µs
ERRO[0375] Error parsing .narinfo: read tcp [::1]:9000->[::1]:59054: i/o timeout 
2022/01/17 14:00:11 "PUT http://[::1]:9000/qbdsd82q5fyr0v31cvfxda0n0h7jh03g.narinfo HTTP/1.1" from [::1]:59054 - 400 70B in 50.001098638s

and nix copy like this:

warning: unable to upload 'http://[::1]:9000/qbdsd82q5fyr0v31cvfxda0n0h7jh03g.narinfo': HTTP error 500; retrying in 271 ms
[0 copied (1.6 MiB), 0.0 MiB DL]
error: while uploading to HTTP binary cache at 'http://[::1]:9000': unable to upload 'http://[::1]:9000/qbdsd82q5fyr0v31cvfxda0n0h7jh03g.narinfo': HTTP error 400