Closed ardangelo closed 9 months ago
I'm not sure, but could this be because you are calling flux filemap get
on the same node on which you mapped the file? When you call get, if there are multiple chunks then the first chunk might be written thus corrupting the map. Perhaps with a single chunk the file does not actually change since data is written in one go.
Try the flux filemap get
to a different path with -C
and see if that resolves the issue.
Ah, @ardangelo, I apologize. I didn't notice you were already using -C
in flux filemap map
.
I think your usage should work, and FWIW, I could not reproduce the error with a simple test case:
$ flux version
commands: 0.57.0
libflux-core: 0.57.0
libflux-security: 0.10.0
broker: 0.57.0
FLUX_URI: local:///var/tmp/grondo/flux-TUaqej/local-0
build-options: +ascii-only+systemd+hwloc==2.8.0+zmq==4.3.4
$ ls -lh map
total 0
-rw-r--r-- 1 grondo grondo 10M Jan 4 16:41 10M
$ flux filemap map -C $PWD/map 10M
$ flux filemap list
10M
$ flux filemap get 10M
$ ls -lh 10M
-rw-r--r-- 1 grondo grondo 10M Jan 4 16:51 10M
$ flux filemap list --raw | jq
{
"path": "10M",
"encoding": "blobvec",
"data": [
[
0,
1048576,
"sha1-3b71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3"
],
[
1048576,
1048576,
"sha1-3b71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3"
],
[
2097152,
1048576,
"sha1-3b71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3"
],
[
3145728,
1048576,
"sha1-3b71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3"
],
[
4194304,
1048576,
"sha1-3b71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3"
],
[
5242880,
1048576,
"sha1-3b71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3"
],
[
6291456,
1048576,
"sha1-3b71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3"
],
[
7340032,
1048576,
"sha1-3b71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3"
],
[
8388608,
1048576,
"sha1-3b71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3"
],
[
9437184,
1048576,
"sha1-3b71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3"
]
],
"size": 10485760,
"mtime": 1704415313,
"ctime": 1704415382,
"mode": 33188
}
Thanks for checking. It seems to be an intermittent problem on the system I'm testing on. I'm able to work around it by only pulling the file on the ranks that aren't running the filemap.
Thanks for reporting back. The error:
flux-filemap: cti_be_daemon2.18.3: error loading offset=0 size=1048576 from sha1-459929af132509bbcc513878cb6360ed04c2ad6c: mapped file content has changed
Does seem to indicate that somehow the mapped file content is changing somehow, but in your case I can't imagine how and I'm not sure how the broker detects the changed content. Perhaps @garlick will have some thoughts here.
I had to refresh my memory, but this code comment describes how the detection occurs:
Safety issue: The content addressable storage model relies on the fact that once hashed, data does not change. However, this cannot be assured when the data is mmapped from a file that may not be protected from updates. To avoid propagating bad data in the cache, content_mmap_validate() is called each time an mmapped cache entry is accessed. This function recomputes the hash to make sure the content has not changed. If the data has changed, the content-cache returns an error to the requestor. In addition, mmapped pages could become invalid if the size of a mapped file is reduced. Accessing invalid pages could cause the broker to crash with SIGBUS. To mitigate this, content_mmap_validate() also calls stat(2) on the file to make sure the memory region is still valid. This is not foolproof because it is inherently a time-of-check-time-of-use problem. In fact we rate limit the calls to stat(2) to avoid a "stat storm" when a file with many blobrefs is accessed, which increases the window where it could have changed. But it's likely better than not checking at all.
It sure sounds like the source file is getting clobbered by one or more of the flux filemap get
commands. If that's not a possibility then we should dig deeper to figure out what is going on.
This thing is a bit of a foot gun though. Hmm, currently we're just using libarchive's defaults for file extraction. Maybe we could tell it not to ovewrite existing files, e.g. set
ARCHIVE_EXTRACT_NO_OVERWRITE Existing files on disk will not be overwritten. By de‐ fault, existing regular files are truncated and overwrit‐ ten; existing directories will have their permissions up‐ dated; other pre-existing objects are unlinked and recre‐ ated from scratch.
https://man.freebsd.org/cgi/man.cgi?query=archive_write_disk
In 0.59, the source/destination overlap will be less easy to fall into at least. Let's close this for now on the assumption that something must've changed in the source file during broadcast. We can reopen if we get a verified case that contradicts that assumption.
Running Flux 0.57.0, I am attempting to map a file using
flux filemap
,File information
Map with default 1M chunk size fails get
Map with unlimited chunk size enables get