Open MaZderMind opened 1 year ago
@MaZderMind If these mounts still exist, can you send me the contents of each log
file? I doubt there will be useful information there without having a debug build... but what fuse-ts does is to cache the open FDs on the segments, so it is technically possible that the segments have been replaced by new files (or changed size if they were still growing) and the "old" mount still sees the old files because it has an open FD on them.
As a next measure you could do a
echo 1 >> /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24/rebuild
and see if the
sha1sum /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24/uncut.ts
changes.
The mounts kind'of fixed themselfs shortly after me opening the issue but I have copies of them. The log is identical apart from the changes to the cut-times. I attached them. logs.zip
As a temporary countermeasure we added -xerror
to our encoding profiles' ffmpeg command, which will make ffmpeg exit with an error-code when an input stream is broken This will alarm us when the error happens again. We will run this setup at T4AT in 2 Weeks and if we'll encounter any more of these, will report them here.
I also theorized about the remove-rename scenario; will check proc/…/fd
next time. Growing files are certainly a possibility, but that would have been a problem with the ffmpeg segmenter ever since, wouldn't it?
Yes, this was always an issue, but it is resolved by the "EndPadding" mechanism.
I mean, you mentioned different uncut.ts, not broken ones. If the last segment file that is found when creating the mount is growing later, there might be differences in the data delivered in uncut.ts - but this would always only happen alongside a change in advertised size of uncut.ts which is not happening in your case.
I remember that there issues with the FD cache, it is possible that there is a kind of refcount mechanism and that the repair you have seen is correlated with the last open FD on uncut.ts to be closed, and then later reopened with a refreshed segment collection. But this also needs some external change to the segments to be a plausible explanation.
We still have the Problem regularly, about every 5 fuse-ts invocations. It can be detected by parsing the .ts file with ffmpeg -xerror -i /video/fuse/witn2023/women-in-tech-night-23/grace_hopper/878/uncut.ts -f null /dev/null
which is reasonable fast. We added -xerror
to our encoding profiles to catch that issue, which then looks something like this:
[h264 @ 0x55ab2fe62640] concealing 7025 DC, 7025 AC, 7025 MV errors in B frame
[h264 @ 0x55ab2fde3fc0] co located POCs unavailable
[h264 @ 0x55ab2fe94240] co located POCs unavailable
[h264 @ 0x55ab2fe94240] Reference 2 >= 2
[h264 @ 0x55ab2fe94240] error while decoding MB 119 34, bytestream 1336
[h264 @ 0x55ab2fe94240] concealing 4010 DC, 4010 AC, 4010 MV errors in B frame
/video/fuse/witn2023/women-in-tech-night-23/grace_hopper/878/uncut.ts: corrupt decoded frame in stream 0
Conversion failed!
echo 1 >> /video/fuse/witn2023/women-in-tech-night-23/grace_hopper/878/rebuild
does not fix the problem, however re-preparing does fix it (a following scan with xerror
is clean of errors).
Examining the broken and the good filed with xxd and a small python script shows, that, while the files have exactly the same length, their content differs (and the broken file indeed contains different content not just zeros):
python3 -u cmp.py <(xxd 878-broken.ts) <(xxd 878-good.ts)
First difference in Line 219464705:
A: d14c4000: 83ee 236b 0dda 123f 0824 94fb fae1 e52f ..#k...?.$...../
B: d14c4000: 2c00 1953 af53 63c5 f9aa ddef 10f5 5156 ,..S.Sc.......QV
What could we do to help identify the root cause? As I said I'm quite sure I can reproduce the Problem, maybe even kind-of automatic. The current process is something like Mount -> Cut in Shotcut (includes seeking through the file) -> broken in 1 out of 5 cases -> re-prepare -> good, so the Act of Seeking in the File with Shotcut might be one of the triggering factors.
As the Problem can be triggered after the Event and we have the time to do experiments on this, we could absolutely try a debug build in order to check output, however it might produce a huge amount of unrelated output that I might need to filter. I could also attach a gdb session onto a broken fuse-ts process and start a fresh one in parallel, so we could examine them in-situ, however I might need some assistance with that.
@MaZderMind If you could provide the "filelist" content as well as the "log" output, which should both be available as virtual files, it might be sufficient. Having a debug build and its output would greatly help, compiling should be very easy, it is just a "make debug" in the source directory. I would not care digging through lots of logging, as long as you provide one "good" and one "bad" output. Highly appreciated!
Alright, I'll use the debug-build to cut the remaining Videos tomorrow and collect all the output and information that I can whenever I hit another broken file.
I have a very mysterious problem, whereby two identical fuse-ts binaries running on the same system accessing the same segments produce different uncut.ts files:
I ran shasum, hexdump, rsync and lots of different tools on the mount and kept getting consistent results. I also had 2 of 2 mounts from yesterday evening broken this way, that, upon re-invocation today are now fine.