a-tze / fuse-ts

GNU General Public License v3.0
10 stars 7 forks source link

Same fuse-ts binaries on the same system accessing the same segments produce different uncut.ts #16

Open MaZderMind opened 1 year ago

MaZderMind commented 1 year ago

I have a very mysterious problem, whereby two identical fuse-ts binaries running on the same system accessing the same segments produce different uncut.ts files:

# same invocation
voc       694852  0.0  0.0 299160  2220 ?        Ssl  Nov15   0:27 /usr/bin/fuse-ts p=eventspace- c=/video/obs st=2022-11-15_15-45-00 numfiles=29 totalframes=126000 fps=30 -oallow_other,use_ino /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24
voc      1047543  1.0  0.0 151424  1496 ?        Ssl  11:47   0:05 /usr/bin/fuse-ts p=eventspace- c=/video/obs st=2022-11-15_15-45-00 numfiles=29 totalframes=126000 fps=30 -oallow_other,use_ino /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24-new

# same cmdline except for mount path
[wi-5.video.encoder-eventspace][pkoerner@~] > diff /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24/cmdlineopts /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24-new/cmdlineopts 
1c1
< fuse-ts  c=/video/obs st=2022-11-15_15-45-00 ob=0 numfiles=29  totalframes=126000  if=7179  of=109364  p=eventspace-  fps=30  height=1080   -oallow_other,use_ino /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24 
---
> fuse-ts  c=/video/obs st=2022-11-15_15-45-00 ob=0 numfiles=29  totalframes=126000  if=7179  of=109364  p=eventspace-  fps=30  height=1080   -oallow_other,use_ino /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24-new 

# same filelist
[wi-5.video.encoder-eventspace][pkoerner@~] > diff /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24/filelist /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24-new/filelist 

# same exe
[wi-5.video.encoder-eventspace][pkoerner@~] > sudo sha1sum /proc/$(cat /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24/pid)/exe
a731036d4b9fbc7fefbeab89fbfd3a646ab82790  /proc/694852/exe

[wi-5.video.encoder-eventspace][pkoerner@~] > sudo sha1sum /proc/$(cat /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24-new/pid)/exe
a731036d4b9fbc7fefbeab89fbfd3a646ab82790  /proc/1047543/exe

# uncut.ts has same size
[wi-5.video.encoder-eventspace][pkoerner@/video/obs] > ls -la /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24/uncut.ts 
-r--r--r-- 1 root root 4897138116 Nov 15 18:27 /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24/uncut.ts

[wi-5.video.encoder-eventspace][pkoerner@/video/obs] > ls -la /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24-new/uncut.ts 
-r--r--r-- 1 root root 4897138116 Nov 16 11:47 /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24-new/uncut.ts

# shasums differ
# bad
[wi-5.video.encoder-eventspace][pkoerner@~] > sha1sum /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24/uncut.ts 
05e48b34051f58c901eb28e1ee02cf9808298661  /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24/uncut.ts

# good
[wi-5.video.encoder-eventspace][pkoerner@~] > sha1sum /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24-new/uncut.ts 
ccbf312be8bbacca348de56e926187f50f9877c9  /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24-new/uncut.ts

I ran shasum, hexdump, rsync and lots of different tools on the mount and kept getting consistent results. I also had 2 of 2 mounts from yesterday evening broken this way, that, upon re-invocation today are now fine.

a-tze commented 1 year ago

@MaZderMind If these mounts still exist, can you send me the contents of each log file? I doubt there will be useful information there without having a debug build... but what fuse-ts does is to cache the open FDs on the segments, so it is technically possible that the segments have been replaced by new files (or changed size if they were still growing) and the "old" mount still sees the old files because it has an open FD on them.

As a next measure you could do a echo 1 >> /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24/rebuild and see if the sha1sum /video/fuse/einzelaufnahmen/einzelaufnahmen/eventspace/24/uncut.ts changes.

MaZderMind commented 1 year ago

The mounts kind'of fixed themselfs shortly after me opening the issue but I have copies of them. The log is identical apart from the changes to the cut-times. I attached them. logs.zip

As a temporary countermeasure we added -xerror to our encoding profiles' ffmpeg command, which will make ffmpeg exit with an error-code when an input stream is broken This will alarm us when the error happens again. We will run this setup at T4AT in 2 Weeks and if we'll encounter any more of these, will report them here.

MaZderMind commented 1 year ago

I also theorized about the remove-rename scenario; will check proc/…/fd next time. Growing files are certainly a possibility, but that would have been a problem with the ffmpeg segmenter ever since, wouldn't it?

a-tze commented 1 year ago

Yes, this was always an issue, but it is resolved by the "EndPadding" mechanism.

I mean, you mentioned different uncut.ts, not broken ones. If the last segment file that is found when creating the mount is growing later, there might be differences in the data delivered in uncut.ts - but this would always only happen alongside a change in advertised size of uncut.ts which is not happening in your case.

I remember that there issues with the FD cache, it is possible that there is a kind of refcount mechanism and that the repair you have seen is correlated with the last open FD on uncut.ts to be closed, and then later reopened with a refreshed segment collection. But this also needs some external change to the segments to be a plausible explanation.

MaZderMind commented 1 year ago

We still have the Problem regularly, about every 5 fuse-ts invocations. It can be detected by parsing the .ts file with ffmpeg -xerror -i /video/fuse/witn2023/women-in-tech-night-23/grace_hopper/878/uncut.ts -f null /dev/null which is reasonable fast. We added -xerror to our encoding profiles to catch that issue, which then looks something like this:

[h264 @ 0x55ab2fe62640] concealing 7025 DC, 7025 AC, 7025 MV errors in B frame  
[h264 @ 0x55ab2fde3fc0] co located POCs unavailable
[h264 @ 0x55ab2fe94240] co located POCs unavailable
[h264 @ 0x55ab2fe94240] Reference 2 >= 2
[h264 @ 0x55ab2fe94240] error while decoding MB 119 34, bytestream 1336
[h264 @ 0x55ab2fe94240] concealing 4010 DC, 4010 AC, 4010 MV errors in B frame
/video/fuse/witn2023/women-in-tech-night-23/grace_hopper/878/uncut.ts: corrupt decoded frame in stream 0
Conversion failed!

echo 1 >> /video/fuse/witn2023/women-in-tech-night-23/grace_hopper/878/rebuild does not fix the problem, however re-preparing does fix it (a following scan with xerror is clean of errors).

Examining the broken and the good filed with xxd and a small python script shows, that, while the files have exactly the same length, their content differs (and the broken file indeed contains different content not just zeros):

python3  -u cmp.py <(xxd 878-broken.ts) <(xxd 878-good.ts)
First difference in Line 219464705:
A: d14c4000: 83ee 236b 0dda 123f 0824 94fb fae1 e52f  ..#k...?.$...../
B: d14c4000: 2c00 1953 af53 63c5 f9aa ddef 10f5 5156  ,..S.Sc.......QV

What could we do to help identify the root cause? As I said I'm quite sure I can reproduce the Problem, maybe even kind-of automatic. The current process is something like Mount -> Cut in Shotcut (includes seeking through the file) -> broken in 1 out of 5 cases -> re-prepare -> good, so the Act of Seeking in the File with Shotcut might be one of the triggering factors.

As the Problem can be triggered after the Event and we have the time to do experiments on this, we could absolutely try a debug build in order to check output, however it might produce a huge amount of unrelated output that I might need to filter. I could also attach a gdb session onto a broken fuse-ts process and start a fresh one in parallel, so we could examine them in-situ, however I might need some assistance with that.

a-tze commented 1 year ago

@MaZderMind If you could provide the "filelist" content as well as the "log" output, which should both be available as virtual files, it might be sufficient. Having a debug build and its output would greatly help, compiling should be very easy, it is just a "make debug" in the source directory. I would not care digging through lots of logging, as long as you provide one "good" and one "bad" output. Highly appreciated!

MaZderMind commented 1 year ago

Alright, I'll use the debug-build to cut the remaining Videos tomorrow and collect all the output and information that I can whenever I hit another broken file.