HaveAGitGat / Tdarr

Tdarr - Distributed transcode automation using FFmpeg/HandBrake + Audio/Video library analytics + video health checking (Windows, macOS, Linux & Docker)
2.72k stars 89 forks source link

BUG Ghost file cleaner-upper deleting files it should not be deleting #988

Open tws101 opened 1 month ago

tws101 commented 1 month ago

Describe the bug If a plugin runs that produces multiple outputs written back to the input directory if two of these tasks happen concurrently a "Ghost File cleaner-upper" deletes the one that triggered first.

This bug is very hard to explain..... It took me a very long time to actually realize a bug was causing this...

This is present on a FLOW and A CLASSIC PLUGIN present on both so I know the issue is not the plugins. AND I am also sure because the plugins have no issue writing the files to the cache

To Reproduce

if two workers at the same time are writing multiple files back to the input directory one of the two will have the files it writes deleted immediately after they finish writing..... so within 2 seconds.... These are not the writes to the input directory that tdarr does when finishing a file these are writes caused by ffmpeg or by a flow copy file function

This is how I discovered the bug, When running a classic plugin that extracts subtitles I noticed a few would fail saying they were in a infinite loop but when ran again they complete..... Then I began to make a flow to handle this and in creating a working flow (that work provided one worker goes at a time) when I queued up 15 files (with many workers) many would say they ran but the subs that the log says were copied were not present on the disk and in fact tdarr deleted then from the disk the "Ghost File Cleaner-upper" (the bug) strikes again....

So create this flow (in image below) with these flow plugins

https://www.dropbox.com/scl/fo/6y24blsvbly833o99xzdq/AJl06WUwMatadLvBKWyYz04?rlkey=z1lifz907m425blcnmhd1i5yd&dl=0

image

Originally when writing this I made the plugin "Subtitles Extract Subs" (in the folder above) not write to the cache but instead have ffmeg write the sub directly to the input directory, when running multiple at a time subs would just be missing.... so I worked on the plugin some more and instead wrote it to write to the cache and then made "Copy Extracted SRT files" in order to write them from cache to the input folder... however they would ALWAYS write and stay in the cache but when multiple runs copied to the input folder at once from different workers the one that went first would have its files deleted....

Then I fixed this and discovered the bug when I rewrote "Copy Extracted SRT files" (in the above folder) in order to work around the weird issue... here in the code this is how I made it work:

code below I am referencing... we have used the flow in the screen shot and this is "Copy Extracted SRT files"

case 0 though 2 copies the srt files in the cache.... THEN case 3-6 halts for 5 seconds to allow the "Ghost File Cleaner-upper" (the bug) to delete the srt files and check to be sure the "Ghost File Cleaner-upper" did not delete them ends if it finds the srt but if it does not find then it halts 20 seconds so other files are not being written by another worker at the same time and the "Ghost File Cleaner-upper" (the bug) will hopefully not trigger then if it did trigger try the whole process again.

var plugin = function (args) { return __awaiter(void 0, void 0, void 0, function () {
  var fs, lib, sourceObject, outputObject, i, i2, file;
  return __generator(this, function (_b) {
    switch (_b.label) {
      case 0:
        fs = require('fs');
        lib = require('../../../../../methods/lib')();
        file = args.inputFileObj;
        // eslint-disable-next-line @typescript-eslint/no-unused-vars,no-param-reassign
        args.inputs = lib.loadDefaultValues(args.inputs, details);
        if (!(args.variables.srtExtract === true)) return [3 /*break*/, 9];
        sourceObject = args.variables.srtObjectsSource;
        outputObject = args.variables.srtObjectsDestination;
        i = 0;
        i2 = 0;
        _b.label = 1
      case 1:
        if (!(i < args.variables.srtObjectsSource.length)) return [3 /*break*/, 3];
        return [4 /*yield*/, fs_1.promises.copyFile(sourceObject[i], outputObject[i])];
      case 2:
        _b.sent();
        i += 1;
        args.jobLog(`File ${i} Copied`);
        return [3 /*break*/, 1];
      case 3:
        return [4 /*yield*/, new Promise(function (resolve) { return setTimeout(resolve, 5000); })];
      case 4:
        _b.sent();
        args.jobLog(`Checking Files`);
        _b.label = 5
      case 5:
        if (!(i2 < args.variables.srtObjectsSource.length)) return [3 /*break*/, 9];
        _b.label = 6
      case 6:
        if (fs.existsSync(outputObject[i2])) {
          i2 += 1;
          args.jobLog(`File ${i2} is good`);
          return [3 /*break*/, 5];
        } else {
          args.jobLog(`Recheck failed waiting 20 seconds and trying again. To many other copies at the same time.`);
          i = 0;
          i2 = 0;
          return [4 /*yield*/, new Promise(function (resolve) { return setTimeout(resolve, 20000); })];
        }
      case 7:
        _b.sent();
        return [3 /*break*/, 1];
      case 8:

      case 9: return [2 /*return*/, {
          outputFileObj: file,
          outputNumber: 1,
          variables: args.variables,
        }];   
    }
  });
}); };
exports.plugin = plugin;

Expected behavior Plugins should not need to halt check and be sure some other thing the "Ghost File Cleaner-upper" (the bug) did not just delete files they just wrote to the input folder.

Please provide the following information: Currently on version 2.17.01 on docker has been a present bug for YEARS. Although I did not know it existed until I discovered this writing a flow.

HaveAGitGat commented 3 weeks ago

Hi thanks for looking into this. Do you have a link to the uncompiled typescript code for the plugin? The compiled js file has a lot of gobbledy-gook and is not meant to be worked with.

For example, uncompiled: https://github.com/HaveAGitGat/Tdarr_Plugins/blob/master/FlowPluginsTs/CommunityFlowPlugins/audio/checkAudioCodec/1.0.0/index.ts

Compiled: https://github.com/HaveAGitGat/Tdarr_Plugins/blob/master/FlowPlugins/CommunityFlowPlugins/audio/checkAudioCodec/1.0.0/index.js

From the readme

Make sure typescript is intalled with npm i -g typescript then run tsc to compile the changes.

This will compile the ts code in FlowPluginsTs into FlowPlugins

tws101 commented 3 weeks ago

sadly I don't know typescript, I wrote these in JS....

But a classic plugin can also reproduce this issue, I just thought the flow highlighted it much better.

For a classic go here: https://github.com/HaveAGitGat/Tdarr_Plugins/pull/352/files#

File is: Tdarr_Plugin_tws101_Ultimate_Subtitle_Handling.js

when running this do an extract (plugin option) and run it across ten workers at once, the bug will reproduce.... then test again one worker at a time over the same ten test files and everything will work.

***Seems I best start learning typescript

HaveAGitGat commented 3 weeks ago

Thanks I’ll take a look at that.

Yeah would recommend it, it’s straightforward to pickup as all Javascript is valid Typescript so you can put all your .js code in a .ts file and it should run (only reason it wouldn’t run is if there are type errors but you can ignore these in the code by putting // @ts-expect-error above the line causing the error. Then can fix the error at a later time by adding/correcting types).

Tdarr classic plugins are in JavaScript but I switched to Typescript for Flow plugins as makes development easier as you can see what data a variable represents and code editors like VS Code will highlight if you try to use a wrong data type.

For example here you can see the data passed to flow plugins:

https://github.com/HaveAGitGat/Tdarr_Plugins/blob/09a7edf7550ae58b2c57ce1c544a3bfc5adedc90/FlowPluginsTs/FlowHelpers/1.0.0/interfaces/interfaces.ts#L112

IpluginInputArgs interface is then referenced in all flow plugins e.g here:

https://github.com/HaveAGitGat/Tdarr_Plugins/blob/09a7edf7550ae58b2c57ce1c544a3bfc5adedc90/FlowPluginsTs/CommunityFlowPlugins/file/copyMoveFolderContent/1.0.0/index.ts#L150

tws101 commented 3 weeks ago

As you have requested I have study and reworked this to typescript

https://www.dropbox.com/scl/fo/6y24blsvbly833o99xzdq/AJl06WUwMatadLvBKWyYz04?rlkey=z1lifz907m425blcnmhd1i5yd&dl=0

Typescript folder has that one plugin reworked

I compiled it and ran tests and I can confirm I have reproduce the bug

Code in question that outlines and fixes the bug

async function copyfiles(args: IpluginInputArgs) {
  let sourceObject = args.variables.srtObjectsSource;
  let outputObject = args.variables.srtObjectsDestination;
  for (let i = 0; i < args.variables.srtObjectsSource.length; i++) {
    await fs.copyFile(sourceObject[i], outputObject[i]);
    args.jobLog(`File ${i} Copied`);
  }
  await new Promise((resolve) => setTimeout(resolve, 5000));
  await checkfiles(args);
}

async function checkfiles(args: IpluginInputArgs) {
  args.jobLog(`Checking Files`);
  let outputObject = args.variables.srtObjectsDestination;
  const fs = require('fs');
  for (let i = 0; i < args.variables.srtObjectsSource.length; i++) {
    if (fs.existsSync(outputObject[i])) {
      args.jobLog(`File ${i} is good`);
    } else {
      args.jobLog(`Recheck failed waiting 20 seconds and trying again. To many other copies at the same time.`);
      await new Promise((resolve) => setTimeout(resolve, 20000));
      await copyfiles(args);
      break;
    }
    args.jobLog(`All Good`);
  }
}

// eslint-disable-next-line @typescript-eslint/no-unused-vars
const plugin = async (args: IpluginInputArgs): Promise<IpluginOutputArgs> => {
  const lib = require('../../../../../methods/lib')();
  // eslint-disable-next-line @typescript-eslint/no-unused-vars,no-param-reassign
  args.inputs = lib.loadDefaultValues(args.inputs, details);
  const file = args.inputFileObj;

  if (args.variables.srtExtract === true) {
    await copyfiles(args);
  }

  return {
    outputFileObj: file,
    outputNumber: 1,
    variables: args.variables,
  };
};
export {
  details,
  plugin,
};

Shown the bug and the work around both happening here

image

as seen in the image Recheck failed waiting 20 seconds and trying again. To many other copies at the same time.

The recheck failed meaning that copy operation the first time though was a victim of the Ghost File Cleaner.... it caught that the files were deleted then had to copy them again....

NOTEThe screen shot of the log is one variation older than the plugin code, So an exact match in logic is off because I have updated the code since screenshotting the log...... This difference is only in the JOB LOGS, no functionality was changed

HaveAGitGat commented 3 weeks ago

@tws101 thanks are you able to try these Docker images:

docker.io/haveagitgat/tdarr_acc:dev_2.18.01_2024_05_09T08_07_55z
docker.io/haveagitgat/tdarr_node_acc:dev_2.18.01_2024_05_09T08_07_55z

Then when this happens check the node or server logs for Deleting temporary SRT file created by CCExtractor:, hopefully that will be it.

tws101 commented 3 weeks ago

I am not sure I can do that....

On my TrueCharts version of the Tdarr install, is there a way to turn ON the required logging on to check this??

Also I have converted all my flows that are still experimental to typescript.... Some clean up and testing is still needed

At my link above they are located in the typescript folder... 25 flows (3 of the 25 are improvements on the existing flows like execute command and begin command, and remux container)

Testing on them will continue, please let me know if I can turn that logging ON, or as an alternative I can install on a windows machine if I can have a windows tdarr install that has this logging enabled

HaveAGitGat commented 3 weeks ago

I don't use or manage TrueCharts so not sure about that sorry.

Nice, sure can try these Windows packages:

https://tdarrs.s3.us-west-000.backblazeb2.com/dev/versions/2.18.01/win32_x64/Tdarr_Server_2024_05_10T05_59_19z.zip
https://tdarrs.s3.us-west-000.backblazeb2.com/dev/versions/2.18.01/win32_x64/Tdarr_Node_2024_05_10T05_59_19z.zip
tws101 commented 3 weeks ago

Okay test complete...

image

It is exactly as you thought Deleting temporary SRT file created by CCExtractor

Also I have updated my plugins folder they all now have current Typescript Originals and the complied versions as well

Going to run through more testing on them, I have an old pull request in Tdarr Plugins from 2022
https://github.com/HaveAGitGat/Tdarr_Plugins/pull/352

Before I make a new one who should I ask about review to get that one approved or denied so I can submit another one with these flow? (When testing is complete)

HaveAGitGat commented 2 weeks ago

Ok well at least we found the source of the issue. It stems from a bug in ccextractor where it does not respect the arguments passed to it and always produces an srt output file which Tdarr tries to clean up based on the time the srt was created. This isn't an issue if there's only one worker working on the directory but if you have multiple which include srt file copying etc then it can cause issues.

I've asked the ccextractor team about this so hopefully they have a solution.

In terms of the PR, I'm trying to prevent multiple plugins that do the same thing in the repo to reduce duplicates (such as re-ordering streams, remuxing etc) but have left the PR there just in case someone wants to use those.

tws101 commented 2 weeks ago

Okay I removed the 2 duplicate effect plugins in the PR

https://github.com/HaveAGitGat/Tdarr_Plugins/pull/352

Could I get what remains approved or denied so I can have make another PR, for flows... When I do the PR for flows I will not fork any existing plugins but edit instead, to prevent duplicates

HaveAGitGat commented 2 weeks ago

@tws101, make sure to remove this from top of each file:

/* eslint-disable */
// tdarrSkipTest
HaveAGitGat commented 2 weeks ago

Haven't had response from CCExtractor team yet, can see post here: https://app.slack.com/client/T04MB1C3V/C04MB1CAH

tws101 commented 2 weeks ago

Thank you for your help with CCExtractor,

I responded in the PR thread taking the actions you requested

tws101 commented 1 week ago

So I can't see any response unless I am a member of a workspace it seems

HaveAGitGat commented 1 week ago

I just created a normal account and could view it. They gave a response:

That might simply be a bug that needs fixing

So will need to wait for that fix unless can come up with a different way to deal with the SRT files it produces.