animetosho / ParPar

High performance PAR2 create client for NodeJS

When supplying a file list, does \n or \0 act as separator vs terminator? #35

Closed lkydev closed 2 years ago

lkydev commented 2 years ago

Running ParPar on Debian 11 under WSL (Windows Subsystem for Linux) on Windows 11.

ParPar was installed with the one-liner npm install -g @animetosho/parpar.

On the help screen it says about input files:

  -i,  --input-file          Supply a list of files to be included as input,
                             separated by newlines. Can be `-` to read from
                             stdin, or a command prefixed with proc:// to read
                             from the stdout of specified process (example:
                             `proc://cat somefile.txt`). Can also be an fd
                             prefixed with fd:// (requires NodeJS >= 0.12),
                             i.e. `fd://0` is an alias for stdin.
                             Can be specified multiple times.
  -0,  --input-file0         Same as the `--input-file` option, except files
                             are separated by null characters instead of
                             newlines.

Are you using the word 'separated' in a rigorous sense? Normally, if items in a list are separated by a delimiter, all except the last one have a trailing delimiter. If items are terminated, all (including the last one) have a trailing delimiter.
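To make the distinction concrete, here is a minimal JavaScript sketch of what a plain split on the delimiter yields for each form. The `splitList` helper is hypothetical and only for illustration; it is not taken from ParPar.

```javascript
// Hypothetical helper for illustration; not ParPar's actual code.
// Splits strictly on the delimiter, as a "separator" reading implies.
function splitList(data, delimiter) {
  return data.split(delimiter);
}

const separated = splitList("a\0b\0c", "\0");    // delimiter between items only
const terminated = splitList("a\0b\0c\0", "\0"); // delimiter after every item

console.log(separated.length);  // 3 entries: a, b, c
console.log(terminated.length); // 4 entries: a, b, c and an empty string
```

Under a strict "separator" reading, the terminated form therefore carries one extra, empty item at the end.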

Anyway, I did some tests on the behaviour when using \0 and \n as separator and as terminator:

Test steps

  1. Create a test directory:

    mkdir ~/testbed
    cd ~/testbed
  2. Create three test files a, b and c.

    for file in a b c; do dd if=/dev/urandom of="$file" bs=10000 count=400; done
  3. Create parity set using \n as separator (works OK)

    printf "a\nb\nc" | parpar -i - -s10000B -r10% --out test1.par2
  4. Create parity set using \n as terminator (works OK)

    printf "a\nb\nc\n" | parpar -i - -s10000B -r10% --out test2.par2
  5. Create parity set using \0 as separator (works OK)

    printf "a\0b\0c" | parpar -0 - -s10000B -r10% --out test3.par2
  6. Create parity set using \0 as terminator (not OK)

    printf "a\0b\0c\0" | parpar -0 - -s10000B -r10% --out test4.par2

    The error message is

    Error: ENOENT: no such file or directory, stat ''

So if \n is used to delimit paths, ParPar does not mind whether the paths are separated or terminated. But if \0 is used, ParPar fails with an error when the \0 acts as a terminator.

Probably there are bugs but can't say which ones without knowing whether \n and \0 are intended as path separator or terminator.

[P.S. Usually such a list is terminated, because to create a separated list you must add logic to make sure the last item does not end with a delimiter, which is less simple.]

animetosho commented 2 years ago

Nice pick up!

Yes, it's meant to mean separated as opposed to terminated. There's actually a second assumption in place: with newline separators, it assumes the source is possibly human constructed, so it will ignore blank lines, whilst with null separators, it assumes it's a precise list (e.g. program generated, such as the output of the find command), so it strictly adheres to the "separator" definition. (In your case, a trailing newline means that the last line is considered to be 'blank', hence ignored.)
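As a rough illustration of the two behaviours described above, the following JavaScript sketch parses each kind of list. The function names are made up, the handling of `\r\n` line endings is an assumption, and none of this is ParPar's actual implementation.

```javascript
// Illustrative sketch only; names are hypothetical and this is not
// ParPar's actual implementation.

// Newline mode: input may be hand-written, so blank lines are dropped.
// Accepting \r\n as a line ending is an assumption made for this sketch.
function parseNewlineList(data) {
  return data.split(/\r?\n/).filter((line) => line !== "");
}

// Null mode: input is treated as a precise list, so every entry is kept,
// including an empty one left behind by a trailing null.
function parseNullListStrict(data) {
  return data.split("\0");
}

console.log(parseNewlineList("a\nb\nc\n").length);      // 3: trailing blank line ignored
console.log(parseNullListStrict("a\0b\0c\0").length);   // 4: empty final entry kept
```

The empty final entry in null mode is what would end up being passed to `stat`, which matches the `stat ''` in the reported error.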

The point of the -0 switch is to let an application developer cater for the possibility of \r or \n characters appearing in filenames, so it's strictly safer in that context. Since it's intended for programs sending the data, I guess the point of the strictness is to help signal potential bugs in such applications.
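A small JavaScript sketch of why null delimiters are safer for program-generated lists; the file names here are invented for the example.

```javascript
// Illustrative only: these file names are made up.
const names = ["plain.bin", "odd\nname.bin"];

// join() places the delimiter between items only, i.e. the "separated" form.
const asNewlineList = names.join("\n");
const asNullList = names.join("\0");

// Reading the newline list back splits the odd name in two.
console.log(asNewlineList.split("\n").length); // 3 entries, one name broken
// Reading the null list back keeps both names intact.
console.log(asNullList.split("\0").length);    // 2 entries
```

A null byte cannot appear inside a file name on Linux, so it is an unambiguous delimiter, whereas a newline can.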

That said, I see little reason not to allow a trailing null, so I'll put a change in to allow it to be either a separator or terminator.
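One way such a relaxation could look, as a hypothetical JavaScript sketch; the function name is invented and the actual change made in ParPar may differ.

```javascript
// Hypothetical sketch of accepting a null as either separator or terminator.
// Not ParPar's actual code.
function parseNullListLenient(data) {
  const entries = data.split("\0");
  // A terminated list (or empty input) leaves one empty entry at the end;
  // drop only that one, so empty entries elsewhere still surface as errors.
  if (entries[entries.length - 1] === "") {
    entries.pop();
  }
  return entries;
}

console.log(parseNullListLenient("a\0b\0c").length);   // 3
console.log(parseNullListLenient("a\0b\0c\0").length); // 3
```

Only a single trailing empty entry is removed, so input such as `a\0\0b` would still produce an empty name and could still signal a bug in the sending program.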