Help to create a compatible implementation!

tgulacsi commented 7 years ago

Hi!

I've a fast Reed-Solomon encoder in Go (ported from Backblaze's Java code by Klaus Post): https://github.com/tgulacsi/par

I can create a par2 format archive, and can read it back in my program, but par2 just says "4 packets found", and then "Main packet not found". How could I debug this?

What is needed to be in the main packet? I write it both at the beginning and at the end, without change. What does it mean in the spec that main packet's ID must be calculated by its body? The body is the recovery/non-recovery file id's sets, AFAIK...

animetosho commented 7 years ago

Relevant quote:

The MD5 hash of the body of the main packet is used as the Recovery Set ID, which is included in the packet header of every packet for the set

See the Packet Header section (Table 87) for some clarification. All packets (including the main packet) must be preceded by a packet header.

The "Recovery Set ID" mentioned in the packet header, which appears in every packet (including the main packet) is an MD5 hash of the entire body of the main packet.

The body of the main packet encompasses all four elements listed in the spec in Table 126 (slice size, number of files and file IDs).

How could I debug this?

PAR2 strictly defines how packets are encoded, so packets created in any PAR2 tool should be identical. Note that it doesn't strictly define the order of the packets, nor how often you should repeat critical packets, so PAR2 files generated by different tools don't have to be identical, but the packets (excluding some like creator/comment etc) need to be.

As such, you can check your program by seeing whether the packets generated are identical to that of a reference implementation (such as par2cmdline).

PS: am looking forward to seeing another implementation of PAR2!

tgulacsi commented 7 years ago

Thanks for the response!

According to thee specification,

each packet has a header, which contains a checksum for the entire packet, including the recovery set id, the type, and the body of the packet.
"The MD5 hash of the body of the main packet is used as the Recovery Set ID", which is a hash of the slice size, the file count, and the file ids.
The File ID in this version is calculated as the MD5 Hash of the short MD5 hash of the file's first 16k, the length, and ASCII file name.

So I need to know all the included files (for me, 1 :-)), calculate its ID, put it in the Main packet, calculate the Recovery Set ID, put that into every packet, calculate each header's hash, and go on.

I'll try it.

Safihre commented 7 years ago

@tgulacsi Following your work with interest! If you can make it par2 compatible and achieve such speedups also for data repair, this has great promise for the usenet community!

tgulacsi commented 7 years ago

Hi!

I've implemented most what par2cmdline needs, so it understands the file I've written, but we (par2 and BackBlaze) differ on the Reed-Solomon implementation: GF(16) vs. GF(8), different polynomials... So these aren't interchangeable.

If someone can help with a proper logTable (https://github.com/klauspost/reedsolomon/blob/master/gentables.go), I can try to implement a par2-compatible erasure coding, but this won't be an easy, as I'll have to move from uint8 to uint16, with endianness. And maybe that won't be that much faster eventually, I don't know.

But now github.com/tgulacsi/par is enough for me: fast, can repair, know three different par file formats (JSON, tar and par2-like).

animetosho commented 7 years ago

A GF(8) implementation would be significantly faster than GF(16), but would limit you to 127 input and recovery blocks, which is quite different from PAR2 and how it's often used... (even the 32767 input block limitation in PAR2 is a bit small for larger inputs)

I had a quick scan of the GF implementation - I'd say moving to GF(16) is quite a bit more work than you may think:

static/precomputed tables cannot be used; they work in GF(8) because (2^8)^2 = 65536 8-bit values is feasible to store, but hard-coding (or even loading into RAM) a (2^16)^2 = 4.3 billion 16-bit number table doesn't make sense. This means that the algorithm needs to be able to dynamically generate lookup tables
furthermore, full log table lookups would be slow (due to CPU cache reasons). par2cmdline and most PAR2 tools use a L/H split lookup (i.e. 16-bit is broken into two 8-bit halves)
GF(16) SSSE3 implementation is quite different to a GF(8) implementation, particularly if speed is important (for example, a different intermediary memory layout is beneficial - see below)
note that if you're targeting x86 platforms, endianness shouldn't be a concern. The code looks like isn't only really optimised for x86, as SSSE3 is an x86 extension (I presume it falls back to slower lookup strategies if SSSE3 is unavailable).

I've graphed the speeds of a few interesting GF(16) calculation techniques here. Algorithms that may be interesting:

LOG: standard textbook-style log table lookup
SPLIT 16 8: L/H split log table lookup used in par2cmdline
SPLIT 16 4: similar to above, but split into 4-bit components to exploit SSSE3 capabilities of the CPU
SPLIT 16 4 / ALTMAP: same as above, but memory is laid out differently to save some pack/unpack instructions in the main loop
XOR DEPENDS: algorithm I devised, detailed here. The JIT variant is a rather complex optimised version of the algorithm, which requires code to be dynamically JIT'd (probably can't do this from Go; certainly isn't platform independent (then again, the fastest platform independent method that I know of is the L/H lookup technique that par2cmdline uses))

Hope the information was useful, if you do decide to make it PAR2 compatible.

Safihre commented 7 years ago

@animetosho I was always wondering why ParPar doesn't do verification/repair? Is this a whole separate set of problems that cannot be sped up more? Multipar does it considerably faster than par2cmdline, but is limited to Windows for now. Even though he released the source, all comments are in Japanese and I just get lost :/

animetosho commented 7 years ago

Going to get a little off-topic here (sorry about that), but repair is a different set of logic to create, and much more complex. ParPar's original aim was to eventually be integrated into a usenet uploader in a way with minimal overhead (current PAR2 implementations are strongly tied to reading off disk, require at least two read passes to generate any recovery and don't support streaming). Current usenet upload processes require PAR2 generation to be done separately before the actual upload, which means overhead from an additional disk read pass; integrating a concurrent generator avoids this (so that uploading and PAR2 create can be done in a single read pass) and also allows the generated PAR2 to be directly sent over the wire, eliminating any disk writes.
As such, anything other than create is outside its scope. In fact, the command line interface itself is already out of scope as it only needs to be a library - I figured that it wasn't too hard to implement and actually could help with testing.

Of course, if someone wants to add repair capabilities, they're definitely welcome to, though I'm not sure a Javascript application is the most attractive place to do that in. Repair can't really be done in a streaming fashion either, so it would also somewhat be against one of its goals (particularly with regards to uploading). As it's written in Javascript, speed was actually never really one of its primary goals (ParPar was only meant to be comparable to fast implementations; where it is now is more of a coincidence).

But the biggest reason for my not adding support for repair would be time/effort really. Again, it's quite different to create (and more complex), so have not been willing to do something I'm not aiming for.

Multipar can be run under Wine and seems to work fine there, though I presume you're interested in bundling into a downloader, which could make things tricky.
It's still x86 only though, and I haven't seen any PAR2 utility optimised for non-x86 CPUs. ParPar includes ARM NEON support (the only tool that does, as far as I know), though I've primarily only been interested in x86 as well.

Could repair be sped up more? Taking a stab in the dark, but I'd say that you probably can't get that much faster than Multipar (on x86), particularly now that the author has implemented a number of ParPar's optimisations. I'd go out on a limb and say that there's probably some small improvements that can be done [over Multipar], but I've never really looked much into repairing to make much of a comment.

Making par2j non-Windows specific should be largely a case of replacing the Win32 API calls with POSIX equivalents, and doable by someone who's willing to put in the effort.

As for verification, I'm not entirely sure what it encompasses, but I suspect it'd mostly be bound by disk I/O.

Parchive / par2cmdline

Help to create a compatible implementation! #79