Wings ngs extension support

andmaltes commented 2 years ago

Hi, this is such a great project. I want to make something that reads the Wings ngs embroidery format and creates an image from it. Do you have any plans to support this file type? If not, would you mind giving me some directions so I can help with this? Thanks!

tatarize commented 2 years ago

I would need to understand the file format, which requires looking at it. Since I'm sort of an expert on these, it's worth taking a look.

My understanding was that ngs was just a typical file save format, and not an embroidery format. But, trying to save it out from my editor, it's apparently similar to the .art and .emb formats. It's a Microsoft Compound File Binary file; mine contained 4 files: design.vvt, iconv2.rle, index.tmp, and stats.dat.
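A quick way to confirm the container type before digging further is to check the first 8 bytes against the Compound File Binary signature. This is only a minimal sketch; the helper name `looks_like_cfb` is made up for illustration and is not part of pyembroidery.

```python
# The 8-byte signature that opens every Compound File Binary (OLE2) container.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_cfb(data: bytes) -> bool:
    """Return True if the buffer starts with the CFB signature (hypothetical helper)."""
    return data[:8] == CFB_MAGIC
```

Passing the first few bytes of the .ngs file is enough; the inner streams such as design.vvt will not carry this signature themselves.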

The design.vvt file contains what we need out of this file format, since it holds the design data. Typically in .emb and .art files this is 4 bytes of uncompressed file size followed by a zlib compressed data stream, which is either swizzled or unswizzled. I also believe that in those files the core design data itself is actually encrypted.
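For the unswizzled case, the layout described above (a 4-byte size, then a zlib stream) could be unpacked roughly like this. It is a sketch under assumptions: the size field is taken to be little-endian, and `unpack_sized_zlib` is a hypothetical name, not an existing function.

```python
import struct
import zlib


def unpack_sized_zlib(blob: bytes) -> bytes:
    """Read a 4-byte size prefix, then inflate the zlib stream that follows.

    Assumption: the size is little-endian and counts the uncompressed bytes.
    """
    (expected_size,) = struct.unpack_from("<I", blob, 0)
    payload = zlib.decompress(blob[4:])
    if len(payload) != expected_size:
        raise ValueError(
            f"size mismatch: header says {expected_size}, got {len(payload)}"
        )
    return payload
```

The size check is a cheap sanity test: if the prefix and the inflated length disagree, the layout guess is probably wrong.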

tatarize commented 2 years ago

Checking into the code it doesn't appear to have a zlib stream, but it could be a swizzled zlib.

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 E8 8B 00 00 7A 30 00 00 FF 44 36 F1 00 06 E0 00 è‹..z0..ÿD6ñ..à.
00000010 90 02 00 A0 1E 00 F1 06 B8 3F C0 A0 03 9C 97 14 ... ..ñ.¸?À .œ—.
00000020 00 20 04 00 1B 00 61 00 8C 01 B0 06 80 BF 54 DC . ....a.Œ.°.€¿TÜ
00000030 00 28 00 60 F4 1F 65 00 CC 01 80 B6 B6 00 40 01 .(.`ô.e.Ì.€¶¶.@.
00000040 90 06 80 DB CF 6C BB 70 09 00 03 00 44 FF 24 F7 ..€ÛÏl»p....Dÿ$÷

Here we see the size is 0x8BE8 (the first four bytes, E8 8B 00 00, read little-endian), but 7A 30 is not the zlib or gzip magic number you'd expect. Also, the 7A 30 00 00 looks like the number 0x307A, and the rest of the file is 0x307E in length, so this might be a second length field: uncompressed length, then compressed length. If we go 0x307A back from the end we start at 00 06 E0 00..., which would imply that the header is 3 different 32-bit numbers. But analysis shows that while 00 and FF are the most common values, there is a mix of all other byte values, which strongly implies either compression or encryption.
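The header reading and the byte-distribution check can be sketched as follows. The 12 bytes are copied from the hex dump above; treating them as three little-endian 32-bit fields is the working guess from this thread, not a confirmed layout, and `shannon_entropy` is an illustrative helper.

```python
import math
import struct
from collections import Counter

# First 12 bytes of design.vvt, copied from the hex dump above.
header = bytes.fromhex("E88B00007A300000FF4436F1")

# Assumption: three little-endian 32-bit fields.
first, second, third = struct.unpack("<III", header)
print(hex(first), hex(second), hex(third))


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte; values near 8 suggest compression or encryption."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

Running `shannon_entropy` on the bytes after the header gives a single number to compare against known plain and compressed files, which is a little more objective than eyeballing a histogram.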

I then ran the file through a quick script I wrote that scans for zlib streams, and it found nothing. This follows an idea somebody else had for doing the same thing.

import sys
import zlib

filename = sys.argv[1]

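# Read the whole file into memory.
with open(filename, 'rb') as f:
    data = f.read()
not_found = True

# Brute force: try every start offset q and every end offset e.
for q in range(len(data)):
    for e in range(q+1, len(data)):
        try:
            decompress = zlib.decompress(data[q:e+1])
        except zlib.error:
            continue
        if decompress:
            file = f'data{q}.dat'
            print(f"Writing file: {file}")
            with open(file, 'wb') as f:
                f.write(decompress)
            not_found = False
            # Longer slices from the same start would inflate to the same data.
            break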

if not_found:
    print("No zlib streams were found in this file")

This strongly implies that if there is a zlib stream (and I believe there should be one, starting at the 12th byte), that stream is lightly swizzled, since these formats all seem to share a common ancestor.

PS C:\Users\Tat\PycharmProjects\RandomScraps> python .\zlibsearcher.py .\design.vvt
No zlib streams were found in this file

Next, a more advanced check: I search every point in the stream with all the typical swizzles (most swizzles are a bit-shift and an xor). We can run through all 2048 different swizzles pretty quickly, since there are only 256 numbers to xor with and 8 different circular right bit shifts to apply. I have run into more complex ones: the moshi-laser controller has a bivariate swizzle where 1 bit decides between two different types of swizzles. If we're just substituting individual bytes, we are mapping a set of 256 values 1:1 onto another set of 256 values, which gives 256! possibilities, far too many to run through. So 256 * 8 seems reasonable, and I can probably do the same with gzip as well.

import sys
import zlib

filename = sys.argv[1]

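# Read the whole file into memory.
with open(filename, 'rb') as f:
    data = f.read()
not_found = True

def perform_swizzle(data, xor, right):
    """Xor every byte with `xor`, then rotate it by `right` bits, in place."""
    for i in range(len(data)):
        d = data[i]
        d ^= xor
        # Shift left, then fold the overflow bits back in: an 8-bit rotate.
        # Rotating left by n equals rotating right by 8 - n, so looping
        # over 0..7 covers every circular shift.
        d <<= right
        m = d >> 8
        d |= m
        d &= 0xFF
        data[i] = d
    return data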

for q in range(len(data)):
    for xor in range(256):
        for right in range(8):
            data1 = bytearray(data)
            perform_swizzle(data1, xor, right)
            try:
                decompress = zlib.decompress(data1[q:])
            except zlib.error:
                continue
            if decompress:
                file = f'data{q}-{xor}-{right}.dat'
                print(f"Writing file: {file}")
                with open(file, 'wb') as f:
                    f.write(decompress)
                not_found = False

if not_found:
    print("No zlib streams were found in this file")

I could probably run this faster if I went hard with the assumption that q == 12. I wasn't expecting it to take this long, but it should be done in reasonable time...
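One possible way to speed up the search above is to build each of the 2048 swizzles once as a 256-entry lookup table and apply it with `bytes.translate`, then test only a handful of offsets. This is a sketch of that idea, not the script that was actually run; `swizzle_table`, `TABLES`, and `find_swizzled_zlib` are hypothetical names.

```python
import zlib


def swizzle_table(xor: int, rot: int) -> bytes:
    """256-entry lookup table: xor each byte, then rotate it left by `rot` bits."""
    table = bytearray(256)
    for b in range(256):
        v = b ^ xor
        table[b] = ((v << rot) | (v >> (8 - rot))) & 0xFF
    return bytes(table)


# Build all 256 * 8 = 2048 tables once, up front.
TABLES = {(x, r): swizzle_table(x, r) for x in range(256) for r in range(8)}


def find_swizzled_zlib(data: bytes, offsets):
    """Yield (offset, xor, rot, payload) for every candidate that inflates."""
    for (x, r), table in TABLES.items():
        unswizzled = data.translate(table)  # one C-level pass per swizzle
        for q in offsets:
            try:
                payload = zlib.decompress(unswizzled[q:])
            except zlib.error:
                continue
            if payload:
                yield q, x, r, payload
```

Calling `list(find_swizzled_zlib(data, range(32)))` would cover the first 32 positions. The per-byte transform is the same xor-then-rotate as `perform_swizzle`, just applied through a table instead of a Python loop.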

tatarize commented 2 years ago

q 12, q 8 -- both no. Yeah I'd need to know this format fully. Maybe gzip. I'll check first 32 positions or so.

tatarize commented 2 years ago

zlib < 32, gzip < 32 are no-go for all 2048 swizzles.
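For reference, Python's `zlib` module can test several container types through its `wbits` argument, so the zlib and gzip checks can share one routine. The sketch below also tries raw deflate, which was not tested in this thread; `CONTAINERS` and `try_inflate` are illustrative names.

```python
import zlib

# wbits values understood by zlib: 15 = zlib wrapper, 31 = gzip wrapper,
# -15 = raw deflate with no header or checksum.
CONTAINERS = {"zlib": 15, "gzip": 31, "raw-deflate": -15}


def try_inflate(chunk: bytes):
    """Return (container_name, payload) for the first wrapper that inflates, else None."""
    for name, wbits in CONTAINERS.items():
        try:
            payload = zlib.decompress(chunk, wbits=wbits)
        except zlib.error:
            continue
        if payload:
            return name, payload
    return None
```

Raw deflate has no header or checksum to validate, so a hit there is weaker evidence than a zlib or gzip hit and would need a look at the output before trusting it.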

andmaltes commented 2 years ago

Thanks for taking a look at this format! I am trying to understand all the info you posted to see if I can find something myself. Thanks again!

tatarize commented 2 years ago

Well, there are at least a few ways forward. Namely, create some tiny files that are basically identical except that you move one stitch a tiny bit. That will usually change only the encoding of that particular stitch, and thereby point out what the encoding of that stitch is. Tweak, then check for the change in a hex editor. Tweak, repeat. Guess the pattern. Then modify the file yourself and see if it loads as expected, so you can do this in both directions.
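The tweak-and-compare step can be done with a few lines instead of a hex editor. This is a minimal sketch; `diff_bytes` is a hypothetical helper, and the two inputs would be the design.vvt streams extracted from the two nearly identical files.

```python
def diff_bytes(a: bytes, b: bytes):
    """Return (offset, byte_in_a, byte_in_b) for every position where the buffers differ."""
    changes = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(a) != len(b):
        print(
            f"note: lengths differ ({len(a)} vs {len(b)}); "
            "only the common prefix was compared"
        )
    return changes
```

If moving one stitch changes only a couple of bytes, those offsets are the encoding of that stitch. If it changes most of the stream, that is a further hint the data is compressed or encrypted as a whole.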
