exponential-decay / skeleton-container-test-suite-generator

Amendment to the Skeleton Test Suite, creating OLE2 and ZIP container file types.
1 stars 0 forks source link

subsequence offsets: fmt/1187 and fmt/1348 #7

Closed richardlehane closed 2 years ago

richardlehane commented 4 years ago

These signatures both have multiple patterns with offsets SubSeqMinOffset="0" SubSeqMaxOffset="1024"

The generator is incorrectly positioning sequences in the pattern. The first sequence in these patterns is put in the middle of the range (at offset 500 something). But the next sequence doesn't appear until about offset 2096 (i.e. adding the max offsets + length of the previous sequence). This is too far out (if the first sequence is at 500, should be at a max of about 1596 or something. offsets.zip

ross-spencer commented 3 years ago

This is still on my radar! (Although technically I've just reminded myself of it...)

richardlehane commented 2 years ago

Hi Ross, working on cutting a release with v100, this bug hits three more skeleton files: fmt-1502-container-signature-id-57000.psz fmt-1518-container-signature-id-22810.spp x-fmt-244-container-signature-id-4090.mpp

ross-spencer commented 2 years ago

Thanks Richard.

We may have chatted about this before, apologies if so.

Taking one example:

<Files>
    <File>
        <Path>CompObj</Path>
        <BinarySignatures>
            <InternalSignatureCollection>
                <InternalSignature ID="4090">
                    <ByteSequence Reference="BOFoffset">
                        <SubSequence Position="1" SubSeqMinOffset="40" SubSeqMaxOffset="1024">
                            <Sequence>14 00 00 00 'MSProject.DocFile.4' 00</Sequence>
                        </SubSequence>
                    </ByteSequence>
                    <ByteSequence Reference="BOFoffset">
                        <SubSequence Position="2" SubSeqMinOffset="40" SubSeqMaxOffset="1024">
                            <Sequence>16 00 00 00 'MSProject.Project.4_1' 00</Sequence>
                        </SubSequence>
                    </ByteSequence>
                </InternalSignature>
            </InternalSignatureCollection>
        </BinarySignatures>
    </File>
</Files>

Do you think it matters too much how the generator disambiguates those? Should it just try and fit the two sequences between bytes 40 and 1024?

Part of me wonders why this isn't an issue in how the signature is described (without seeking back to 40, sequence two must always appear after sequence 1, but within 1024 bytes?), but that seems not to be the case?

richardlehane commented 2 years ago

I don't think just fitting them both between bytes 40 and 1024 would necessarily work, since the second sequence needs to be at least 40 bytes from the first.

I think the rule for your example is:

ross-spencer commented 2 years ago

Yup, that makes sense! Although, bear with me, do you know if the generator can handle the following workaround, and a secondary question, does it have the same intent as you describe?

(off the top of my head):

<Files>
    <File>
        <Path>CompObj</Path>
        <BinarySignatures>
            <InternalSignatureCollection>
                <InternalSignature ID="4090">
                    <ByteSequence Reference="BOFoffset">
                        <SubSequence Position="1" SubSeqMinOffset="40" SubSeqMaxOffset="1024">
                            <Sequence>14 00 00 00 'MSProject.DocFile.4' 00 {40-996} 16 00 00 00 'MSProject.Project.4_1' 00</Sequence>
                        </SubSequence>
                    </ByteSequence>
                </InternalSignature>
            </InternalSignatureCollection>
        </BinarySignatures>
    </File>
</Files>

If so, it could be a temporary solution for v100. Better than that, that's one way I can pre-process the signature and put it back through the generator functions.

richardlehane commented 2 years ago

That looks like it should work to me. But rather than 996 that range could be the full 1024?

I've added you to the "builder" repo &you can test it there if you like (may save you setting up the full environment needed for building the suite). To test there just make those edits to the container file & add a tag e.g. v100-testing, then push those up and the github action should take care of the rest

ross-spencer commented 2 years ago

I've added you to the "builder" repo

Thanks for that Richard - looking forward to giving it a go. I've put a few commits against my code tonight, but will need to take a bit more time to chip away with the fix itself. I'm happy there's already logging detecting the issue, so I think I can write something within that pretty easily. But let's see.

That looks like it should work to me. But rather than 996 that range could be the full 1024?

Most likely! I keep getting stuck with a visual of a 1024 byte frame in my head for both sequences to fit into.

ross-spencer commented 2 years ago

It looks like a fix is coming together pretty quickly, I'm just going to neaten some things up and have a commit for review shortly.

I am having an issue with fmt-1190-container-signature-id-28200.swc. @richardlehane can you take a look and see if it:

a) is okay in Siegfried, b) if not, whether the sequences look incorrect?

My reading is that it is being written okay. It's odd too that it's the only one not writing out correctly.

fmt-1190-container-signature-id-28200.zip

ross-spencer commented 2 years ago

This commit should fix the issue, with the exception of .swc all look good :crossed_fingers:

Will clean up some other small pieces of detritus and work to merge this.

ross-spencer commented 2 years ago

I won't merge tonight, and I haven't had a chance to try builder but dev/issue-7-offset-overlap-issues is where it's all at. I probably won't touch that again before merging. There's a tranche of work still needed to modernize this I'll do separately. Will rebase and merge in a few days depending on feedback and any further analysis I can do on `.swc'.

richardlehane commented 2 years ago

.swc looks like an issue with siefried's parser: image

Do you see how the first sequence is anchored to "P" and the second to "B"? It should be the reverse, P stands for previous and B is BOF. Your file should match but it doesn't because sf is testing the offset range against the start of file rather than relative to the previous match.

Another notch on the belt of the skeleton suite!

I'm nearing a release and will make sure this gets fixed as it is a biggy. Let me know if there's anything you want included in the next release too.

richardlehane commented 2 years ago

this issue can be traced to Position="0" which occurs 6 times in the container signature file

ross-spencer commented 2 years ago

Thanks @richardlehane I appreciate you checking.

It looks like file has caught out DROID too which is where I was doubting myself.

image

image

I'll need to report an issue for them too.

this issue can be traced to Position="0" which occurs 6 times in the container signature file

This is happening because the majority of signatures don't use zero-based indexing? That would make sense that it has the potential to cause some difficulty.

richardlehane commented 2 years ago

Glad sf isn't alone in this, suggests the real problem is the container file itself and that the first byte sequence really should have Position="1" (as a comparison, Position="0" doesn't appear at all in the droid signature file).

This error may not be significant for all 6 occurrences but it matters for fmt/1190 because there are two bytesequences in this signature. The second bytesequence has Position="1", which is the real problem as there is an implicit ordering there that isn't being picked up because any bytesequences with "Position=1" are interpreted as being relative to the BOF/EOF rather than to a preceding bytesequence.

ross-spencer commented 2 years ago

Okay, this should be fixed now, the actual fix is https://github.com/exponential-decay/skeleton-container-test-suite-generator/pull/14/commits/67c49cf192939b814ed3d9d7de1127afe1c57d0d and there are a bunch of other aesthetic and code-quality improvements captured in later commits. Let me know if there are any problems @richardlehane and I can look asap. Closing this one for now as it all seems good, but again, we can come back to it if need be.