Matmaus / LnkParse3

Windows Shortcut file (LNK) parser
MIT License
63 stars 13 forks source link

testing: Test print_json for all samples #2

Closed ernix closed 3 years ago

ernix commented 3 years ago

This patch provides a unit-test script that runs print_json() method

To run test: $ python -m unittest discover tests

See also: https://github.com/Matmaus/LnkParse3/pull/1

ernix commented 3 years ago

I have no clue how important dumping LNK structure to a JSON, as they are not comparable in many ways due to its extraordinary "flexibility". But trying to subtract specific parts (i.e: modification_time) from everything is the main problem while parsing lnk_failing*.

So, do we need to keep get_all option in the future? It seems impossible to define all metadata that LNK files can have.

ernix commented 3 years ago

I have a problem with failing tests due to differences in (probably) all date attributes. Don't you know why?

Well, that's because the module doesn't care Time zones. https://github.com/Matmaus/LnkParse3/pull/1/commits/92d21ea4aa69afc7a47409ee6458743a879ecf32

According to [MS-SHLLINK], FILETIME and DOS data time are stored as UTC, datetime.datetime will create a naive datetime object without tzinfo. My Time zone is JST(UTC+0900) and I guess yours is UTC+0200.

https://winprotocoldoc.blob.core.windows.net/productionwindowsarchives/MS-SHLLINK/%5bMS-SHLLINK%5d.pdf

WriteTime (8 bytes): A FILETIME structure ([MS-DTYP] section 2.3.3) that specifies the write time of the link target in UTC (Coordinated Universal Time). If the value is zero, there is no write time set on the link target.

https://docs.python.org/3.8/library/datetime.html

Date and time objects may be categorized as “aware” or “naive” depending on whether or not they include timezone information.

Matmaus commented 3 years ago

Thank you for the explanation.

So, do you prefer to use datetime and ignore invalid dates (in pr #1)?

ernix commented 3 years ago

So, do you prefer to use datetime and ignore invalid dates (in pr #1)?

Yes, I do. I think both of print_json() and print_lnk_file() methods should print like 2008-09-12 20:27:17.101000+00:00 in any case, neither 2008-09-13 05:27:17 nor 2008-09-12 22:27:17. For invalid date or corrupted data, I'd expect this module to raise some exceptions. Even If print_json() allow invalid date as null, It's better to provide some warnings in STDERR.

Also, consider using .in and .out files for test samples. The expected result can be stored in a new file.

I'd like to avoid comparing JSON strings, because print_json() doesn't sort keys, trailing commas are also not allowed in JSON. That's why I use assertDictEqual.

ernix commented 3 years ago

Speaking of testing, you can change time zone to reproduce my environment. in bash/zsh:

$ TZ="Asia/Tokyo" python -m unittest discover tests

This of course is wrong, and at least every datetime objects should have timezone info internally.

Matmaus commented 3 years ago

Yes, I do. I think both of print_json() and print_lnk_file() methods should print like 2008-09-12 20:27:17.101000+00:00 in any case, neither 2008-09-13 05:27:17 nor 2008-09-12 22:27:17.

Ok, I agree.

For invalid date or corrupted data, I'd expect this module to raise some exceptions. Even If print_json() allow invalid date as null, It's better to provide some warnings in STDERR.

I have no problem with errors or warnings in STDERR :+1: . Just do not fail.

I'd like to avoid comparing JSON strings, because print_json() doesn't sort keys, trailing commas are also not allowed in JSON. That's why I use assertDictEqual.

I understand. Comparing JSON strings is not what I mean. You can store the expected result as a JSON file and then load it with json.loads (as you already do with stdout). The only difference is that the code will be much shorter.

Matmaus commented 3 years ago

Speaking of testing, you can change time zone to reproduce my environment. in bash/zsh:

$ TZ="Asia/Tokyo" python -m unittest discover tests

Test has passed.

ernix commented 3 years ago

I understand. Comparing JSON strings is not what I mean. You can store the expected result as a JSON file and then load it with json.loads (as you already do with stdout). The only difference is that the code will be much shorter.

OK, let me get them

ernix commented 3 years ago

As you can see, there are 4 files are failing due to modification_time issue. So the first task we need to tackle is one of:

$ TZ="Asia/Tokyo" python -m unittest discover tests

======================================================================
ERROR: test_json (test_samples.TestSamples) [lnk_failing2.lnk]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/kojima/Program/LnkParse3/tests/test_samples.py", line 25, in test_json
    lnk.print_json()
  File "/Users/kojima/Program/LnkParse3/LnkParse3/__init__.py", line 1471, in print_json
    res = self.get_json(print_all)
  File "/Users/kojima/Program/LnkParse3/LnkParse3/__init__.py", line 1506, in get_json
    item.pop('modification_time', None)
AttributeError: 'NoneType' object has no attribute 'pop'

======================================================================
ERROR: test_json (test_samples.TestSamples) [lnk_failing3.lnk]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/kojima/Program/LnkParse3/tests/test_samples.py", line 25, in test_json
    lnk.print_json()
  File "/Users/kojima/Program/LnkParse3/LnkParse3/__init__.py", line 1471, in print_json
    res = self.get_json(print_all)
  File "/Users/kojima/Program/LnkParse3/LnkParse3/__init__.py", line 1506, in get_json
    item.pop('modification_time', None)
AttributeError: 'NoneType' object has no attribute 'pop'

======================================================================
ERROR: test_json (test_samples.TestSamples) [lnk_failing4.lnk]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/kojima/Program/LnkParse3/tests/test_samples.py", line 25, in test_json
    lnk.print_json()
  File "/Users/kojima/Program/LnkParse3/LnkParse3/__init__.py", line 1471, in print_json
    res = self.get_json(print_all)
  File "/Users/kojima/Program/LnkParse3/LnkParse3/__init__.py", line 1506, in get_json
    item.pop('modification_time', None)
AttributeError: 'NoneType' object has no attribute 'pop'

======================================================================
ERROR: test_json (test_samples.TestSamples) [lnk_failing.lnk]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/kojima/Program/LnkParse3/tests/test_samples.py", line 25, in test_json
    lnk.print_json()
  File "/Users/kojima/Program/LnkParse3/LnkParse3/__init__.py", line 1471, in print_json
    res = self.get_json(print_all)
  File "/Users/kojima/Program/LnkParse3/LnkParse3/__init__.py", line 1506, in get_json
    item.pop('modification_time', None)
AttributeError: 'NoneType' object has no attribute 'pop'

----------------------------------------------------------------------
Ran 1 test in 0.008s

FAILED (errors=4)
Matmaus commented 3 years ago
  1. I would just skip null items and do not even store them in the target items, see "To solve failing samples...".
  2. We can either use datetime like in the refactored version or leave it be in this PR (test will pass with TZ=..).
ernix commented 3 years ago

I would just skip null items and do not even store them in the target items, see "To solve failing samples...".

I've just skipped NoneType items while print_json() in 5527db4, please adjust tests/json/*.json as you want before changing code.

We can either use datetime like in the refactored version or leave it be in this PR (test will pass with TZ=..).

b02abf7 is a revised version of 92d21ea. It will create ISO 8601 format datetime. 👍

ernix commented 3 years ago

Why did you choose to skip None values at the latest stage during printing? I think it would be better to do not even add None values into the result as they have no informative value. Do you have any reason?

https://github.com/Matmaus/LnkParse3/pull/2#pullrequestreview-492013501

To solve failing samples, I would go with skipping unimplemented things, like:

else: # Lines 717 and 720 in LnkParse3.__init__.py
  if target:
      self.targets['items'].append(target)

Because your one-line patch above can not rule out the possibility of NoneType to be in self.targets['items']. It's a nested try ~ catch block and the outer try block append target too. This nested try block is just a slipshod work to determine shell item class types and should be more self-explanatory.

See also shell item classes and my refactoring: bc22a36b366dad8bac06b138d3d23b0666938a0d

                try:
                    target = self.SHELL_ITEM_CLASSESS[item_type](index + 2, size)
                except:
                    try:
                        target = self.SHELL_ITEM_CLASSESS[item_type & 0x70](index + 2, size)
                    except:
                        pass
                    else:
                        self.targets['items'].append(target)  # You covered this
                else:
                    self.targets['items'].append(target)  # But not for this one

In contrast, the error message from parsing lnk_failing* is quite obvious and using pop on NoneType simply is an abnormal. I only minimized the impact of changing code for the unit testing.

You can also skip NoneType while appending in another P-r/commit, but it changes the whole data structure that we have to aware of. This thread is for unit testing, and it will keep track of them for us.

Matmaus commented 3 years ago

I had in mind both blocks (see the comment else: # Lines 717 and 720 in LnkParse3.__init__.py) but maybe it was not so clear. Never mind. You are right, we may leave this as is in this PR.

I can merge this if you are done :+1: