dgbogi564 opened 5 months ago
I tried experimenting with various codepages, but none of them work consistently with this set of .lnk files. The best results I've gotten have been with UTF-8:
F:\???????????? ?? (@IndigoBeatss)
According to liblnk's documentation, Unicode strings are stored as UTF-16LE, but when I decode the path as UTF-16LE I get results like this:
㩆捜汯敬瑣潩屮汤屳楨潴業搭屬楨潴業摟睯汮慯敤彤睴瑩整屲㼿㼿㼿㼿㼿㼿㼠‿䀨湉楤潧敂瑡獳�
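That kind of CJK-looking text is typical of a single-byte (ANSI) string being decoded as UTF-16LE: every two ASCII bytes get paired into one 16-bit code unit. A small illustration in Python, using only the first bytes of the path (the byte string is typed by hand, not read from the file):

```python
# First bytes of the path as a single-byte string (one byte per ASCII character).
ansi_bytes = b"F:\\collection\\"

# UTF-16LE pairs the bytes up: b"F:" -> U+3A46, b"\\c" -> U+635C, b"ol" -> U+6C6F, ...
misread = ansi_bytes.decode("utf-16-le")
print(misread)
print([f"U+{ord(ch):04X}" for ch in misread])
```

These are the same leading characters as in the garbled output above, which suggests the field being decoded holds single-byte data rather than UTF-16LE.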
Hi, I checked the file from the first ZIP and I think the parsing is correct. You are right that Unicode strings are stored in UTF-16, but in this case the LinkInfo structure does not contain LocalBasePathUnicode. The failing field is LocalBasePath. I think you have to provide the correct codepage (I don't know which one). Below you can see the raw data:
# Local base path: F:\collection\dls\hitomi-dl\hitomi_downloaded_twitter\笹���も (@sasakomitomo)
>>> self.info._raw[ self.info.local_base_path_offset() : self.info.size() ]
b'F:\\collection\\dls\\hitomi-dl\\hitomi_downloaded_twitter\\\xe7\xac\xb9\xe5\x8f\xa4\xe3\x81\xbf\xe3\x81\xa8\xe3\x82\x82 (@sasakomitomo)\x00\x00'
# Relative path: ..\..\..\collection\dls\hitomi-dl\hitomi_downloaded_twitter\笹古みとも (@sasakomitomo)
>>> self.string_data._raw[2:164]
b'.\x00.\x00\\\x00.\x00.\x00\\\x00.\x00.\x00\\\x00c\x00o\x00l\x00l\x00e\x00c\x00t\x00i\x00o\x00n\x00\\\x00d\x00l\x00s\x00\\\x00h\x00i\x00t\x00o\x00m\x00i\x00-\x00d\x00l\x00\\\x00h\x00i\x00t\x00o\x00m\x00i\x00_\x00d\x00o\x00w\x00n\x00l\x00o\x00a\x00d\x00e\x00d\x00_\x00t\x00w\x00i\x00t\x00t\x00e\x00r\x00\\\x009{\xe4S\x7f0h0\x820 \x00(\x00@\x00s\x00a\x00s\x00a\x00k\x00o\x00m\x00i\x00t\x00o\x00m\x00o\x00)\x00'
# Value: F:\collection\dls\hitomi-dl\hitomi_downloaded_twitter\笹古みとも (@sasakomitomo)
>>> [e for e in self.extras][1]._raw[103:249]
b'F\x00:\x00\\\x00c\x00o\x00l\x00l\x00e\x00c\x00t\x00i\x00o\x00n\x00\\\x00d\x00l\x00s\x00\\\x00h\x00i\x00t\x00o\x00m\x00i\x00-\x00d\x00l\x00)\x00\x00\x00\x00\x00\x00\x00\x82\x00\x00\x001SPS0\xf1%\xb7\xefG\x1a\x10\xa5\xf1\x02`\x8c\x9e\xeb\xac=\x00\x00\x00\n\x00\x00\x00\x00\x1f\x00\x00\x00\x16\x00\x00\x009{\xe4S\x7f0h0\x820 \x00(\x00@\x00s\x00a\x00s\x00a\x00k\x00o\x00m\x00i\x00t\x00o\x00m\x00o\x00)\x00\x00'
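To double-check the raw data above, the byte strings can be decoded directly in Python (bytes copied from the REPL output, trailing NULs dropped). The LocalBasePath bytes turn out to be valid UTF-8, while the folder name inside RelativePath is plain UTF-16LE; decoding the same bytes with Windows-1252 shows the kind of damage a wrong codepage causes:

```python
# LocalBasePath bytes from self.info._raw above, without the trailing NUL terminators.
local_base_path = (
    b"F:\\collection\\dls\\hitomi-dl\\hitomi_downloaded_twitter\\"
    b"\xe7\xac\xb9\xe5\x8f\xa4\xe3\x81\xbf\xe3\x81\xa8\xe3\x82\x82 (@sasakomitomo)"
)

# Folder name bytes taken from the UTF-16LE RelativePath in self.string_data._raw above.
folder_name_utf16 = b"9{\xe4S\x7f0h0\x820"

print(local_base_path.decode("utf-8"))        # the intended path
print(folder_name_utf16.decode("utf-16-le"))  # the same folder name
# Windows-1252 leaves some bytes (e.g. 0x81, 0x8F) undefined, so they are replaced here.
print(local_base_path.decode("cp1252", errors="replace"))
```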
Here is the complete output:
Windows Shortcut Information:
Link CLSID: 00021401-0000-0000-C000-000000000046
Link Flags: HasTargetIDList | HasLinkInfo | HasRelativePath | IsUnicode | EnableTargetMetadata - (524427)
File Flags: FILE_ATTRIBUTE_DIRECTORY - (16)
Creation Timestamp: 2023-05-27 01:51:30.558322+00:00
Modified Timestamp: 2023-10-15 07:15:51.716984+00:00
Accessed Timestamp: 2023-10-18 00:31:36.907479+00:00
Icon Index: 0
Window Style: SW_SHOWNORMAL
HotKey: UNSET - UNSET {0x0000}
TARGETS:
Index: 78
ITEMS:
Root Folder
Sort index: Internet Explorer
Guid: B710002F-F5A6-0019-2F46-3A5C00000000
File entry
Flags: Is directory
Modification time: None
File attribute flags: 16
Primary name: collection
File entry
Flags: Is directory
Modification time: None
File attribute flags: 16
Primary name: dls
File entry
Flags: Is directory
Modification time: None
File attribute flags: 16
Primary name: hitomi-dl
File entry
Flags: Is directory
Modification time: None
File attribute flags: 16
Primary name: hitomi_downloaded_twitter
File entry
LINK INFO:
Link info flags: 1
Local base path: F:\collection\dls\hitomi-dl\hitomi_downloaded_twitter\笹���も (@sasakomitomo)
Common path suffix:
LOCAL:
Drive type: 3
Drive serial number: 0x280e8914
Drive type: DRIVE_FIXED
Volume label:
DATA
Relative path: ..\..\..\collection\dls\hitomi-dl\hitomi_downloaded_twitter\笹古みとも (@sasakomitomo)
EXTRA BLOCKS:
DISTRIBUTED_LINK_TRACKER_BLOCK
Length: 88
Version: 0
Machine identifier: trigun
Droid volume identifier: 056C316C-9975-46CC-A56F-3CF14020D396
Droid file identifier: D8F800EC-6D46-11EE-AE84-5E99434149B5
Birth droid volume identifier: 056C316C-9975-46CC-A56F-3CF14020D396
Birth droid file identifier: D8F800EC-6D46-11EE-AE84-5E99434149B5
METADATA_PROPERTIES_BLOCK
Property store:
Storage:
Version: 0x53505331
Format id: DABD30ED-0043-4789-A7F8-D013A4736622
Serialized property values:
Property:
Id: 100
Value: hitomi_downloaded_twitter (F:\collection\dls\hitomi-dl)
Value type: VT_LPWSTR
Storage:
Version: 0x53505331
Format id: B725F130-47EF-101A-A5F1-02608C9EEBAC
Serialized property values:
Property:
Id: 10
Value: 笹古みとも (@sasakomitomo)
Value type: VT_LPWSTR
Property:
Id: 4
Value: File folder
Value type: VT_LPWSTR
Storage:
Version: 0x53505331
Format id: 28636AA6-953D-11D2-B5D6-00C04FD918D0
Serialized property values:
Property:
Id: 30
Value: F:\collection\dls\hitomi-dl\hitomi_downloaded_twitter\笹古みとも (@sasakomitomo)
Value type: VT_LPWSTR
Storage:
Version: 0x53505331
Format id: 446D16B1-8DAD-4870-A748-402EA43D788C
Serialized property values:
Property:
Id: 104
Value: None
Value type: VT_CLSID
I've tried every codepage identifier listed here, but none of them seem to work...
For reference, chcp in Command Prompt returns "Active code page: 437", and [System.Text.Encoding]::Default in PowerShell returns:
IsSingleByte : True
BodyName : iso-8859-1
EncodingName : Western European (Windows)
HeaderName : Windows-1252
WebName : Windows-1252
WindowsCodePage : 1252
IsBrowserDisplay : True
IsBrowserSave : True
IsMailNewsDisplay : True
IsMailNewsSave : True
EncoderFallback : System.Text.InternalEncoderBestFitFallback
DecoderFallback : System.Text.InternalDecoderBestFitFallback
IsReadOnly : True
CodePage : 1252
Hi, sorry for the long inactivity.
1) You have actually found the correct codepage: UTF-8. It works for the file in the first ZIP. I can add UTF-8 decoding as a pre-fallback when decoding with the given codepage fails (the last fallback will remain the same: use the given codepage but replace unknown characters).
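The proposed decoding order could look roughly like this. It is only a minimal sketch of the idea with a hypothetical helper name (decode_ansi), not LnkParse3's actual code, and it assumes the codepage is given as a Python codec name such as "cp1252":

```python
def decode_ansi(raw: bytes, codepage: str) -> str:
    """Hypothetical helper sketching the proposed order for ANSI (codepage) strings.

    1. Strictly decode with the codepage given by the user.
    2. Pre-fallback: strictly decode as UTF-8.
    3. Last fallback: the given codepage again, replacing undecodable bytes.
    Assumes `codepage` is a codec name Python knows (e.g. "cp1252", not "1252").
    """
    for encoding in (codepage, "utf-8"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode(codepage, errors="replace")


name = "笹古みとも".encode("utf-8")
print(decode_ansi(name, "cp1252"))  # cp1252 rejects bytes 0x81/0x8F, so the UTF-8 step is used
print(decode_ansi(name, "cp437"))   # cp437 maps all 256 byte values, so step 1 already "succeeds"
```

One caveat of this order: single-byte codepages that define all 256 byte values, such as cp437, never fail in step 1, so the UTF-8 pre-fallback only kicks in for codepages that reject the byte sequence, such as cp1252 here.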
$ lnkparse 笹古みとも\ \(@sasakomitomo\).lnk -c utf-8
...
LINK INFO:
Link info flags: 1
Local base path: F:\collection\dls\hitomi-dl\hitomi_downloaded_twitter\笹古みとも (@sasakomitomo)
Common path suffix:
LOCAL:
Drive type: 3
Drive serial number: 0x280e8914
Drive type: DRIVE_FIXED
Volume label:
DATA
Relative path: ..\..\..\collection\dls\hitomi-dl\hitomi_downloaded_twitter\笹古みとも (@sasakomitomo)
...
2) Regarding the files in the second ZIP, the question marks are actually stored in the binary itself, so I think there is nothing wrong with the decoding process:
$ cat lnk.files/𝙄𝙣𝙙𝙞𝙜𝙤\ 🦊\ \(@IndigoBeatss\).lnk
LF
8QU//F:\tY^Hg3(w,/J>V
h`1collectionF .collectionJ1dls8 .dls\1hitomi-dlD .hitomi-dl1hitomi_downloaded_twitterd .hitomi_downloaded_twitter(55D5c5Y5^5\5d >؊ (@IndigoBeatss)p .5D5c5Y5^5\5d >؊ (@IndigoBeatss)N<$$58:(F:\collection\dls\hitomi-dl\hitomi_downloaded_twitter\???????????? ?? (@IndigoBeatss)F:\collection\dls\hitomi-dl\hitomi_downloaded_twitter\5D5c5Y5^5\5d >؊ (@IndigoBeatss)[..\..\..\collection\dls\hitomi-dl\hitomi_downloaded_twitter\5D5c5Y5^5\5d >؊ (@IndigoBeatss)`Xdesktop-l0vtu6nl1luFo<@ Ӗ*{l1luFo<@ Ӗ*{Q 1SPS0CGsf"d8hitomi_downloaded_twitter (F:\collection\dls\hitomi-dl)1SPS0%G`Q
5D5c5Y5^5\5d >؊ (@IndigoBeatss))
File folder1SPSjc(=OнVF:\collection\dls\hitomi-dl\hitomi_downloaded_twitter\5D5c5Y5^5\5d >؊ (@IndigoBeatss)91SPSmDpHH@.=xhH
pJrHfֽi%
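The number of question marks also fits a lossy Unicode-to-ANSI conversion at the time the shortcut was written: the six styled letters and the fox emoji all lie outside the Basic Multilingual Plane, so in UTF-16 each one takes two code units (a surrogate pair), which matches 12 + 2 question marks. The sketch below reproduces that byte pattern under the assumption that every unmappable UTF-16 code unit becomes a single "?"; the helper name is hypothetical and this is not how Windows or LnkParse3 is implemented:

```python
def to_ansi_lossy(text: str, codepage: str = "cp1252") -> bytes:
    """Hypothetical helper: encode text, writing one b'?' per unmappable UTF-16 code unit."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            # Characters above U+FFFF need a surrogate pair (2 code units) in UTF-16.
            units = 2 if ord(ch) > 0xFFFF else 1
            out += b"?" * units
    return bytes(out)


print(to_ansi_lossy("𝙄𝙣𝙙𝙞𝙜𝙤 🦊 (@IndigoBeatss)"))
```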
It may be a difference between Windows/WSL and Linux, but for the first example, lnk_file.info.local_base_path() works for me, while lnk_file.info.local_base_path_unicode() works for all the files in the second ZIP.
Could you try the following? It works well enough on my Linux machine, but you would need to validate it on Windows.
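import LnkParse3

def get_target(lnk_path, code_page):
    with open(lnk_path, "rb") as indata:
        lnk_file = LnkParse3.lnk_file(indata, cp=code_page)
        # Prefer the UTF-16 path when LinkInfo has one, else fall back to the codepage-decoded path.
        return lnk_file.info.local_base_path_unicode() or lnk_file.info.local_base_path()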
Looking at your big test script that tries all encodings, I can see that you loop over your list with:
for identifier, net_name, additional_info in code_page_identifiers:
    print(f'{identifier}, {net_name}, {additional_info}')
    try:
        print(f'{get_target(lnk_path, identifier)}\n')
Could you try with:
        print(f'{get_target(lnk_path, net_name)}\n')  # <- modified to pass net_name
Looking at UTF-8 (the last entry in your list), you end up calling lnk_file = LnkParse3.lnk_file(indata, cp="65001") instead of lnk_file = LnkParse3.lnk_file(indata, cp="utf-8"). This causes lnk_file.info.local_base_path() to fail with LookupError: unknown encoding: 65001, as your output shows. If you used "utf-8", you should get the question marks instead. From my tests, it looks like both "utf-8" and "65001" give you the right value with lnk_file.info.local_base_path_unicode().
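The LookupError comes from Python's codec registry, which does not recognize bare numeric Windows codepage identifiers such as "65001". If you want to keep looping over the numeric identifiers, one option is to map them to Python codec names first. A minimal sketch with a hypothetical helper name, assuming Python 3.8 or later (where "cp65001" is an alias of UTF-8):

```python
import codecs
from typing import Optional


def to_python_codec(identifier: str) -> Optional[str]:
    """Hypothetical helper: map a codepage identifier (e.g. "1252") to a Python codec name.

    Tries the identifier as-is, then with a "cp" prefix; returns None if Python has no match.
    """
    for candidate in (identifier, f"cp{identifier}"):
        try:
            return codecs.lookup(candidate).name
        except LookupError:
            continue
    return None


for identifier in ("437", "1252", "65001", "utf-8", "20127"):
    print(identifier, "->", to_python_codec(identifier))
```

The returned name could then be passed as cp=... to LnkParse3.lnk_file, skipping identifiers that map to None.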
Hopefully that will help a little! 🙂
With this .lnk file (inside zip) as input, when I run this code:
I receive this error (path :