armijnhemel / binaryanalysis-ng

Binary Analysis Next Generation (BANG)
GNU Affero General Public License v3.0

Strings in .jar are split unexpectedly #372

Closed chimelab closed 5 months ago

chimelab commented 5 months ago

The problem was seen when parsing jar packages. For example, in gson (by Google), the strings in JsonParser.class should look like this:

strings gson-2.2.2/root/_rel/com/google/gson/JsonParser.class
.........
$Did not consume the entire document.
.........
Failed parsing JSON source:
to Json
java/lang/OutOfMemoryError
com/google/gson/JsonParser

However, BANG gives the result below, which makes the strings hard to use:

'00d0e1af346abe733d8ba8f6f88ca203c33eb7857f8aaea78f67b7d36d7f714b': {
  'metadata': {
    'hashes': {
      'sha256': '00d0e1af346abe733d8ba8f6f88ca203c33eb7857f8aaea78f67b7d36d7f714b',
      'md5': '1f85c49d4138a42466ac6d964cfd2db5',
      'sha1': '61b79592df19ca3e3464e4adcd39908fc9f75877',},
    'strings': ['Did ' 'not ' 'consume ' 'the ' 'entire ' 'document.',
                'Failed ' 'parsing ' 'JSON ' 'source: ',
                ' ' 'to ' 'Json'],
    'classname': 'com/google/gson/JsonParser',
    'filepath': PosixPath('root/_rel/com/google/gson/JsonParser.class')},

armijnhemel commented 5 months ago

Do you have the corresponding source code so I can compare?

armijnhemel commented 5 months ago

Never mind:

https://github.com/google/gson/blob/main/gson/src/main/java/com/google/gson/JsonParser.java

armijnhemel commented 5 months ago

The Java class file format stores strings together with metadata indicating their type, so running the strings utility on a class file will give a different result than BANG does. However, in this case there does seem to be an issue. I will investigate.
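To illustrate the point about typed strings, here is a minimal sketch of reading string constants from a class file's constant pool. It is not BANG's implementation; the names `string_constants`, `FIXED_SIZES` and `utf8_entry` are hypothetical, and the class file's modified UTF-8 is approximated with plain UTF-8 decoding. The sample bytes are a hand-built header and constant pool, not a complete class file.

```python
import struct

# Payload sizes in bytes of the fixed-length constant pool entries, keyed
# by tag. Tag 1 (Utf8) is variable-length and handled separately.
FIXED_SIZES = {3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
               12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}


def string_constants(data: bytes) -> list[str]:
    """Return the text of every String (tag 8) constant pool entry."""
    if data[:4] != b'\xca\xfe\xba\xbe':
        raise ValueError('not a Java class file')
    # Layout: magic (4), minor version (2), major version (2), pool count (2).
    (count,) = struct.unpack_from('>H', data, 8)
    offset = 10
    utf8 = {}          # constant pool index -> decoded Utf8 text
    string_refs = []   # Utf8 indexes referenced by String entries
    index = 1          # the constant pool is indexed from 1
    while index < count:
        tag = data[offset]
        offset += 1
        if tag == 1:
            (length,) = struct.unpack_from('>H', data, offset)
            offset += 2
            raw = data[offset:offset + length]
            # Assumption: plain UTF-8 is close enough to modified UTF-8 here.
            utf8[index] = raw.decode('utf-8', errors='replace')
            offset += length
        elif tag in FIXED_SIZES:
            if tag == 8:
                (ref,) = struct.unpack_from('>H', data, offset)
                string_refs.append(ref)
            offset += FIXED_SIZES[tag]
        else:
            raise ValueError(f'unknown constant pool tag {tag}')
        # Long and Double entries occupy two constant pool slots.
        index += 2 if tag in (5, 6) else 1
    return [utf8[ref] for ref in string_refs]


def utf8_entry(text: str) -> bytes:
    """Encode one Utf8 constant pool entry (hypothetical test helper)."""
    raw = text.encode('utf-8')
    return b'\x01' + struct.pack('>H', len(raw)) + raw


# Hand-built header plus a three-entry constant pool, for illustration only.
sample = (b'\xca\xfe\xba\xbe' + struct.pack('>HHH', 0, 52, 4)
          + utf8_entry('JsonReader is closed')        # entry 1: Utf8
          + b'\x08' + struct.pack('>H', 1)            # entry 2: String -> 1
          + utf8_entry('com/google/gson/JsonParser')) # entry 3: Utf8
found = string_constants(sample)
```

The class name is stored as a Utf8 entry too, but no String entry points at it, so it is not reported. The strings utility, by contrast, prints any printable byte run, which is why its output also contains names such as com/google/gson/JsonParser.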

armijnhemel commented 5 months ago

I cannot reproduce the problem. With the latest BANG I am seeing:

{'metadata': {
   'hashes': {
     'sha256': '00d0e1af346abe733d8ba8f6f88ca203c33eb7857f8aaea78f67b7d36d7f714b',
     'md5': '1f85c49d4138a42466ac6d964cfd2db5',
     'sha1': '61b79592df19ca3e3464e4adcd39908fc9f75877',
     'tlsh': 'T155514087F01095C7F45FE97E19640B5479F098382317B911CF03885A67EBA55DD6A1F0'},
   'flags': {'public': True, 'final': True, 'super': True, 'interface': False,
             'abstract': False, 'synthetic': False, 'annotation': False,
             'enum': False, 'module': False},
   'strings': ['Did not consume the entire document.',
               'Failed parsing JSON source: ',
               ' to Json'],
   'classname': 'com/google/gson/JsonParser',
   'fields': [],
   'methods': ['<init>', 'parse', 'parse', 'parse'],
   'sourcefile': 'JsonParser.java'},
 'unpack_parser': 'javaclass',
 'size': 2594,
 'labels': ['java class']}

Which Git commit are you on?

chimelab commented 5 months ago

I will try the latest code from Git.

chimelab commented 5 months ago

It's not a bug. I misread the output: I overlooked that there are no commas between those words. I used pprint.pp to print the .pkl contents to a text file, and strings that do not fit on one line are printed across several lines. For example, 'JsonReader is closed' was converted to:

'JsonReader ' 'is ' 'closed',

However, there are no commas between the pieces above. That means it is still one three-word string, not three strings. It's strange that it outputs it like this.
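The behaviour described above can be reproduced with a short sketch. The `width=20` value and the sample dictionary are assumptions chosen only to force wrapping; `pprint.pformat` is used here because it returns the same formatting as `pprint.pp` as a string.

```python
import ast
import pprint

data = {'strings': ['JsonReader is closed']}

# With a narrow width, pprint breaks a long string into adjacent string
# literals that have no commas between them.
text = pprint.pformat(data, width=20)
print(text)

# Python joins adjacent string literals, so the wrapped text still
# evaluates to a single string.
joined = 'JsonReader ' 'is ' 'closed'
restored = ast.literal_eval(text)
```

Because adjacent literals are concatenated, parsing the wrapped text with `ast.literal_eval` gives back the original one-element list. Passing a larger `width` to `pprint.pp` should keep short strings on one line.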

Sorry for any inconvenience.