ZhangJun2017 / QQChatHistoryExporter

Export mobile QQ chat history as web pages
MIT License
35 stars, 4 forks

Decoding msgtype -2007 (sticker-pack stickers) #5

Closed lqzhgood closed 2 years ago

lqzhgood commented 2 years ago

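Decoded using the approach from https://github.com/ZhangJun2017/QQChatHistoryExporter/issues/4.

The keys and values are already clearly visible, but decoding still requires re-creating a matching class structure, and I can't manage the Java code myself.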

java -jar SerializationDumper-v1.13.jar

STREAM_MAGIC - 0xac ed
STREAM_VERSION - 0x00 05
Contents
  TC_OBJECT - 0x73
    TC_CLASSDESC - 0x72
      className
        Length - 41 - 0x00 29
        Value - com.tencent.mobileqq.data.MarkFaceMessage - 0x636f6d2e74656e63656e742e6d6f62696c6571712e646174612e4d61726b466163654d657373616765
      serialVersionUID - 0x00 00 00 00 00 01 8f 4e
      newHandle 0x00 7e 00 00
      classDescFlags - 0x02 - SC_SERIALIZABLE
      fieldCount - 14 - 0x00 0e
      Fields
        0:
          Int - I - 0x49
          fieldName
            Length - 9 - 0x00 09
            Value - cFaceInfo - 0x6346616365496e666f
        1:
          Int - I - 0x49
          fieldName
            Length - 8 - 0x00 08
            Value - cSubType - 0x6353756254797065
        2:
          Int - I - 0x49
          fieldName
            Length - 13 - 0x00 0d
            Value - dwMSGItemType - 0x64774d53474974656d54797065
        3:
          Int - I - 0x49
          fieldName
            Length - 7 - 0x00 07
            Value - dwTabID - 0x64775461624944
        4:
          Int - I - 0x49
          fieldName
            Length - 11 - 0x00 0b
            Value - imageHeight - 0x696d616765486569676874
        5:
          Int - I - 0x49
          fieldName
            Length - 10 - 0x00 0a
            Value - imageWidth - 0x696d6167655769647468
        6:
          Long - L - 0x4a
          fieldName
            Length - 5 - 0x00 05
            Value - index - 0x696e646578
        7:
          Int - I - 0x49
          fieldName
            Length - 9 - 0x00 09
            Value - mediaType - 0x6d6564696154797065
        8:
          Int - I - 0x49
          fieldName
            Length - 5 - 0x00 05
            Value - wSize - 0x7753697a65
        9:
          Object - L - 0x4c
          fieldName
            Length - 8 - 0x00 08
            Value - faceName - 0x666163654e616d65
          className1
            TC_STRING - 0x74
              newHandle 0x00 7e 00 01
              Length - 18 - 0x00 12
              Value - Ljava/lang/String; - 0x4c6a6176612f6c616e672f537472696e673b
        10:
          Array - [ - 0x5b
          fieldName
            Length - 11 - 0x00 0b
            Value - mobileparam - 0x6d6f62696c65706172616d
          className1
            TC_STRING - 0x74
              newHandle 0x00 7e 00 02
              Length - 2 - 0x00 02
              Value - [B - 0x5b42
        11:
          Array - [ - 0x5b
          fieldName
            Length - 8 - 0x00 08
            Value - resvAttr - 0x7265737641747472
          className1
            TC_REFERENCE - 0x71
              Handle - 8257538 - 0x00 7e 00 02
        12:
          Array - [ - 0x5b
          fieldName
            Length - 6 - 0x00 06
            Value - sbfKey - 0x7362664b6579
          className1
            TC_REFERENCE - 0x71
              Handle - 8257538 - 0x00 7e 00 02
        13:
          Array - [ - 0x5b
          fieldName
            Length - 6 - 0x00 06
            Value - sbufID - 0x736275664944
          className1
            TC_REFERENCE - 0x71
              Handle - 8257538 - 0x00 7e 00 02
      classAnnotations
        TC_ENDBLOCKDATA - 0x78
      superClassDesc
        TC_NULL - 0x70
    newHandle 0x00 7e 00 03
    classdata
      com.tencent.mobileqq.data.MarkFaceMessage
        values
          cFaceInfo
            (int)1 - 0x00 00 00 01
          cSubType
            (int)3 - 0x00 00 00 03
          dwMSGItemType
            (int)6 - 0x00 00 00 06
          dwTabID
            (int)200811 - 0x00 03 10 6b
          imageHeight
            (int)200 - 0x00 00 00 c8
          imageWidth
            (int)200 - 0x00 00 00 c8
          index
            (long)0 - 0x00 00 00 00 00 00 00 00
          mediaType
            (int)0 - 0x00 00 00 00
          wSize
            (int)37 - 0x00 00 00 25
          faceName
            (object)
              TC_STRING - 0x74
                newHandle 0x00 7e 00 04
                Length - 3 - 0x00 03
                Value - 略 - 0xe795a5
          mobileparam
            (array)
              TC_ARRAY - 0x75
                TC_CLASSDESC - 0x72
                  className
                    Length - 2 - 0x00 02
                    Value - [B - 0x5b42
                  serialVersionUID - 0xac f3 17 f8 06 08 54 e0
                  newHandle 0x00 7e 00 05
                  classDescFlags - 0x02 - SC_SERIALIZABLE
                  fieldCount - 0 - 0x00 00
                  classAnnotations
                    TC_ENDBLOCKDATA - 0x78
                  superClassDesc
                    TC_NULL - 0x70
                newHandle 0x00 7e 00 06
                Array size - 0 - 0x00 00 00 00
                Values
          resvAttr
            (array)
              TC_ARRAY - 0x75
                TC_REFERENCE - 0x71
                  Handle - 8257541 - 0x00 7e 00 05
                newHandle 0x00 7e 00 07
                Array size - 18 - 0x00 00 00 12
                Values
                  Index 0:
                    (byte)10 - 0x0a
                  Index 1:
                    (byte)6 - 0x06
                  Index 2:
                    (byte)8 - 0x08
                  Index 3:
                    (byte)-84 - 0xac
                  Index 4:
                    (byte)2 - 0x02
                  Index 5:
                    (byte)16 - 0x10
                  Index 6:
                    (byte)-84 - 0xac
                  Index 7:
                    (byte)2 - 0x02
                  Index 8:
                    (byte)10 - 0x0a
                  Index 9:
                    (byte)6 - 0x06
                  Index 10:
                    (byte)8 - 0x08
                  Index 11:
                    (byte)-56 - 0xc8
                  Index 12:
                    (byte)1 - 0x01
                  Index 13:
                    (byte)16 - 0x10
                  Index 14:
                    (byte)-56 - 0xc8
                  Index 15:
                    (byte)1 - 0x01
                  Index 16:
                    (byte)64 (ASCII: @) - 0x40
                  Index 17:
                    (byte)1 - 0x01
          sbfKey
            (array)
              TC_ARRAY - 0x75
                TC_REFERENCE - 0x71
                  Handle - 8257541 - 0x00 7e 00 05
                newHandle 0x00 7e 00 08
                Array size - 16 - 0x00 00 00 10
                Values
                  Index 0:
                    (byte)101 (ASCII: e) - 0x65
                  Index 1:
                    (byte)50 (ASCII: 2) - 0x32
                  Index 2:
                    (byte)51 (ASCII: 3) - 0x33
                  Index 3:
                    (byte)50 (ASCII: 2) - 0x32
                  Index 4:
                    (byte)57 (ASCII: 9) - 0x39
                  Index 5:
                    (byte)55 (ASCII: 7) - 0x37
                  Index 6:
                    (byte)97 (ASCII: a) - 0x61
                  Index 7:
                    (byte)99 (ASCII: c) - 0x63
                  Index 8:
                    (byte)97 (ASCII: a) - 0x61
                  Index 9:
                    (byte)54 (ASCII: 6) - 0x36
                  Index 10:
                    (byte)54 (ASCII: 6) - 0x36
                  Index 11:
                    (byte)52 (ASCII: 4) - 0x34
                  Index 12:
                    (byte)52 (ASCII: 4) - 0x34
                  Index 13:
                    (byte)48 (ASCII: 0) - 0x30
                  Index 14:
                    (byte)55 (ASCII: 7) - 0x37
                  Index 15:
                    (byte)50 (ASCII: 2) - 0x32
          sbufID
            (array)
              TC_ARRAY - 0x75
                TC_REFERENCE - 0x71
                  Handle - 8257541 - 0x00 7e 00 05
                newHandle 0x00 7e 00 09
                Array size - 16 - 0x00 00 00 10
                Values
                  Index 0:
                    (byte)-10 - 0xf6
                  Index 1:
                    (byte)-1 - 0xff
                  Index 2:
                    (byte)-121 - 0x87
                  Index 3:
                    (byte)0 - 0x00
                  Index 4:
                    (byte)82 (ASCII: R) - 0x52
                  Index 5:
                    (byte)1 - 0x01
                  Index 6:
                    (byte)103 (ASCII: g) - 0x67
                  Index 7:
                    (byte)24 - 0x18
                  Index 8:
                    (byte)27 - 0x1b
                  Index 9:
                    (byte)113 (ASCII: q) - 0x71
                  Index 10:
                    (byte)-47 - 0xd1
                  Index 11:
                    (byte)40 (ASCII: () - 0x28
                  Index 12:
                    (byte)107 (ASCII: k) - 0x6b
                  Index 13:
                    (byte)103 (ASCII: g) - 0x67
                  Index 14:
                    (byte)60 (ASCII: <) - 0x3c
                  Index 15:
                    (byte)42 (ASCII: *) - 0x2a

Sample file: 6618684157263489480.txt
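
For readers unfamiliar with the dump format above, here is a minimal Python sketch that reads the same leading fields (magic, version, class name, serialVersionUID, flags, field count) straight from the bytes. The function name `read_class_header` is hypothetical, and it only handles streams that start with TC_OBJECT followed directly by TC_CLASSDESC, as this sample does; it is not a general parser.

```python
import struct

STREAM_MAGIC = 0xACED
STREAM_VERSION = 5
TC_OBJECT = 0x73
TC_CLASSDESC = 0x72


def read_class_header(data: bytes) -> dict:
    """Parse the stream header and the start of the first class descriptor."""
    magic, version = struct.unpack_from(">HH", data, 0)
    if magic != STREAM_MAGIC or version != STREAM_VERSION:
        raise ValueError("not a Java serialization stream")
    if data[4] != TC_OBJECT or data[5] != TC_CLASSDESC:
        raise ValueError("expected TC_OBJECT followed by TC_CLASSDESC")

    # className is a 2-byte big-endian length followed by the name bytes.
    # Java uses modified UTF-8 here; plain UTF-8 is enough for ASCII names.
    (name_len,) = struct.unpack_from(">H", data, 6)
    offset = 8
    class_name = data[offset:offset + name_len].decode("utf-8")
    offset += name_len

    # serialVersionUID (signed 64-bit), classDescFlags (1 byte), fieldCount (2 bytes).
    serial_uid, flags, field_count = struct.unpack_from(">qBH", data, offset)
    return {
        "class_name": class_name,
        "serial_version_uid": serial_uid,
        "flags": flags,
        "field_count": field_count,
    }
```

Called on the first bytes of the sample file, this should report the same class name and field count that SerializationDumper prints.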

demobin8 commented 2 years ago

sudo pip install javaobj-py3

Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import javaobj
>>> j = javaobj.JavaObjectUnmarshaller(open('6618684157263489480.txt', 'rb')).readObject()
>>> j.
j.annotations j.cSubType j.dwMSGItemType j.faceName j.imageHeight j.index j.mobileparam j.sbfKey j.wSize
j.cFaceInfo j.classdesc j.dwTabID j.get_class( j.imageWidth j.mediaType j.resvAttr j.sbufID
>>> j.
fchunfen commented 2 years ago

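By reverse engineering the QQ installation package, I obtained the definition of the sticker entity class. Definitions of other message formats can be obtained the same way. The reverse-engineering tool is pxb1988/dex2jar.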

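import java.io.Serializable;
import java.util.List;

// StickerInfo is another QQ class that is not shown here.
public class MarkFaceMessage implements Serializable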
{
    public static final long serialVersionUID = 102222L;

    public String backColor;

    public long beginTime = 0L;

    public int cFaceInfo = 1;

    public int cSubType = 3;

    public String copywritingContent;

    public int copywritingType = 0;

    public int dwMSGItemType = 6;

    public int dwTabID;

    public long endTime = 0L;

    public String faceName = null;

    public String from;

    public boolean hasIpProduct = false;

    public int imageHeight = 0;

    public int imageWidth = 0;

    public long index = 0L;

    public boolean isAPNG = false;

    public boolean isReword = false;

    public String jumpUrl;

    public int mediaType = 0;

    public byte[] mobileparam;

    public byte[] resvAttr;

    public byte[] sbfKey;

    public byte[] sbufID;

    public boolean shouldDisplay = false;

    public boolean showIpProduct = false;

    public StickerInfo stickerInfo = null;

    public List<Integer> voicePrintItems;

    public String volumeColor;

    public int wSize = 37;
}
lqzhgood commented 2 years ago

sudo pip install javaobj-py3

Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import javaobj
>>> j = javaobj.JavaObjectUnmarshaller(open('6618684157263489480.txt', 'rb')).readObject()
>>> j.
j.annotations j.cSubType j.dwMSGItemType j.faceName j.imageHeight j.index j.mobileparam j.sbfKey j.wSize
j.cFaceInfo j.classdesc j.dwTabID j.get_class( j.imageWidth j.mediaType j.resvAttr j.sbufID
>>> j.

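Thanks. I don't know Python either; I read the docs at https://pypi.org/project/javaobj-py3/ but couldn't figure out how to iterate over the keys. Could you show me how to traverse the j object, convert it to JSON, and write it to a new file?

I found this at https://github.com/tcalmant/python-javaobj/issues/42#issuecomment-631925681; the source is below. It should be fine, right?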

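import json
from json import JSONEncoder

import javaobj


class MyCustomEncoder(JSONEncoder):
    # Fall back to an object's attribute dict for types json cannot serialize.
    def default(self, o):
        return o.__dict__


# Deserialize the Java object stream, then dump it as JSON.
with open('emoji.txt', 'rb') as f:
    j = javaobj.JavaObjectUnmarshaller(f).readObject()
data = json.dumps(j, cls=MyCustomEncoder)
print(data)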

The decoded result isn't perfect, but it's usable. It probably still needs to be done natively in Java.

{
    "classdesc": {
        "name": "com.tencent.mobileqq.data.MarkFaceMessage",
        "serialVersionUID": 102222,
        "flags": 2,
        "fields_names": [
            "cFaceInfo",
            "cSubType",
            "dwMSGItemType",
            "dwTabID",
            "imageHeight",
            "imageWidth",
            "index",
            "mediaType",
            "wSize",
            "faceName",
            "mobileparam",
            "resvAttr",
            "sbfKey",
            "sbufID"
        ],
        "fields_types": ["I", "I", "I", "I", "I", "I", "J", "I", "I", "Ljava/lang/String;", "[B", "[B", "[B", "[B"],
        "superclass": null
    },
    "annotations": [],
    "cFaceInfo": 1,
    "cSubType": 3,
    "dwMSGItemType": 6,
    "dwTabID": 107538,
    "imageHeight": 200,
    "imageWidth": 200,
    "index": 0,
    "mediaType": 0,
    "wSize": 37,
    "faceName": "吃饺子",
    "mobileparam": {
        "classdesc": {
            "name": "[B",
            "serialVersionUID": -5984413125824720000,
            "flags": 2,
            "fields_names": [],
            "fields_types": [],
            "superclass": null
        },
        "annotations": [],
        "_data": []
    },
    "resvAttr": {
        "classdesc": {
            "name": "[B",
            "serialVersionUID": -5984413125824720000,
            "flags": 2,
            "fields_names": [],
            "fields_types": [],
            "superclass": null
        },
        "annotations": [],
        "_data": [10, 6, 8, -84, 2, 16, -84, 2, 10, 6, 8, -56, 1, 16, -56, 1, 64, 1]
    },
    "sbfKey": {
        "classdesc": {
            "name": "[B",
            "serialVersionUID": -5984413125824720000,
            "flags": 2,
            "fields_names": [],
            "fields_types": [],
            "superclass": null
        },
        "annotations": [],
        "_data": [51, 101, 98, 98, 50, 56, 101, 57, 100, 55, 101, 55, 49, 100, 57, 100]
    },
    "sbufID": {
        "classdesc": {
            "name": "[B",
            "serialVersionUID": -5984413125824720000,
            "flags": 2,
            "fields_names": [],
            "fields_types": [],
            "superclass": null
        },
        "annotations": [],
        "_data": [101, -72, 46, -83, -1, 8, -111, -111, -8, -48, -86, -27, -70, 85, -66, 104]
    }
}
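
The output above wraps every byte array in classdesc metadata. A minimal sketch of flattening such an object into plain JSON and writing it to a file follows. It assumes, based only on the JSON shown above, that array objects expose their bytes in a `_data` attribute and that `classdesc` and `annotations` are the only metadata keys; `to_plain` and `dump_to_file` are hypothetical helper names, not part of javaobj.

```python
import json

# Keys that carry serialization metadata rather than field values (assumption).
METADATA_KEYS = {"classdesc", "annotations"}


def to_plain(value):
    """Recursively convert a deserialized object into JSON-friendly types."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, (bytes, bytearray)):
        return list(value)  # unsigned values 0..255
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    if hasattr(value, "_data"):
        # Array wrapper: keep only the element list.
        return [to_plain(v) for v in value._data]
    if hasattr(value, "__dict__"):
        return {
            key: to_plain(val)
            for key, val in vars(value).items()
            if key not in METADATA_KEYS
        }
    return str(value)


def dump_to_file(obj, path):
    """Write the flattened object to path as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(to_plain(obj), f, ensure_ascii=False, indent=4)
```

With the object `j` read in the snippet earlier in this comment, `dump_to_file(j, 'emoji.json')` would be the intended usage; `ensure_ascii=False` keeps names such as faceName readable in the file.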
lqzhgood commented 2 years ago

Decoded it! But I don't know how to map the result to the corresponding file.

{
    "index": 0,
    "faceName": "吃饺子",
    "dwMSGItemType": 6,
    "cFaceInfo": 1,
    "wSize": 37,
    "sbufID": [101, -72, 46, -83, -1, 8, -111, -111, -8, -48, -86, -27, -70, 85, -66, 104],
    "dwTabID": 107538,
    "cSubType": 3,
    "hasIpProduct": false,
    "showIpProduct": false,
    "sbfKey": [51, 101, 98, 98, 50, 56, 101, 57, 100, 55, 101, 55, 49, 100, 57, 100],
    "mediaType": 0,
    "imageWidth": 200,
    "imageHeight": 200,
    "mobileparam": [],
    "resvAttr": [10, 6, 8, -84, 2, 16, -84, 2, 10, 6, 8, -56, 1, 16, -56, 1, 64, 1],
    "isReword": false,
    "copywritingType": 0,
    "copywritingContent": "null",
    "jumpUrl": "null",
    "shouldDisplay": false,
    "stickerInfo": null
}
ZhangJun2017 commented 2 years ago

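Thanks for the ideas. After some investigation, I found that converting sbufID to a hex string is all that's needed to locate the corresponding file. The local path is: internal storage/Android/data/com.tencent.mobileqq/Tencent/MobileQQ/.emotionsm/[dwTabID]/[id]_aio.png, where [id] is the converted hex string. The conversion is as follows: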

static String toHexString(byte[] arr) {
    StringBuilder hex = new StringBuilder(arr.length * 2);
    for (byte b : arr) {
        // Mask to 0..255 so negative bytes do not sign-extend, then zero-pad to two digits.
        hex.append(String.format("%02x", b & 0xff));
    }
    return hex.toString();
}

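The resource can also be fetched directly from a URL, which should be more convenient than extracting it locally. Here [idPrefix] is the first two characters of [id]: https://i.gtimg.cn/club/item/parcel/item/[idPrefix]/[id]/[imageWidth]x[imageHeight].png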
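
As an illustration of the mapping described in this comment, here is a small Python sketch that turns the signed sbufID bytes into the hex id and builds both the local path and the URL. The function names are hypothetical; the path and URL templates are taken from the comment, and the relative path omits the storage root.

```python
def sbuf_id_to_hex(sbuf_id):
    """Convert signed Java bytes (-128..127) to a lowercase hex string."""
    return bytes(b & 0xFF for b in sbuf_id).hex()


def sticker_local_path(dw_tab_id, sbuf_id):
    """Path of the static sticker image, relative to internal storage."""
    return (
        "Android/data/com.tencent.mobileqq/Tencent/MobileQQ/.emotionsm/"
        f"{dw_tab_id}/{sbuf_id_to_hex(sbuf_id)}_aio.png"
    )


def sticker_url(sbuf_id, width, height):
    """URL of the static sticker image; idPrefix is the first two hex characters."""
    hex_id = sbuf_id_to_hex(sbuf_id)
    return (
        "https://i.gtimg.cn/club/item/parcel/item/"
        f"{hex_id[:2]}/{hex_id}/{width}x{height}.png"
    )
```

For the sbufID, imageWidth, and imageHeight values decoded earlier in the thread, `sticker_url` gives a path segment starting with `65/65b82ead...` and ending in `200x200.png`.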

ZhangJun2017 commented 2 years ago

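However, this method does not support animated stickers. Animated stickers seem to require some kind of decryption using sbfKey; the file with the same name but without the .png suffix in the same directory is the encrypted file.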

lqzhgood commented 2 years ago

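From https://luotianyi.vc/391.html I learned the following:

The GIFs stored in the folder are lightly encrypted. Using a hex editor, change the bytes at offset 00000000 from 47 48 46 39 39 60 xx xx to the standard GIF89a signature 47 49 46 38 39 61 xx xx, and the file is recognized correctly.

To solve the problem that "stickers over 100k cannot be extracted this way": treat every four hex digits as one group, add 1 to even values, and subtract 1 from odd values. For example, 4748 is even, so +1 gives 4749; 322F is odd, so -1 gives 322E. Every group in the file header has to be changed this way, up to the point where the data is no longer encrypted. For example, "4E44 5452 4340 5044 322F 3002 0101 0001 21FE 0B59 4D51 2045 6175 6159" becomes "4E45 5453 4341 5045 322E 3003 0100 0000 21FF 0B58 4D50 2044 6174 6158" after decryption. This fixes the case where changing only the file signature makes the conversion fail.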
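
The even/odd rule quoted above amounts to flipping the lowest bit of the second byte in every 2-byte group, since adding 1 to an even value or subtracting 1 from an odd value never carries or borrows. A minimal Python sketch under that reading follows; `decode_header` is a hypothetical name, and how many leading bytes are actually encrypted is not stated in the thread, so the length is left as a parameter.

```python
def decode_header(data: bytes, length: int) -> bytes:
    """Flip the low bit of every second byte within the first `length` bytes.

    Equivalent to: read each 2-byte group as a big-endian number,
    add 1 if it is even and subtract 1 if it is odd.
    """
    out = bytearray(data)
    for i in range(1, min(length, len(out)), 2):
        out[i] ^= 0x01
    return bytes(out)
```

Applying the function twice restores the original bytes, so the same routine also reproduces the encrypted form from a plain header.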

It would be great to get an original file and its encrypted counterpart for comparison.

lqzhgood commented 2 years ago

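I found two sample pairs; the encrypted file and the original file differ in size. The header can be decrypted by taking 2 bytes per group (+1 if even, -1 if odd), which reveals the file type (GIF, PNG) and the image information (width and height). How to decrypt the body is still unknown. Samples and JSON are attached.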

{
  "imageWidth": 200,
  "sbufID": [
    41, -64, -90, 30, 33, -87, -83, -93, -47, 8, 16, -71, -18, -4, -104, -102
  ],
  "copywritingType": 0,
  "index": 0,
  "cFaceInfo": 1,
  "showIpProduct": false,
  "mediaType": 0,
  "wSize": 37,
  "imageHeight": 200,
  "faceName": "哼",
  "dwTabID": 195484,
  "hasIpProduct": false,
  "resvAttr": [
    10, 6, 8, -84, 2, 16, -84, 2, 10, 6, 8, -56, 1, 16, -56, 1, 64, 1
  ],
  "mobileparam": [],
  "sbfKey": [49, 97, 54, 57, 55, 54, 99, 98, 56, 48, 50, 56, 99, 99, 102, 101],
  "cSubType": 3,
  "dwMSGItemType": 6,
  "isReword": false,
  "shouldDisplay": false
}

To keep the images from being recompressed, I put them in a zip file. 15094.zip

lqzhgood commented 2 years ago

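In some directories you can find the file /Android/data/com.tencent.mobileqq/Tencent/MobileQQ/.emotionsm/[dwTabID]/[dwTabID].jtmp. It is JSON and contains the sticker pack name (Package Name). For packs without a jtmp file, however, I don't know how to get the pack name. Opening the sticker in Android QQ shows the pack name, so it must be stored somewhere.

qqfav_[QQ number].db appears to be deprecated; its modification date is still from last year.

Only after a sticker is tapped in mobile QQ is the pack's information added to the EmoticonPackage table in the database, and the information for each sticker in the pack added to the Emoticon table.

A JSON file is also generated at .emotionsm/[dwTabID]/[dwTabID].jtmp (my guess is that this is a temporary download file that was never deleted).

.emotionsm exists in two places: /Android/data/com.tencent.mobileqq/Tencent/MobileQQ/.emotionsm/ and /sdcard/Tencent/MobileQQ/.emotionsm

Packet capture shows that the sticker pack information can be fetched as follows. Because QQ uses unified authentication, the cookie can be obtained after logging in to any QQ website, such as https://id.qq.com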

import axios from 'axios';

// Template literal (backticks) so that dwTabID is actually interpolated into the URL.
const response = await axios.get(`https://zb.vip.qq.com/hybrid/emoticonmall/detail?id=${dwTabID}`, {
    headers: {
        'user-agent':
            'Mozilla/5.0 (Linux; Android 7.1.2; ONEPLUS A3010 Build/N2G47H; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/89.0.4389.72 MQQBrowser/6.2 TBS/046011 Mobile Safari/537.36 V1_AND_SQ_8.8.88_2770_YYB_D A_8088800 QQ/8.8.88.7830 NetType/WIFI WebP/0.3.0 Pixel/1080 StatusBarHeight/73 SimpleUISwitch/0 QQTheme/1000 InMagicWin/0 StudyMode/0 CurrentMode/0 CurrentFontScale/1.0 GlobalDensityScale/1.0285715 AppId/537117916',
        cookie: cookie,
    },
});

With the sticker's PackageName and description in hand, statistics and analysis of how often each sticker is sent become possible later.
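
For anyone scripting this outside Node, the same request can be sketched with Python's standard library. The URL and header names come from the snippet above; `build_pack_info_request` is a hypothetical helper, and the response format is not documented in this thread, so the sketch only builds the request and leaves sending and parsing to the caller.

```python
import urllib.request

DETAIL_URL = "https://zb.vip.qq.com/hybrid/emoticonmall/detail?id={}"


def build_pack_info_request(dw_tab_id, cookie, user_agent):
    """Build (but do not send) the GET request for one sticker pack."""
    return urllib.request.Request(
        DETAIL_URL.format(dw_tab_id),
        headers={"User-Agent": user_agent, "Cookie": cookie},
    )


# Sending it would look like this (requires a valid logged-in cookie):
# with urllib.request.urlopen(build_pack_info_request(107538, cookie, ua)) as resp:
#     body = resp.read()
```

Keeping request construction separate from sending makes it easy to check the URL and headers before any network traffic happens.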

ZhangJun2017 commented 2 years ago

Thanks, decoding and export are now supported: https://github.com/ZhangJun2017/QQChatHistoryExporter/commit/bfcf855be67e9f3320de5c5f9e5b2ace3cc7b845