dianping / cat

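As a foundational component for server-side projects, CAT provides clients for Java, C/C++, Node.js, Python, Go, and other languages. It is deeply integrated into Meituan-Dianping's infrastructure middleware (MVC framework, RPC framework, database framework, cache framework, message queue, configuration system, etc.) and provides Meituan-Dianping's business lines with rich performance metrics, health status, real-time alerting, and more.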
Apache License 2.0
18.63k stars 5.42k forks

Log View is empty after deploying the latest master (corresponding to v4.0-RC1) #2309

Open lxil520 opened 1 year ago

lxil520 commented 1 year ago

Opening Log View shows the following message: Sorry, the message is not there. It could be missing or archived.

lxil520 commented 1 year ago

The server configuration is as follows:


<?xml version="1.0" encoding="utf-8"?>
<server-config>
   <server id="default">
      <properties>
         <property name="local-mode" value="false"/>
         <property name="job-machine" value="false"/>
         <property name="send-machine" value="false"/>
         <property name="alarm-machine" value="false"/>
         <property name="hdfs-machine" value="false"/>
         <property name="remote-servers" value="192.168.1.71:2281,192.168.1.72:2281,192.168.1.73:2281"/>
      </properties>
      <storage local-base-dir="/data/appdatas/cat/bucket/" max-hdfs-storage-time="15" local-report-storage-time="7" local-logivew-storage-time="7" har-mode="true" upload-thread="5">
         <hdfs id="logview" max-size="128M" server-uri="hdfs://192.168.1.71/" base-dir="user/cat/logview"/>
         <hdfs id="dump" max-size="128M" server-uri="hdfs://192.168.1.71/" base-dir="user/cat/dump"/>
         <hdfs id="remote" max-size="128M" server-uri="hdfs://192.168.1.71/" base-dir="user/cat/remote"/>
      </storage>
      <consumer>
         <long-config default-url-threshold="1000" default-sql-threshold="100" default-service-threshold="50">
            <domain name="cat" url-threshold="500" sql-threshold="500"/>
            <domain name="OpenPlatformWeb" url-threshold="100" sql-threshold="500"/>
         </long-config>
      </consumer>
   </server>
   <server id="192.168.1.71">
      <properties>
         <property name="job-machine" value="true"/>
         <property name="alarm-machine" value="true"/>
         <property name="send-machine" value="true"/>
      </properties>
   </server>
</server-config>
lxil520 commented 1 year ago

Debugging locally with breakpoints points to a decoder problem: the version value that is read out is garbled. (screenshot)

yangxk commented 1 year ago

Same issue, fixed?

lxil520 commented 1 year ago

Same issue, fixed?

No, I haven't solved it yet.

RainkingTriu commented 1 year ago

property name="remote-servers" value="192.168.1.71:2281,192.168.1.72:2281,192.168.1.73:2281" Shouldn't the port here be CAT's web port, 8080?
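
One way to double-check which ports a remote-servers value actually contains is a small parser like the sketch below. The class and method names are hypothetical, and 8080 as the expected port is only the assumption raised in this question, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class RemoteServersCheck {
    // Returns the entries of a comma-separated "host:port" list whose port
    // is missing, not a number, or different from expectedPort.
    static List<String> entriesWithOtherPort(String remoteServers, int expectedPort) {
        List<String> result = new ArrayList<>();
        for (String entry : remoteServers.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int idx = trimmed.lastIndexOf(':');
            if (idx < 0) {
                result.add(trimmed); // no port given at all
                continue;
            }
            try {
                int port = Integer.parseInt(trimmed.substring(idx + 1));
                if (port != expectedPort) {
                    result.add(trimmed);
                }
            } catch (NumberFormatException e) {
                result.add(trimmed); // port is not a number
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Value copied from the configuration posted in this issue.
        String value = "192.168.1.71:2281,192.168.1.72:2281,192.168.1.73:2281";
        // 8080 is the port suggested in the question, used here as an assumption.
        System.out.println(entriesWithOtherPort(value, 8080));
    }
}
```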

smallleaf commented 1 year ago

Is this on ARM architecture? On ARM there are a few places that need to be modified.
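
If it is unclear which architecture the server JVM is running on, a minimal sketch like the following can report it from the standard os.arch system property. The class name and the list of prefixes treated as ARM are assumptions for illustration and are not exhaustive.

```java
import java.util.Locale;

public class ArchCheck {
    // Returns true when the os.arch string looks like an ARM platform.
    // The prefixes checked here are an illustrative assumption.
    static boolean isArm(String osArch) {
        if (osArch == null) {
            return false;
        }
        String arch = osArch.toLowerCase(Locale.ROOT);
        return arch.startsWith("aarch64") || arch.startsWith("arm");
    }

    public static void main(String[] args) {
        // os.arch is a standard JVM system property.
        String arch = System.getProperty("os.arch");
        System.out.println("os.arch=" + arch + ", arm=" + isArm(arch));
    }
}
```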

lxil520 commented 1 year ago

Is this on ARM architecture? On ARM there are a few places that need to be modified.

Yes. What needs to be changed?

lxil520 commented 1 year ago

property name="remote-servers" value="192.168.1.71:2281,192.168.1.72:2281,192.168.1.73:2281" Shouldn't the port here be CAT's web port, 8080?

Yes, this part is not a problem.

smallleaf commented 1 year ago

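Is this on ARM architecture? On ARM there are a few places that need to be modified.

Yes. What needs to be changed?

1. Upgrade the snappy package:

<dependency>
   <groupId>org.xerial.snappy</groupId>
   <artifactId>snappy-java</artifactId>
   <version>1.1.10.3</version>
</dependency>

If this package is not upgraded, you will find that the dump files are never produced, because reading the files does not work on ARM. Execution hangs at org.unidal.cat.message.storage.internals.DefaultBlock#createOutputSteam, so the logview cannot be stored.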

2. Change com.dianping.cat.report.page.logview.service.LocalMessageService#buildNewReport to the following:

private String buildNewReport(ModelRequest request, ModelPeriod period, String domain, ApiPayload payload)
      throws Exception {
    String messageId = payload.getMessageId();
    boolean waterfall = payload.isWaterfall();
    MessageId id = MessageId.parse(messageId);
    ByteBuf buf = m_finderManager.find(id);
    MessageTree tree = null;

    if (buf != null) {
        tree = CodecHandler.decode(changeBuf(buf));
    }

    if (tree == null) {
        Bucket bucket = m_bucketManager.getBucket(id.getDomain(),
              NetworkInterfaceManager.INSTANCE.getLocalHostAddress(), id.getHour(), false);

        if (bucket != null) {
            bucket.flush();

            ByteBuf data = bucket.get(id);

            if (data != null) {
                tree = CodecHandler.decode(changeBuf(data));
            }
        }
    }

    if (tree != null) {
        ByteBuf content = ByteBufAllocator.DEFAULT.buffer(8192);

        if (tree.getMessage() instanceof Transaction && waterfall) {
            m_waterfall.encode(tree, content);
        } else {
            m_html.encode(tree, content);
        }

        try {
            content.readInt(); // get rid of length
            return content.toString(Charset.forName("utf-8"));
        } catch (Exception e) {
            // ignore it
        }
    }

    return null;
}

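// Copies the length-prefixed frame and moves the reader index past the 4-byte length.
private ByteBuf changeBuf(ByteBuf data) {
    data.markReaderIndex();
    int length = data.readInt(); // 4-byte length prefix
    data.resetReaderIndex();
    ByteBuf readBytes = data.readBytes(length + 4); // prefix plus payload

    readBytes.markReaderIndex();
    readBytes.readInt(); // skip the prefix so decoding starts at the payload
    return readBytes;
}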

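The key part is changeBuf. The first 4 bytes (positions 0-4) are already occupied, but the latest master branch does not handle this and starts reading from byte 0, which causes decoding to fail.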
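
The effect described above can be reproduced outside CAT with a minimal sketch. It uses java.nio.ByteBuffer from the JDK instead of Netty's ByteBuf, and it assumes a frame made of a 4-byte length followed by the payload, as the readInt call in changeBuf suggests. The class name, method names, and the payload string are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LengthPrefixDemo {
    // Builds a frame: 4-byte big-endian length followed by the payload bytes.
    static ByteBuffer frame(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + body.length);
        buf.putInt(body.length);
        buf.put(body);
        buf.flip();
        return buf;
    }

    // Reads the length prefix first, then decodes exactly that many payload bytes.
    static String decodeSkippingPrefix(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate(); // leave the caller's position untouched
        int length = view.getInt();
        byte[] body = new byte[length];
        view.get(body);
        return new String(body, StandardCharsets.UTF_8);
    }

    // Decodes from byte 0, so the length prefix is misread as payload.
    static String decodeFromZero(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        byte[] all = new byte[view.remaining()];
        view.get(all);
        return new String(all, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = frame("v1-payload"); // hypothetical payload
        System.out.println("skipping prefix: " + decodeSkippingPrefix(buf));
        System.out.println("from byte 0 starts with payload: "
                + decodeFromZero(buf).startsWith("v1"));
    }
}
```

Decoding from byte 0 puts the four length bytes in front of the real content, which is consistent with the garbled version value reported earlier in this thread.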

vectorstone commented 8 months ago

This is not ARM: the server is deployed on x64 Linux, and it also shows "Sorry, the message is not there. It could be missing or archived". The server configuration file is as follows:

<?xml version="1.0" encoding="utf-8"?>
<server-config>
   <server id="default">
      <properties>
         <property name="local-mode" value="true"/>
         <property name="job-machine" value="true"/>
         <property name="send-machine" value="true"/>
         <property name="alarm-machine" value="true"/>
         <property name="hdfs-enable" value="false"/>
         <property name="remote-servers" value="127.0.0.1:8080"/>
      </properties>
      <storage local-base-dir="/data/appdatas/cat/bucket/" max-hdfs-storage-time="15" local-report-storage-time="7" local-logivew-storage-time="7" har-mode="true" upload-thread="5">
         <hdfs id="logview" max-size="128M" server-uri="hdfs://127.0.0.1/" base-dir="user/cat/logview"/>
         <hdfs id="dump" max-size="128M" server-uri="hdfs://127.0.0.1/" base-dir="user/cat/dump"/>
         <hdfs id="remote" max-size="128M" server-uri="hdfs://127.0.0.1/" base-dir="user/cat/remote"/>
      </storage>
      <consumer>
         <long-config default-url-threshold="1000" default-sql-threshold="100" default-service-threshold="50">
            <domain name="cat" url-threshold="500" sql-threshold="500"/>
            <domain name="OpenPlatformWeb" url-threshold="100" sql-threshold="500"/>
         </long-config>
      </consumer>
   </server>
   <server id="127.0.0.1">
      <properties>
         <property name="job-machine" value="true"/>
         <property name="alarm-machine" value="true"/>
         <property name="send-machine" value="true"/>
      </properties>
   </server>
</server-config>
frogwithumbrella commented 3 months ago

Has this been solved? I have run into the same problem.