Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Files with Chinese names in a directory cause the S3 client to report a protocol violation #18662

Open · woodlyer opened 4 months ago

woodlyer commented 4 months ago

Alluxio Version: alluxio-2.9.5

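Describe the bug: When a directory contains a file with a Chinese name, the S3 client cannot connect. The cause is that the Content-Length the server returns is wrong: it is a few bytes too small, so the response body is truncated and the client fails to parse the XML.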

The output looks like this; the closing ListBucketResult tag at the end is incomplete:

</ListBucketResult>
<ListBucketResult><version2>false</version2><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated>
<CommonPrefixes><Prefix>ccdata.tdf/</Prefix></CommonPrefixes>
<CommonPrefixes><Prefix>cn/</Prefix></CommonPrefixes>
<CommonPrefixes><Prefix>dir/</Prefix></CommonPrefixes><CommonPrefixes>
<Prefix>test.tdf/</Prefix></CommonPrefixes><Marker></Marker>
<Contents><LastModified>2024-07-25T10:43:29.657Z</LastModified><Key>123.txt</Key><Size>15</Size></Contents>
<Contents><LastModified>2024-07-18T17:13:50.054Z</LastModified><Key>123.txt.tdf</Key><Size>1793</Size></Contents>
<Contents><LastModified>2024-07-18T17:13:50.222Z</LastModified><Key>1m.txt.tdf</Key><Size>1050270</Size></Contents>
<Contents><LastModified>2024-07-25T10:46:55.835Z</LastModified><Key>副本.txt</Key><Size>15</Size></Contents>
<Delimiter>/</Delimiter><Prefix></Prefix><Name>mnt</Name></ListBucketRes  <---------see this

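One of the files is named "副本.txt" (UTF-8 encoded). The XML in the response is incomplete at the end: it is missing 4 bytes, because the Content-Length in the HTTP header is 4 bytes smaller than the actual body. Why 4 bytes? My guess is that the content length is computed as a number of characters: "副本" counts as 2, but its UTF-8 encoding is 6 bytes long (two 3-byte characters), so the length comes out 4 short. The binary dump shows this: [screenshot]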
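
The reporter's hypothesis can be checked with a few lines of Python. This is a minimal sketch, not Alluxio code: the trimmed-down `body` and the helper names `char_count_length` and `utf8_byte_length` are hypothetical. It shows that counting characters instead of UTF-8 bytes under-reports the length by exactly 4 for "副本", and that a client reading only that many bytes receives a document ending in `</ListBucketRes`, which fails to parse.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down listing: only the key "副本.txt" is non-ASCII.
body = (
    "<ListBucketResult>"
    "<Contents><Key>123.txt</Key><Size>15</Size></Contents>"
    "<Contents><Key>副本.txt</Key><Size>15</Size></Contents>"
    "<Name>mnt</Name></ListBucketResult>"
)


def char_count_length(text: str) -> int:
    """Suspected buggy length: number of characters (code points)."""
    return len(text)


def utf8_byte_length(text: str) -> int:
    """What Content-Length must be: number of bytes after UTF-8 encoding."""
    return len(text.encode("utf-8"))


# "副本" is 2 characters but 6 bytes in UTF-8 (3 bytes per character here).
print(char_count_length("副本"), utf8_byte_length("副本"))  # 2 6

wire = body.encode("utf-8")          # bytes actually sent on the wire
declared = char_count_length(body)   # header value under the suspected bug
print(utf8_byte_length(body) - declared)  # 4

# A client that honours Content-Length keeps only the first `declared` bytes.
received = wire[:declared].decode("utf-8")
print(received.endswith("</ListBucketRes"))  # True

try:
    ET.fromstring(received)
except ET.ParseError as err:
    print("parse failed:", err)

# With the byte length, the whole document arrives and parses.
root = ET.fromstring(wire[:utf8_byte_length(body)])
print(root.find("Name").text)  # mnt
```

In Java, which Alluxio is written in, the analogous pitfall is `String.length()` (UTF-16 code units) versus `s.getBytes(StandardCharsets.UTF_8).length`; this sketch does not establish which call the Alluxio 2.9.5 S3 proxy actually uses.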

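To Reproduce: Put a file with a Chinese name in a directory and access it over the S3 protocol; the client fails immediately. With S3 Browser 11.7.5 the error is reported as a protocol violation, but the real problem is that the end of the returned body is missing, so parsing fails. [screenshot]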
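
To reproduce the client-side symptom without Alluxio or S3 Browser, the Python sketch below starts a throwaway local HTTP server whose hypothetical `BuggyHandler` sets `Content-Length` from the character count (the behaviour the reporter suspects) and fetches the listing with `http.client`. It is a simulation under that assumption, not the Alluxio S3 proxy; the `/mnt/` path and the body are illustrative. Because the client reads exactly `Content-Length` bytes, the payload ends in `</ListBucketRes` and parsing fails, matching the output above.

```python
import http.client
import threading
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical listing body containing a Chinese file name.
BODY = "<ListBucketResult><Contents><Key>副本.txt</Key></Contents></ListBucketResult>"


class BuggyHandler(BaseHTTPRequestHandler):
    """Mimics the suspected bug: Content-Length counts characters, not bytes."""

    def do_GET(self):
        data = BODY.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(BODY)))  # bug: character count
        self.end_headers()
        self.wfile.write(data)  # yet the full UTF-8 payload is written

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_listing():
    """Serve one request on an ephemeral port and return (declared, payload)."""
    server = HTTPServer(("127.0.0.1", 0), BuggyHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request("GET", "/mnt/")
        resp = conn.getresponse()
        declared = int(resp.getheader("Content-Length"))
        payload = resp.read()  # reads exactly `declared` bytes
        conn.close()
        return declared, payload
    finally:
        server.shutdown()
        server.server_close()


declared, payload = fetch_listing()
print(declared, len(payload))
print(payload.decode("utf-8"))
try:
    ET.fromstring(payload)
    print("parsed OK")
except ET.ParseError as err:
    print("client-side parse error:", err)
```

Sending `str(len(data))`, the length of the encoded bytes, as the header instead lets the same client receive and parse the complete document.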

Are you planning to fix it? No.