alibaba / nginx-tfs

An Asynchronous Nginx module providing a RESTful API for TFS (Taobao File System).
http://tfs.taobao.org/

nginx-tfs cannot write files larger than 2MB to TFS #17

Closed duncking closed 11 years ago

duncking commented 11 years ago

Problem description: uploading a file larger than 2MB to TFS through nginx-tfs fails and nginx-tfs returns 500, while the TFS client can write files larger than 2MB without problems. The dataserver log shows that when the dataserver parses the packet sent by nginx-tfs, it reads a negative packet length. I suspect a bug in nginx-tfs.

The relevant nginx-tfs configuration is as follows:

# TFS

tfs_send_timeout 3s;
tfs_read_timeout 3s;
tfs_connect_timeout 3s;
tfs_body_buffer_size 10M;
tfs_block_cache_zone size=256M;
tfs_keepalive max_cached=128 bucket_count=16;

The nginx debug log is as follows (hostnames masked as tfs.***.com):

2013/10/09 16:15:20 [info] 19460#0: *8 meta segment: block_id: 0, fileid: 0, seqid: 0, suffix: 0, client: 127.0.0.1, server: tfs.***.com, request: "POST /v1/tfs HTTP/1.1", host: "tfs.***.com"
2013/10/09 16:15:20 [info] 19460#0: *8 get block info from ns while connecting server, client: 127.0.0.1, server: tfs.***.com, request: "POST /v1/tfs HTTP/1.1", host: "tfs.***.com"
2013/10/09 16:15:20 [debug] 19460#0: *8 connecting name server, addr: 10.33.56.125:8108
2013/10/09 16:15:20 [info] 19460#0: *8 http tfs finalize state name server, 0 while reading response header from tfs, client: 127.0.0.1, server: tfs.***.com, request: "POST /v1/tfs HTTP/1.1", host: "tfs.***.com"
2013/10/09 16:15:20 [info] 19460#0: *8 http tfs process next peer is data server, addr: 10.33.56.198:8010 while reading response header from tfs, client: 127.0.0.1, server: tfs.***.com, request: "POST /v1/tfs HTTP/1.1", host: "tfs.***.com"
2013/10/09 16:15:20 [debug] 19460#0: *8 connecting data server, addr: 10.33.56.198:8010
2013/10/09 16:15:20 [info] 19460#0: *8 http tfs finalize state data server, 0 while reading response header from tfs, client: 127.0.0.1, server: tfs.***.com, request: "POST /v1/tfs HTTP/1.1", host: "tfs.***.com"
2013/10/09 16:15:20 [info] 19460#0: *8 http tfs process next peer is data server, addr: 10.33.56.198:8010 while reading response header from tfs, client: 127.0.0.1, server: tfs.***.com, request: "POST /v1/tfs HTTP/1.1", host: "tfs.***.com"
2013/10/09 16:15:20 [info] 19460#0: *8 write segment index 0, block id: 2404301, file id: 2354, offset: 0, length: 2097152, crc: 2021805896 while connecting server, client: 127.0.0.1, server: tfs.***.com, request: "POST /v1/tfs HTTP/1.1", host: "tfs.***.com"
2013/10/09 16:15:20 [debug] 19460#0: *8 connecting data server, addr: 10.33.56.198:8010
2013/10/09 16:15:20 [error] 19460#0: *8 readv() failed (104: Connection reset by peer) while reading response header from tfs, client: 127.0.0.1, server: tfs.***.com, request: "POST /v1/tfs HTTP/1.1", host: "tfs.***.com"
2013/10/09 16:15:20 [error] 19460#0: *8 recv chain error while reading response header from tfs, client: 127.0.0.1, server: tfs.***.com, request: "POST /v1/tfs HTTP/1.1", host: "tfs.***.com"

TFS version 2.2.13 is used. The dataserver error log is:

[2013-10-09 18:07:22] ERROR getPacketInfo (base_packet_streamer.cpp:86) [139896236799744] stream error: 69fc7285<>4d534654,4e534654, dataLen: 432408354
[2013-10-09 18:07:22] ERROR reply (base_packet.cpp:221) [139896073172736] post packet failure, server: 10.33.56.23:55566, pcode:1
[2013-10-09 18:07:22] ERROR callback (dataservice.cpp:854) [139896073172736] write data fail. filenumber: 14152088283100828036, blockid: 2399407, fileid: 2915, version: 2914, leaseid: 1198248338, role: master

[2013-10-09 17:28:43] ERROR getPacketInfo (base_packet_streamer.cpp:86) [139896236799744] stream error: ebe940fc<>4d534654,4e534654, dataLen: -1052153523
[2013-10-09 17:28:43] ERROR reply (base_packet.cpp:221) [139896167581440] post packet failure, server: 10.81.102.121:54715, pcode:1
[2013-10-09 17:28:43] ERROR callback (dataservice.cpp:854) [139896167581440] write data fail. filenumber: 14152088283100823092, blockid: 2407850, fileid: 1621, version: 1620, leaseid: 1197348515, role: master

duncking commented 11 years ago

I did some testing: adding the large_file=1 parameter to the upload makes it possible to write files larger than 2MB, but then the file is stored in TFS as a large file. My understanding is that TFS large files are only meant for files larger than main_block_size; a 2MB - 3MB file should not have to use large_file.

I suspect this is a bug in nginx-tfs?

I also tested reading files larger than 2MB, which works fine.

duncking commented 11 years ago

I also tested the tfs code in tengine; it still fails.

duncking commented 11 years ago

Debugging shows the problem is this: nginx-tfs writes at most 2MB of data to TFS per write, so a file larger than 2MB has to be split into segments and written to TFS in multiple writes. However, when writing to TFS multiple times, create file is called on every write, i.e. the segments of one file are written as if they were separate files, and the packets sent from the second write onward are also malformed.

So there are two problems:

  1. A file larger than 2MB is split into multiple segments, and the TFS create-file-and-write-data procedure is invoked separately for each segment.
  2. Starting from the second segment, the write packets are malformed: header information such as the flag and length is missing.

The problem can be worked around with large_file, but it is also wrong for a 3-4MB file to be stored as a TFS large file.
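
For clarity, below is a minimal, self-contained sketch of the segmented write flow described above. It is not the nginx-tfs code or the real TFS wire protocol; the message names and stub functions are illustrative stand-ins, and the block/file ids are just the values seen in the debug log. The point it demonstrates is the intended sequence: create the file once, write each 2MB segment to the same block/file at a growing offset, then close once (the reported bug amounts to repeating the create step per segment and dropping the packet header from the second write on).

    #include <inttypes.h>
    #include <stdio.h>

    #define TFS_SEGMENT_SIZE (2u * 1024 * 1024)   /* per-write limit seen above */

    /* Stub stand-ins that only print what would be sent; the message names
     * are illustrative, not the actual TFS protocol messages. */
    static void create_file(uint32_t block_id, uint64_t file_id) {
        printf("CREATE  block=%" PRIu32 " file=%" PRIu64 "\n", block_id, file_id);
    }

    static void write_segment(uint32_t block_id, uint64_t file_id,
                              uint64_t offset, uint32_t length) {
        printf("WRITE   block=%" PRIu32 " file=%" PRIu64
               " offset=%" PRIu64 " length=%" PRIu32 "\n",
               block_id, file_id, offset, length);
    }

    static void close_file(uint32_t block_id, uint64_t file_id) {
        printf("CLOSE   block=%" PRIu32 " file=%" PRIu64 "\n", block_id, file_id);
    }

    int main(void) {
        uint64_t total = 5u * 1024 * 1024;   /* e.g. a 5MB upload */
        uint32_t block_id = 2404301;         /* values borrowed from the debug log */
        uint64_t file_id = 2354;

        create_file(block_id, file_id);      /* exactly once, not once per segment */

        for (uint64_t off = 0; off < total; off += TFS_SEGMENT_SIZE) {
            uint64_t left = total - off;
            uint32_t len = (uint32_t)(left < TFS_SEGMENT_SIZE ? left : TFS_SEGMENT_SIZE);
            /* every segment targets the same block/file at a growing offset */
            write_segment(block_id, file_id, off, len);
        }

        close_file(block_id, file_id);       /* exactly once */
        return 0;
    }

Running this prints one CREATE, three WRITEs (2MB, 2MB, 1MB) and one CLOSE for a 5MB body, which is the behavior the reporter expected instead of one create per segment.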

zhcn381 commented 11 years ago

This is a known bug and it has already been fixed. Please download the latest code from https://github.com/alibaba/tengine.