Open petersilva opened 5 years ago
People should read about the existing header here:
https://github.com/MetPX/sarracenia/blob/master/doc/sr_postv3.7.rst#the-fixed-fields
The parts field supports replication of partitioned files, where either the whole original file is available at the source as one file, or only a part of it is available (on an intermediate server). This is used to transfer very large files while preventing the capybara-in-the-anaconda effect, which creates large lumps on the intervening data pumps. The block size is programmable using the same field.
v02 format: parts="<method>,<bsz>,<blktot>,<brem>,<bno>"
bsz = the size of the block being advertised.
blktot = total number of blocks in the file.
brem = the remainder, or size of the last block.
bno = the block number of the current block.
method=1|p|i
* 1 - a complete file announced as one piece. In this case, the bsz gives the size of the entire file.
blktot=1,brem=0,bno=1
* i - inplace strategy: the announcement describes a portion of a file. On the host advertising the
file, the entire file is available for download, but the announcement only deals with a subset
of the file. For example, if the file is 5000 bytes long and the block size is 1024, then five
portions of the file would be advertised:
i,1024,5,904,0
i,1024,5,904,1
i,1024,5,904,2
i,1024,5,904,3
i,1024,5,904,4
* p - partitioned strategy: instead of one large file to retrieve, there will
be five partitions on the server, named <filename>,Part,1024,5,904,0 ...
In either case, the size of the file is bsz * (blktot-1) + brem = 1024 * 4 + 904 = 5000.
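To make the field layout concrete, here is a minimal Python sketch that parses a v02 parts string and applies the size formula given above. The names `Parts`, `parse_parts`, `file_size` and `block_range` are made up for illustration and are not taken from the Sarracenia code base. It assumes that block numbers count from 0, as in the inplace example, and it does not try to handle a file whose length is an exact multiple of the block size, since that case is not described here.

```python
from typing import NamedTuple, Tuple


class Parts(NamedTuple):
    """Fields of a v02 parts header (hypothetical helper type)."""
    method: str  # '1', 'i' or 'p'
    bsz: int     # block size (the whole file size when method is '1')
    blktot: int  # total number of blocks
    brem: int    # size of the last block
    bno: int     # number of the block being advertised


def parse_parts(value: str) -> Parts:
    """Split '<method>,<bsz>,<blktot>,<brem>,<bno>' into typed fields."""
    fields = value.split(",")
    if len(fields) != 5:
        raise ValueError(f"expected 5 comma-separated fields, got {len(fields)}")
    method = fields[0]
    if method not in ("1", "i", "p"):
        raise ValueError(f"unknown method: {method!r}")
    bsz, blktot, brem, bno = (int(f) for f in fields[1:])
    return Parts(method, bsz, blktot, brem, bno)


def file_size(p: Parts) -> int:
    """Total file size, using the formula from the text."""
    if p.method == "1":
        # For a single-piece file, bsz carries the size of the entire file.
        return p.bsz
    return p.bsz * (p.blktot - 1) + p.brem


def block_range(p: Parts) -> Tuple[int, int]:
    """Return (offset, length) of the advertised block.

    Assumption: for 'i' and 'p', bno counts from 0 as in the example.
    """
    if p.method == "1":
        return 0, p.bsz
    is_last = p.bno == p.blktot - 1
    return p.bno * p.bsz, (p.brem if is_last else p.bsz)
```

With the example values, `file_size(parse_parts("i,1024,5,904,4"))` gives 5000, and `block_range` on the same header gives offset 4096 and length 904.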
In use, the vast majority of files are sent as single part files (strategy: 1), so it would be reasonable to abstract out the parts header as:
size=5000
and nothing else. Only in the case of a partitioned file would we need to add another header,
perhaps something like:
"size": 904,
"blocks" :
{
"method": "inplace" | "partitioned",
"size": 1024,
"count": 5,
"remainder": 904,
"number" : 4
}
So for partitioned files, there would be an extra header. The question is: can we keep the file name suffixes the same as in v02, since having JSON in file names would be ungainly? It is probably better to use the Part,1024,5,904,0 file suffix as-is. What do people think?
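As a rough illustration of the proposal, the following Python sketch maps a v02 parts string onto the headers suggested above, and builds the unchanged file name suffix. The function names `to_v03_headers` and `part_suffix` are hypothetical. The sketch assumes that the top-level "size" is the size of the piece being advertised (so 904 for the last block in the example), that block numbers count from 0, and that the suffix is joined to the file name with a comma, as written above.

```python
from typing import Any, Dict

# Mapping from v02 method letters to the names used in the proposed header.
METHOD_NAMES = {"i": "inplace", "p": "partitioned"}


def to_v03_headers(parts: str) -> Dict[str, Any]:
    """Translate a v02 parts value into the proposed size/blocks headers."""
    method, *numbers = parts.split(",")
    bsz, blktot, brem, bno = (int(n) for n in numbers)
    if method == "1":
        # Single-piece file: only the size is needed.
        return {"size": bsz}
    # Assumption: "size" describes the advertised block, not the whole file.
    is_last = bno == blktot - 1
    return {
        "size": brem if is_last else bsz,
        "blocks": {
            "method": METHOD_NAMES[method],
            "size": bsz,
            "count": blktot,
            "remainder": brem,
            "number": bno,
        },
    }


def part_suffix(bsz: int, blktot: int, brem: int, bno: int) -> str:
    """Build the v02-style file name suffix, kept as-is in this sketch."""
    return f",Part,{bsz},{blktot},{brem},{bno}"
```

For example, `to_v03_headers("1,5000,1,0,1")` yields only `{"size": 5000}`, while `"myfile" + part_suffix(1024, 5, 904, 0)` gives `myfile,Part,1024,5,904,0`.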
I see two things that don't add up:
For '1' there is no 'block' header... only 'size', so there is no brem. In v02 there was a choice. It could have been brem; we just picked the block size ('1' means that there is one block in the file, and the size of that block could be given by either field). Yes, using brem would have been more consistent with the other strategies, but then we would have had a 0-sized block, which seemed odd. It was a choice for v02. For v03, you don't need to make any such choice, because for '1' there is only 'size', which is a distinct improvement.
Second comment: yeah, double typo. Fixed now: added 3 and removed 5.
This proposal is now implemented in Sarracenia v2.19.03b1.
The header giving file size was one of those stripped from Sarracenia to make the example, as it was unnecessary to make the protocol work. ETCTS wanted a header to indicate the size of products being advertised. Sarracenia has a header that gives this information, called parts (for partition strategy), which is fully documented. Changing the logic of that header is relatively difficult, so it should not be changed lightly. Putting a placeholder here to track the discussion.