BenningtonCS / GFS

Implementation of Google File System

support transmission/storage of binary data #57

Closed: acencini closed this issue 11 years ago

acencini commented 11 years ago

this may throw a wrench into the current send/recv pieces

if i want to open a file (say, an 8 GB video file) and store it in GFS, i'd use the API to append its contents to a GFS file

however, the data in a video can be any arbitrary sequence of bytes, including bytes that look like delimiters such as newlines
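To make the problem concrete, here is a minimal Python sketch of a newline-terminated protocol receiving a chunk of binary data that happens to contain the byte 0x0A. The thread doesn't show the project's actual send/recv code, so the function names below are hypothetical.

```python
import io

# A naive text-style protocol: the sender writes the payload followed by b"\n",
# and the receiver reads up to the first newline byte.
def send_newline_delimited(stream, payload: bytes) -> None:
    stream.write(payload + b"\n")

def recv_newline_delimited(stream) -> bytes:
    # readline() stops at the first b"\n" (0x0A), wherever it appears.
    return stream.readline().rstrip(b"\n")

# Arbitrary binary data (e.g. a slice of a video file) can contain 0x0A.
video_bytes = bytes([0x00, 0x1A, 0x0A, 0xFF, 0x0A, 0x42])

wire = io.BytesIO()  # in-memory stand-in for a socket
send_newline_delimited(wire, video_bytes)
wire.seek(0)
received = recv_newline_delimited(wire)

print(received == video_bytes)  # False: the receiver stopped at the first 0x0A
print(received)                 # b'\x00\x1a'
```

The receiver silently gets only the first two bytes, and the rest of the payload is left in the stream where it will be misread as the next message.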

so the protocol may need to be more robust, e.g. by sending extra info such as the data length in the first few bytes of each transmission, so the receiver knows exactly how much to read and store. this is a common tactic in binary protocols: often the first N bytes (e.g. the first 4) hold the unsigned length of the rest of the payload. real heroes also include a checksum at the beginning of the packet, but that is not necessary here.
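A minimal Python sketch of the length-prefix idea, assuming a 4-byte unsigned big-endian header (the byte order is an assumption; the comment above only specifies "unsigned length"). The names `send_frame`, `recv_exact`, and `recv_frame` are hypothetical, and an in-memory stream stands in for a socket.

```python
import io
import struct

# Hypothetical framing: 4-byte unsigned, big-endian ("network order") length
# header, followed by the raw payload bytes.
HEADER = struct.Struct(">I")
MAX_PAYLOAD = 0xFFFFFFFF  # largest length a 4-byte unsigned field can hold

def send_frame(stream, payload: bytes) -> None:
    """Write the length header, then the payload bytes unchanged."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for a 4-byte length prefix")
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)

def recv_exact(stream, n: int) -> bytes:
    """Read exactly n bytes; a single read() may legally return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError(f"stream ended after {len(buf)} of {n} bytes")
        buf.extend(chunk)
    return bytes(buf)

def recv_frame(stream) -> bytes:
    """Read the 4-byte header, then exactly that many payload bytes."""
    (length,) = HEADER.unpack(recv_exact(stream, HEADER.size))
    return recv_exact(stream, length)

# Usage: newline bytes inside the payload no longer matter.
wire = io.BytesIO()
send_frame(wire, bytes([0x00, 0x1A, 0x0A, 0xFF, 0x0A, 0x42]))
send_frame(wire, b"")  # empty payloads work too
wire.seek(0)
first = recv_frame(wire)
second = recv_frame(wire)
```

A 4-byte length caps a single frame at 4,294,967,295 bytes (just under 4 GiB), so an 8 GB video would have to be sent as several frames or use an 8-byte length field. With a real socket, `sock.makefile("rb")` and `sock.makefile("wb")` give file-like objects that fit these functions (flush the writer after sending). A checksum could be added the same way, e.g. by packing `zlib.crc32(payload)` into the header.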

acencini commented 11 years ago

also: related to #56

edaniszewski commented 11 years ago

In thinking about how to refactor this, I was wondering whether it would make sense to send and receive different types of data (strings, binaries, etc.), or whether it would make more sense to store all data as binary and have the API convert it, e.g. from string to binary when sending to chunkservers and from binary to string on read?

This is more of an open question for discussion, as I didn't see much in the GFS paper about what the data is stored as, other than "a plain Linux file on a chunkserver".

acencini commented 11 years ago

text files and binary files are the same under the covers; the difference is how you work with them (e.g. readline works for a text file but not a binary file). if you think of all files as binary files, and then think about how to make working with text a little easier, you might be well-served.
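A minimal Python sketch of that "everything is binary" design. `InMemoryFileStore` is a hypothetical stand-in for the real GFS client: its core API only accepts and returns bytes, and the text helpers encode and decode at the edges (UTF-8 is an assumed default).

```python
class InMemoryFileStore:
    """Hypothetical stand-in for GFS: every file is stored as raw bytes."""

    def __init__(self) -> None:
        self._files: dict[str, bytearray] = {}

    # Core binary API: the store never inspects or interprets the bytes.
    def append(self, path: str, data: bytes) -> None:
        self._files.setdefault(path, bytearray()).extend(data)

    def read(self, path: str) -> bytes:
        return bytes(self._files[path])

    # Text conveniences layered on top of the binary API.
    def append_text(self, path: str, text: str, encoding: str = "utf-8") -> None:
        self.append(path, text.encode(encoding))

    def read_text(self, path: str, encoding: str = "utf-8") -> str:
        return self.read(path).decode(encoding)

    def read_lines(self, path: str, encoding: str = "utf-8") -> list[str]:
        # Splitting into lines only makes sense once the bytes are known to be text.
        return self.read_text(path, encoding).splitlines()


store = InMemoryFileStore()
store.append_text("/notes.txt", "first line\nsecond line\n")
store.append("/clip.bin", bytes([0x00, 0x0A, 0xFF]))
print(store.read_lines("/notes.txt"))  # ['first line', 'second line']
```

Keeping the storage layer byte-only means a video and a text file take exactly the same path to the chunkservers; only the caller decides whether to interpret the bytes as text.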

SisterMystery commented 11 years ago

wait... readline DOESN'T work!? I'm confused then, because I was looking around to get an idea of what I was doing, and I did a readline on Sublime Text and it worked... kind of...?

acencini commented 11 years ago

of the binary? treating a binary file like a text file will produce unspecified results. readline simply looks for the newline character in a stream of bytes; it may very well be the case that a byte or sequence of bytes in Sublime Text happened to look like a newline character, causing the readline to "succeed". in reality, readline is only meaningful for text files, where newline characters appear only to mark the end of a line.
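A small Python illustration of that point: `readline()` on a binary stream just splits at every 0x0A byte. The sample bytes are made up; the first four mimic the ELF magic number (0x7F 'E' 'L' 'F') that begins Linux executables.

```python
import io

# Made-up binary data; 0x0A appears twice by coincidence, not as a line ending.
binary = bytes([0x7F, 0x45, 0x4C, 0x46, 0x0A, 0x00, 0x0A, 0x01])

stream = io.BytesIO(binary)
# Call readline() until it returns b"" (end of stream).
pieces = list(iter(stream.readline, b""))

print(pieces)  # [b'\x7fELF\n', b'\x00\n', b'\x01']
```

The "lines" are meaningless fragments, even though joining them back together reproduces the original bytes exactly.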

SisterMystery commented 11 years ago

ah. well, in other news, all data is now storable and retrievable as binary data; however, I can't seem to do it with something that starts out as binary quite yet.

acencini commented 11 years ago

how did this turn out today?

SisterMystery commented 11 years ago

How did this turn out? hmm... I dunno. foxtest 1000 1001 1002

Why don't you try asking this fox, who was thrown into the GFS and subsequently retrieved? What's that? His vocal cord bytes are all in the same place they were before? Well, how about that.

awaiting verification
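One way to do that verification, sketched in Python under the assumption that both the original file and the copy read back from GFS are available locally (the file names below are hypothetical): hash both in fixed-size chunks and compare the digests. The standard library's `filecmp.cmp(a, b, shallow=False)` would also compare contents byte by byte.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_bytes(original_path: str, retrieved_path: str) -> bool:
    """True if both files have identical contents (compared via SHA-256)."""
    return sha256_of_file(original_path) == sha256_of_file(retrieved_path)

# Demo: hypothetical files standing in for the upload and the retrieved copy.
payload = bytes(range(256)) * 4096  # 1 MiB of arbitrary bytes
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "fox_original.jpg")
    retrieved = os.path.join(tmp, "fox_retrieved.jpg")
    damaged = os.path.join(tmp, "fox_damaged.jpg")
    with open(original, "wb") as f:
        f.write(payload)
    with open(retrieved, "wb") as f:
        f.write(payload)
    with open(damaged, "wb") as f:
        f.write(payload[:-1] + b"\x00")  # last byte changed
    identical = same_bytes(original, retrieved)
    corrupted_matches = same_bytes(original, damaged)

print(identical, corrupted_matches)  # True False
```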

SisterMystery commented 11 years ago

Update. It also successfully stores and retrieves Sublime Text. Verified @Kgespada.com

acencini commented 11 years ago

grumpy cat photo or it never happened

On Tue, Nov 19, 2013 at 6:53 PM, WeedFox <notifications@github.com> wrote:

Closed #57 https://github.com/BenningtonCS/GFS/issues/57.
