Data checksum - Githubissues

ottuzzi commented 9 years ago

Hi,

it would be useful if I could ask the target drive to return a checksum of the written data, but I do not see this possibility in the protocol. Am I missing some detail? I would like to be sure that when I ask to write some data, it is really written and can be read back as intended by the target disk. What I have in mind is a new call that writes data and, in the same operation, returns the checksum of what was written to the disk. The returned value could then be compared with the host's own value, giving a good probability that everything is fine. What do you think?

Thanks Bye Piero

jphughes commented 9 years ago

At this time we do not return the checksum of the data to be written, but we will check the value that is sent along with the data.

The way that it works now is:

Data is sent, and the tag/algorithm fields carry the checksum (hash, CRC, etc.) of the value.
If the drive knows the algorithm that was used, it checks that the tag and the data still match; if they do not, it reports the key whose data is not correct.
On read, the original tag is returned so that the reader can check it.
At no time does the drive change the received tag.
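
The four steps above can be sketched roughly as follows. This is only an in-memory illustration, not the Kinetic API: `MockDrive`, `host_tag`, `KNOWN_ALGORITHMS`, and the algorithm names are all hypothetical, and SHA-1 is just an assumed example of a tag algorithm.

```python
import hashlib

# Hypothetical set of algorithms this mock drive understands.
KNOWN_ALGORITHMS = {"SHA1": hashlib.sha1, "SHA256": hashlib.sha256}


def host_tag(value, algorithm="SHA1"):
    """Host side: compute the tag before the data leaves the host."""
    return KNOWN_ALGORITHMS[algorithm](value).digest()


class MockDrive:
    """In-memory stand-in for a drive (assumption, not the real protocol)."""

    def __init__(self):
        self._store = {}  # key -> (value, tag, algorithm)

    def put(self, key, value, tag, algorithm):
        hasher = KNOWN_ALGORITHMS.get(algorithm)
        # The drive can only verify a tag whose algorithm it knows.
        if hasher is not None and hasher(value).digest() != tag:
            return (False, key)  # report the key whose data is not correct
        # The tag is stored exactly as received, never recomputed.
        self._store[key] = (value, tag, algorithm)
        return (True, key)

    def get(self, key):
        # The original tag comes back so the reader can check it.
        return self._store[key]
```

A reader would then recompute `host_tag(value, algorithm)` on the returned value and compare it with the returned tag. With an algorithm the mock drive does not know, the tag is simply stored and returned as an opaque value.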

Since the tag is set before sending, you can be assured of complete end-to-end data integrity. If the drive calculated the checksum instead, there would be a risk: after a data integrity failure between the host and the drive (i.e., TCP is not perfect, and TCP error detection is not perfect either), the drive would return the checksum of the wrong information.
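
One reading of this argument, as a small sketch: if a bit flips in transit and the drive computes the checksum itself, the checksum agrees with the corrupted bytes, so anyone who only has the drive's copy sees nothing wrong. The helper names (`flip_bit`, `drive_computed`, `host_tagged`) and the use of SHA-1 are assumptions for illustration only.

```python
import hashlib


def sha1(data):
    return hashlib.sha1(data).digest()


def flip_bit(data, index=0):
    """Simulate an in-transit error that slipped past TCP's checksum."""
    corrupted = bytearray(data)
    corrupted[index] ^= 0x01
    return bytes(corrupted)


def drive_computed(data_sent):
    """Hypothetical scheme: the drive hashes whatever bytes arrived."""
    received = flip_bit(data_sent)
    return received, sha1(received)


def host_tagged(data_sent):
    """Scheme described above: the tag is fixed before sending."""
    tag = sha1(data_sent)
    received = flip_bit(data_sent)
    drive_accepts = sha1(received) == tag
    return received, tag, drive_accepts


original = b"important payload"
stored, drive_sum = drive_computed(original)
# Self-consistent with the wrong data: a later reader cannot tell.
reader_sees_match = sha1(stored) == drive_sum

_, tag, drive_accepts = host_tagged(original)
```

Here `reader_sees_match` is true even though `stored` differs from `original`, while `drive_accepts` is false because the host-computed tag no longer matches what arrived.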

Hope this helps

Jim

(In reply to Piero Ottuzzi's message of Jan 12, 2015, 2:15 AM; issue: https://github.com/Seagate/kinetic-protocol/issues/23)

ottuzzi commented 9 years ago

Hi,

thank you very much for your answers: everything you say is clear, but I was thinking of a more thorough check. I'll try to show the differences between what I understand is implemented at the moment and what I had in mind.

NOW

HOST sends data with a checksum
DRIVE checks whether data and checksum match and, if they do, physically writes the data; this protects against network errors
on every read, DRIVE returns the data and the original checksum, and HOST can then check whether they match

MY PROPOSAL

In my proposal you keep the same behaviour but I'm asking to add a new workflow working this way:

HOST sends data with a checksum
DRIVE checks whether data and checksum match and, if they do, physically writes the data
DRIVE immediately re-reads the data and checks whether data and checksum match; if they do, it sends OK to HOST, and if the re-read data does not match the checksum, it returns an error.
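
The proposed workflow could look roughly like this. It is a minimal sketch under stated assumptions: `FlakyMedium` is a hypothetical storage medium that can silently corrupt a write, `put_with_verify` and its status strings are invented for illustration, and SHA-1 stands in for whatever checksum the host uses.

```python
import hashlib


class FlakyMedium:
    """Hypothetical medium that may silently corrupt what it writes."""

    def __init__(self, corrupt_writes=False):
        self.corrupt_writes = corrupt_writes
        self._blocks = {}

    def write(self, key, value):
        if self.corrupt_writes and value:
            # Simulate a bad write by damaging the first byte.
            value = bytes([value[0] ^ 0xFF]) + value[1:]
        self._blocks[key] = value

    def read(self, key):
        return self._blocks[key]


def put_with_verify(medium, key, value, checksum):
    # Step 2: check data against the checksum on arrival.
    if hashlib.sha1(value).digest() != checksum:
        return "ERROR_BAD_CHECKSUM"
    medium.write(key, value)
    # Step 3: re-read immediately and check again before answering.
    if hashlib.sha1(medium.read(key)).digest() != checksum:
        return "ERROR_VERIFY_FAILED"
    return "OK"
```

With `FlakyMedium(corrupt_writes=True)` the call returns `"ERROR_VERIFY_FAILED"` at write time instead of the problem surfacing on a later read. On a real drive the re-read would presumably have to come from the medium itself rather than from a cache for the check to mean anything.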

The whole point here is to catch a subtle disk error. In your workflow, the last check happens when the data arrives at the disk front end; in my proposal, the last check is on the data as written to disk. It can happen that data written to disk cannot be read back correctly. With your approach, you find out only on the next read (probably when you need the data); with my proposed additional workflow, you know immediately that the data can be read... at least for now ;)

I hope this is clearer than my first post :)

Thanks in advance Bye Piero
