GreptimeTeam / greptimedb

An Open-Source, Cloud-Native, Unified Time Series Database for Metrics, Logs and Events with SQL/PromQL supported. Available on GreptimeCloud.
http://greptimedb.rs/
Apache License 2.0
4.02k stars, 290 forks

Checksum for manifests #3004

Open killme2008 opened 6 months ago

killme2008 commented 6 months ago

What type of enhancement is this?

Tech debt reduction, Other

What does the enhancement do?

The manifest doesn't have any checksum for data validation. We need a way to validate region manifests with a checksum. One option is to save the checksum as part of the manifest file name, for example, 000000000001-{checksum}.json.

After reading the file content, we can calculate its checksum with CRC32 or another algorithm and ensure the value equals the checksum in the file name.
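A minimal sketch of the file-name approach, assuming CRC32 (IEEE) as the algorithm and a 12-digit version prefix as in the example name above. The helper names `manifest_file_name` and `verify_file` are hypothetical, not existing greptimedb functions, and the bitwise `crc32` is written out only to keep the example dependency-free.

```rust
/// CRC32 (IEEE 802.3, reflected polynomial 0xEDB88320), computed bit by bit.
/// A real implementation would more likely use a table-driven crate.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the lowest bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Builds a name of the form `{version}-{checksum}.json` (hypothetical layout).
fn manifest_file_name(version: u64, content: &[u8]) -> String {
    format!("{:012}-{:08x}.json", version, crc32(content))
}

/// Recomputes the checksum of `content` and compares it with the one
/// embedded in `file_name`.
fn verify_file(file_name: &str, content: &[u8]) -> Result<(), String> {
    let stem = file_name
        .strip_suffix(".json")
        .ok_or_else(|| format!("not a manifest file: {file_name}"))?;
    let (_, hex) = stem
        .rsplit_once('-')
        .ok_or_else(|| format!("no checksum in file name: {file_name}"))?;
    let expected = u32::from_str_radix(hex, 16)
        .map_err(|e| format!("bad checksum in file name {file_name}: {e}"))?;
    let actual = crc32(content);
    if actual == expected {
        Ok(())
    } else {
        Err(format!(
            "checksum mismatch: expected {expected:08x}, actual {actual:08x}"
        ))
    }
}
```

With this layout the checksum can be checked without changing the file format, but the final name is only known once the content has been serialized.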

Implementation challenges

No response

evenyag commented 6 months ago

Another way is to save a checksum JSON object for the metadata on the line after the file content, like a footer.

{"manifest": "manifest_data"}
{"footer": { "checksum": 235423 }}
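The footer idea could look roughly like the sketch below. It assumes the manifest is serialized as a single JSON line and that the footer is the last line of the file. `append_footer` and `verify_footer` are hypothetical names, and the footer is parsed with plain string handling only to avoid dependencies. Real code would presumably use a JSON library.

```rust
/// CRC32 (IEEE 802.3, reflected polynomial 0xEDB88320), computed bit by bit.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the lowest bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Appends a footer line carrying the CRC32 of the manifest line.
/// Assumes `manifest_json` contains no raw newline.
fn append_footer(manifest_json: &str) -> String {
    format!(
        "{manifest_json}\n{{\"footer\":{{\"checksum\":{}}}}}",
        crc32(manifest_json.as_bytes())
    )
}

/// Splits off the footer line, checks the checksum, and returns the
/// manifest line on success.
fn verify_footer(content: &str) -> Result<&str, String> {
    let (manifest, footer) = content
        .trim_end()
        .rsplit_once('\n')
        .ok_or_else(|| "missing footer line".to_string())?;
    // Simplistic extraction of the number following `"checksum":`.
    let digits = footer
        .split_once("\"checksum\":")
        .map(|(_, rest)| rest.trim_start())
        .ok_or_else(|| "footer has no checksum field".to_string())?;
    let end = digits
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(digits.len());
    let expected: u32 = digits[..end]
        .parse()
        .map_err(|e| format!("invalid checksum in footer: {e}"))?;
    let actual = crc32(manifest.as_bytes());
    if actual == expected {
        Ok(manifest)
    } else {
        Err(format!("checksum mismatch: expected {expected}, actual {actual}"))
    }
}
```

Compared with the file-name variant, this keeps file names unchanged, at the cost of readers having to understand the two-line layout.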

CrystalAnalyst commented 6 months ago

Hi, I'd like to have a try.

CrystalAnalyst commented 6 months ago

Need help: how should I design the return value for fn verify_checksum(&self, content: &[u8])? There is an alias, pub type Result<T> = std::result::Result<T, Error>, and I tried to return an error like VerifyChecksum, but it failed. Any suggestions?

CrystalAnalyst commented 6 months ago

@killme2008 @evenyag gimme some help or advice, I'd appreciate it, thanks.

evenyag commented 6 months ago

We use snafu to generate errors. I guess that you want to return a leaf error.

https://docs.rs/snafu/0.8.0/snafu/guide/examples/basic/enum.Error.html#leaf-errors

You can add a variant to the Error type in the relevant crate. https://github.com/GreptimeTeam/greptimedb/blob/f735f739e5de7a028a7b860b4e507eb774c9523a/src/mito2/src/error.rs#L37
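To illustrate what a leaf error for `verify_checksum` might look like, here is a std-only sketch. The `ChecksumMismatch` variant, its fields, and the `ManifestChecker` type are hypothetical. The real `Error` enum in mito2 has many more variants and is derived with snafu, so the comments only indicate how the same thing would roughly be written there.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's error enum.
#[derive(Debug, PartialEq)]
pub enum Error {
    // With snafu this would roughly be declared as:
    //   #[snafu(display("Checksum mismatch, expected: {expected}, actual: {actual}"))]
    //   ChecksumMismatch { expected: u32, actual: u32 },
    // A leaf error has no `source` field: it is where the failure originates.
    ChecksumMismatch { expected: u32, actual: u32 },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::ChecksumMismatch { expected, actual } => write!(
                f,
                "Checksum mismatch, expected: {expected}, actual: {actual}"
            ),
        }
    }
}

impl std::error::Error for Error {}

/// Same shape as the alias quoted in the question.
pub type Result<T> = std::result::Result<T, Error>;

/// CRC32 (IEEE 802.3, reflected polynomial 0xEDB88320), computed bit by bit.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the lowest bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical holder of the expected checksum.
struct ManifestChecker {
    expected: u32,
}

impl ManifestChecker {
    fn verify_checksum(&self, content: &[u8]) -> Result<()> {
        let actual = crc32(content);
        // With snafu, this check could be written as:
        //   ensure!(actual == self.expected,
        //           ChecksumMismatchSnafu { expected: self.expected, actual });
        if actual != self.expected {
            return Err(Error::ChecksumMismatch {
                expected: self.expected,
                actual,
            });
        }
        Ok(())
    }
}
```

Returning `Result<()>` keeps the function compatible with the crate-wide alias, and callers can match on the new variant if they need to treat corruption differently from other failures.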

CrystalAnalyst commented 6 months ago

ok, I'll try.

tisonkun commented 3 months ago

@killme2008 @evenyag Could you point to some of the related source code?

We have a few manifest types, such as FileRegionManifest and RegionManifest. It's not quite clear what this issue proposes.

evenyag commented 3 months ago

We store several manifest files. https://github.com/GreptimeTeam/greptimedb/blob/20e8c3d864bd1fc4baf26ee273a03d46c8c7d399/src/mito2/src/manifest/storage.rs#L351-L370

https://github.com/GreptimeTeam/greptimedb/blob/20e8c3d864bd1fc4baf26ee273a03d46c8c7d399/src/file-engine/src/manifest.rs#L46-L63

Both of them are plain JSON files. We could store a checksum in each file to help validate that the file is not corrupted.