Closed kirillt closed 6 months ago
We should also support two modes of operation for the library:
safe
or strict
using a cryptographic hash functionfast
using a fast hash function (experimental mode)We could also support two modes of operation for the library:
* `safe` or `strict` using a cryptographic hash function * `fast` using a fast hash function (experimental mode)
This is a very interesting idea To achieve this, we can use features with conditional compilation
Maybe we can implement ResourceIdTrait
as an alternative to abstracting ResourceIndex over the ID type directly.
It forces consistency across different ID implementations and simplifies handling different hashing algorithms.
Each ResourceId
struct will implement ResourceIdTrait
and has its own logic for calculating the hash (which can be of different types).
Check out genericResourceId branch
for a draft implementation, including how ResourceIdCrc32
works with the trait.
@tareknaser When we finish #90 and move the code to ark-rust repo, is it a good idea to split "secure" and "fast" indexes into separate crates? This way we could create optimized index structure without counting collisions when we use secure hash functions.
I also imagine creating something like test-fs-index
crate which would verify that fs-index-fast
and fs-index-secure
are semantically equivalent.
is it a good idea to split "secure" and "fast" indexes into separate crates?
I thought about this as well but couldn't find a straightforward solution. Ideally, I would prefer to decouple the index entirely from the resource ID computations, but this isn't feasible because we need to track collisions with non-cryptographic hash functions.
Once this PR is merged alongside https://github.com/ARK-Builders/arklib/pull/87, I plan to dedicate some time to exploring options for conditionally compiling the collision resolution functionality.
for example, we could do
#[derive(PartialEq, Clone, Debug)]
pub struct ResourceIndex {
pub id2path: HashMap<ResourceId, CanonicalPathBuf>,
pub path2id: HashMap<CanonicalPathBuf, IndexEntry>,
#[cfg(feature = "non-cryptographic")]
pub collisions: HashMap<ResourceId, usize>,
root: PathBuf,
}
but I would dive deeper into it when we settle on the API for ResourceIndex
first because it would require modifications all over the code.
@tareknaser I like how conditional compilation looks in your example.
#[cfg(feature = "non-cryptographic")]
pub collisions: HashMap<ResourceId, usize>,
If it'll have same ergonomics all over the crate, then we could stick to it. Single index implementation should be better. Although it might be a bit more complicated to maintain.
I also imagine creating something like
test-fs-index
crate which would verify thatfs-index-fast
andfs-index-secure
are semantically equivalent.
This test-fs-index
crate would depend on 2 instances of same crate then: 1 with non-cryptographic
flag ON, and one with the flag OFF. Is it technically possible?
Ideally, I would prefer to decouple the index entirely from the resource ID computations, but this isn't feasible because we need to track collisions with non-cryptographic hash functions.
Not sure about this part. In fact, ResourceId
should be separate from index, because we might have non-fs index later. Or even apps without index, especially when the resources are not persisted. Collisions tracker is updated during index construction and updates, are you sure ResourceId
is coupled to it?
Actually, we already have such a crate: https://github.com/ARK-Builders/ark-rust/blob/main/data-resource
If it'll have same ergonomics all over the crate, then we could stick to it. Single index implementation should be better. Although it might be a bit more complicated to maintain.
Once the API in issue #91 is finalized, we'll likely simplify it. Additionally, some tests in index.rs
might be better suited for integration testing within ark-rust
using ark-cli
crate. This way we validate the project's integrity between crates as well.
Simplifying the API and including only essential unit tests related to ResourceIndex
would make maintenance easier if we add compilation features.
This
test-fs-index
crate would depend on 2 instances of same crate then: 1 withnon-cryptographic
flag ON, and one with the flag OFF. Is it technically possible?
It may not be feasible at the moment, but there's an ongoing discussion about it. See https://github.com/rust-lang/cargo/issues/2980#issuecomment-1552153211
Maybe that could work as an integration test. We could create a simple shell script that uses ark-cli
to interact with fs-index
, running it twice in the CI environment: once with the feature enabled and once disabled.
are you sure
ResourceId
is coupled to it?
It's not coupled per se but if we're using a non-cryptographic ResourceId
, collisions still need to be tracked.
We can abstract
ResourceIndex
over the id type and, consequently, over hashing function.This way, we can use cryptographic hash functions to test index in simpler cases, when no collisions are present. This can simplify development and debugging. We could also use fake hash functions for testing only collisions.
Cryptographic hash function can also be used for apps where safety is more important than performance. The fast hash functions will be used in experimental "fast mode", it's the most useful for file browser apps.