getavalon / core

The safe post-production pipeline - https://getavalon.github.io/2.0
MIT License

C4 Universal Asset ID #273

Open mottosso opened 7 years ago

mottosso commented 7 years ago

Evaluate whether C4 is relevant to us.

davidlatwe commented 6 years ago


After a closer look, I find that development of the C4 framework seems to be in hibernation, and it may even be outdated.

Here are some links for getting to know the C4 framework: C4 Framework, C4 Language doc

As for the implementation of the C4 Asset ID, it simply hashes the file, which one can easily do with Python's hashlib. Getting rid of characters such as + - / = that keep the hash string from being selected with a double-click does require some tricks, but I don't think it's a must, because I can't imagine anyone needing to copy-paste a hash string by hand very often.
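To make the "tricks" part concrete, here is a minimal sketch of hashing with hashlib and encoding the digest with a base-58 alphabet, which contains only letters and digits. The function name `selectable_id` is made up for illustration, and this does not claim to reproduce C4's exact ID format.

```python
import hashlib

# Base-58 alphabet: letters and digits only, so no "+", "-", "/" or "=",
# and no look-alike characters (0, O, I, l). Assumed alphabet for this sketch.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def selectable_id(data: bytes) -> str:
    """Return the SHA-512 digest of `data` as a base-58 string (hypothetical helper)."""
    number = int.from_bytes(hashlib.sha512(data).digest(), "big")
    chars = []
    while number:
        number, remainder = divmod(number, 58)
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars)) or ALPHABET[0]
```

Because the result is purely alphanumeric, a double-click selects the whole string in most editors and terminals.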

They have PyC4, but it is outdated and doesn't seem to behave the same as C4's Go implementation.

Hashing files

I think we could use simple file hashing to verify an asset's integrity when downloading or loading an asset representation, or to check whether a source asset has been modified before publishing (like checking textures when publishing a LookDev).
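A rough sketch of what such an integrity check could look like, assuming SHA-256 and that a digest was recorded somewhere at publish time. The names `file_digest` and `is_unmodified` are placeholders, not existing Avalon functions.

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large textures or caches are never fully loaded."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_unmodified(path, recorded_digest):
    """Compare a file on disk against the digest recorded at publish time."""
    return file_digest(path) == recorded_digest
```

On load or download, a mismatch would mean the file was corrupted or changed after it was published.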

But comparing two files by a simple hash string can only tell you whether they are identical or not.

Similarity hash

While googling about file hashing, I found similarity hashing. I think this is much more useful for us: it tells you not only whether two files are the same, but also how alike they are!
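To illustrate the idea, here is a small pure-Python sketch of one similarity-hash technique, SimHash, compared by Hamming distance. It is not taken from either repo mentioned in this thread, and the "tokens" are a stand-in for whatever features one would extract from a file.

```python
import hashlib


def simhash(tokens, bits=64):
    """Build a SimHash fingerprint: every token votes on every bit position."""
    votes = [0] * bits
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            votes[i] += 1 if (value >> i) & 1 else -1
    # A bit is set in the fingerprint when most tokens voted for it.
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming_distance(a, b):
    """Number of differing bits; a smaller distance means more alike."""
    return bin(a ^ b).count("1")
```

Two inputs that share most of their tokens tend to end up with a small Hamming distance, while unrelated inputs tend to differ in roughly half of the bits. An ordinary cryptographic hash gives no such gradient.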

Here are two interesting repos I found: imagehash, python-hashes

I haven't looked deeply into or tested those two yet, but I think this could lead us to a better life.

mottosso commented 6 years ago

Thanks for the detailed follow-up @davidlatwe

davidlatwe commented 6 years ago

After a few more tests on similarity hashing, I find that I misunderstood its use case. (facepalm)

It's more useful for searching (obviously), so I think that unless we are going to build a set-dressing library or some other kind of texture/matte-painting database, similarity hashing can be ignored. At least it doesn't fit into the publish process, where strict modification comparison is required (a similarity hash can provide strict comparison, but it needs more computation).

Back to file content hashing (Asset ID)

For the use case of preventing duplicated content creation: besides commonly used image formats, some major 3D asset exchange formats (.obj, .fbx, .abc) may not benefit from hashing because they are saved with a timestamp, so re-exporting the same content produces a different hash. But .usd may work, since it does not embed such metadata (as far as I recall).
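A toy demonstration of the timestamp problem. `fake_export` is a made-up exporter that writes the export time into the first line; real formats store such metadata in their own ways, so this only shows the effect, not how any specific format behaves.

```python
import hashlib


def fake_export(geometry: bytes, timestamp: str) -> bytes:
    """Hypothetical exporter: a header line with the export time, then the payload."""
    return b"# exported " + timestamp.encode("ascii") + b"\n" + geometry


def payload_digest(exported: bytes) -> str:
    """Hash everything after the first line, skipping the timestamp header."""
    _header, _sep, payload = exported.partition(b"\n")
    return hashlib.sha256(payload).hexdigest()


geometry = b"v 0 0 0\nv 1 0 0\nv 0 1 0\nf 1 2 3\n"
first = fake_export(geometry, "2018-06-01 10:00:00")
second = fake_export(geometry, "2018-06-01 10:05:00")

# Same geometry, but the whole-file digests differ because of the header.
whole_file_differs = (
    hashlib.sha256(first).hexdigest() != hashlib.sha256(second).hexdigest()
)
# Hashing only the payload gives matching digests for the two exports.
payload_matches = payload_digest(first) == payload_digest(second)
```

So for such formats, deduplication by whole-file hash would treat every re-export as new content, unless the hash is taken over the content alone.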

From the perspective of content validation, whether or not production involves cloud storage, we can all gain some extra long-term security from hashing asset files in every publish process.

Do we need C4 for file hashing?

If we only work locally, then the quick answer is no. hashlib is much more convenient since we all use Python.

If we need to exchange files across studios or sites, I think the answer depends on the development of the C4 framework. One major issue that needs to be addressed is how we hash directories, unless we all share files as .zip.

Currently, C4's directory hashing ignores duplicated files, so directories that differ only in duplicates get the same ID. This may not be what we want, since there might be file sequences with the same frames but different lengths.
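One possible convention that avoids this, sketched under my own assumptions (SHA-256, and hashing each file's relative path together with its content digest). `directory_digest` is a hypothetical helper, not something C4 or Avalon provides.

```python
import hashlib
import os


def file_digest(path):
    """Content digest of a single file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def directory_digest(root):
    """Hash relative paths together with content digests, in sorted order."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Use "/" everywhere so the digest is the same on all platforms.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, file_digest(full)))
    hasher = hashlib.sha256()
    for rel, digest in sorted(entries):
        hasher.update(rel.encode("utf-8") + b"\0" + digest.encode("ascii") + b"\n")
    return hasher.hexdigest()
```

Since every file name takes part in the hash, a 10-frame and a 20-frame sequence of identical frames get different directory digests, while two copies of the same directory get the same one. Empty subdirectories are not represented in this sketch.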

Whether we use C4 or not, if we are going to work in the cloud or with any form of asset exchange, I think we need someone to define how we hash nested asset representation files and directories.

davidlatwe commented 6 years ago

Hey @mottosso, thanks :) ( I should type the second post faster :P )

BigRoy commented 5 years ago

I am looking into C4 now regarding an idea for deduplicating textures between publishes, see more details here