Unfortunately this isn't how Alembic decides how to share blocks of data.
While the behavior you desire could be accomplished in a custom schema,
something that you might want to consider is the cost of the reorientation
(computation cost of the algorithm plus reading the reference mesh) vs the cost
of reading the normals sample.
Original comment by miller.lucas
on 14 Mar 2014 at 6:37
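To make the compute side of that tradeoff concrete: reorienting rest-frame normals is roughly one per-point matrix-vector multiply, on top of reading the reference mesh. A minimal numpy sketch (not Alembic API; the per-point rotations R are a hypothetical output of whatever reorientation algorithm is used):

    import numpy as np

    # Hypothetical inputs: rest-frame normals plus per-point 3x3 rotations
    # produced by some reorientation algorithm (not part of Alembic itself).
    num_points = 140000
    rest_N = np.random.rand(num_points, 3).astype(np.float32)
    R = np.tile(np.eye(3, dtype=np.float32), (num_points, 1, 1))

    # Reorientation: one matrix-vector multiply per point, plus the cost of
    # having read the reference mesh ...
    reoriented_N = np.einsum('pij,pj->pi', R, rest_N)

    # ... versus simply reading the stored normals sample, which is pure I/O.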
Hi Lucas,
Apologies for my ignorance of the nitty-gritty; I have no idea how Alembic
detects duplicate/repeated data. Does it do this compression automatically, or
does the client software have to tell the API when certain data is varying?
For some clarity, on a typical fairly complex character written out for 100
frames:
360M character_w_vertexN.abc
96M character_no_vertexN.abc
I'll go out on a limb here and suggest that reading ~270 MB (over a network in
a typical deployment) might be slower than, or at least a close contender to,
recomputing the vertex N - not to mention the obvious disk space savings, which
in dollar terms are perhaps worth more than the additional decompress/recompute
CPU cycles these days.
Anyhow, thanks for your feedback,
Jason
Original comment by jdiver...@gmail.com
on 14 Mar 2014 at 6:53
Data is shared automatically.
With such an extreme difference in disk size going to just the normals, I'm
a little curious to know what your character is like.
Could you get by with a normal per point, or per face instead of face varying?
Original comment by miller.lucas
on 14 Mar 2014 at 7:10
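For context on why the scope matters so much for file size: a face-varying normal stores one value per face-vertex rather than one per point, so each sample grows with the summed face vertex counts. A rough back-of-envelope in Python, with made-up counts just to show the scaling:

    # Rough per-sample size of normals at different scopes (counts are made up).
    num_points = 140000          # hypothetical point count
    num_face_vertices = 560000   # hypothetical sum of vertex counts over all faces
    bytes_per_normal = 3 * 4     # three float32 components

    per_point = num_points * bytes_per_normal             # per-point scope
    face_varying = num_face_vertices * bytes_per_normal   # face-varying scope

    print(per_point / 1e6, "MB vs", face_varying / 1e6, "MB per sample")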
So if the data is shared automatically, I assume there is some kind of test for
equivalence between blocks, per frame? My idea here is to test for equivalence
using the re-oriented attributes from the previous block/frame.
The character I'm exporting here is the Troll from Snow White:
https://www.google.com/search?q=troll+snow+white&client=firefox-a&hs=Qrv&rls=org.mozilla:en-US:official&tbm=isch&tbo=u&source=univ&sa=X&ei=c3IjU6C6GYrMqAHQl4CwAg&ved=0CC4QsAQ&biw=1193&bih=862
Normals per point don't fully model cusping; you need vertex normals to do it
reliably. For many reasons, displacement shading being one important one, we
don't like to split vertices / unique the edges of cusped geometry. Granted,
much of the model doesn't require vertex-normal fidelity, but there is no way
to assign sparse attributes to only certain vertices (like the cusped points),
at least in any system I know of.
In our pipeline - I work at R+H, by the way - our "stream" format dynamically
computes these cusped normals based on smoothing group membership and supplies
them to the renderer. That means we'd either have to customize our alembic
procedurals in the same way, or try to standardize on always writing these
attributes out and use the standard/shipped alembic support everywhere
(Houdini/Maya/etc), which would of course be preferable.
Original comment by jdiver...@gmail.com
on 14 Mar 2014 at 9:34
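For illustration, the kind of cusped-normal computation described above (normals averaged within a smoothing group, hard edges kept between groups) can be sketched like this in Python; the data layout and function name are assumptions, not the R+H implementation:

    from collections import defaultdict

    def cusped_vertex_normals(face_points, face_normals, face_group):
        """face_points: list of point-index lists, one per face,
        face_normals: one (nx, ny, nz) per face,
        face_group: smoothing-group id per face.
        Returns one normal per face-vertex, cusped at group boundaries."""
        # Accumulate face normals per (point, smoothing group) pair.
        acc = defaultdict(lambda: [0.0, 0.0, 0.0])
        for f, pts in enumerate(face_points):
            for p in pts:
                key = (p, face_group[f])
                for i in range(3):
                    acc[key][i] += face_normals[f][i]

        def normalize(v):
            l = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
            return [c / l for c in v] if l else v

        # Each face-vertex takes the averaged normal of its own group only,
        # so edges between different smoothing groups stay hard (cusped).
        return [[normalize(acc[(p, face_group[f])]) for p in pts]
                for f, pts in enumerate(face_points)]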
It's not an equivalence test on the data; instead, a hash key is calculated on
the sample of data, and that key is used to determine whether the data has been
previously written to the file.
Don't you have to reread the original mesh positions in order to reliably
calculate the new normals? (If not, what data would you need to calculate the
new normals?)
Original comment by miller.lucas
on 14 Mar 2014 at 10:55
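A minimal sketch of that hash-keyed sharing, with Python's hashlib standing in for whatever digest Alembic actually uses internally:

    import hashlib

    written_samples = {}   # digest -> index of a previously written sample
    file_blocks = []       # stand-in for the archive's data blocks

    def write_sample(raw_bytes):
        """Write a sample, or reuse an identical one written earlier."""
        key = hashlib.md5(raw_bytes).digest()
        if key in written_samples:
            return written_samples[key]        # share the earlier block
        file_blocks.append(raw_bytes)          # otherwise store new data
        written_samples[key] = len(file_blocks) - 1
        return written_samples[key]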
Ah, so you're comparing hash keys of the raw data? Then perhaps this could be
achieved by comparing hash keys of the _processed_ data? (By processed, I mean
applying an inverse transform to the normal attribute.)
And yes, I suppose you'd need to use the original mesh (which I'm sure you'd
cache in memory) for all subsequent frames to calculate the new normals, which
I admit might be another efficiency hit :)
Original comment by jdiver...@gmail.com
on 14 Mar 2014 at 11:32
It sounds like it might be cheaper to just read the normals in most situations
than to have to read all of the extra data and compute the offset normals.
Original comment by miller.lucas
on 14 Mar 2014 at 11:48
Could be... could be... we may never know. I could attempt a simple jury-rigged
Houdini setup to emulate this, if you're interested?
Original comment by jdiver...@gmail.com
on 14 Mar 2014 at 11:51
Having a few data points would be very interesting.
Original comment by miller.lucas
on 14 Mar 2014 at 11:53
Hi Lucas,
I fiddled about with a prototype of this idea in Houdini: reorienting the N
attribute and checking md5 hashes of its values to decide whether to re-use
the rest N.
Please take a look at the attached hip file, if you don't mind.
Btw, I used point attributes due to a blind spot in Houdini's python functions,
and I should test this with the troll character to see if there is a hugely
noticeable speed impact over, say, 100 frames.
Original comment by jdiver...@gmail.com
on 16 Mar 2014 at 2:11
Attachments:
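The attached hip file isn't reproduced here, but the python half of that prototype might look roughly like the following sketch inside a Python SOP (the attribute name and the use of hou.session to remember the rest hash are assumptions):

    import hashlib
    import hou

    geo = hou.pwd().geometry()

    # Flat tuple of the reoriented point N values, hashed as a single block.
    n_values = geo.pointFloatAttribValues("N")
    key = hashlib.md5(repr(n_values).encode()).hexdigest()

    # Remember the rest-pose hash on the first cook; on later frames a
    # matching hash suggests the rest N sample could simply be re-used.
    rest_hash = getattr(hou.session, "rest_n_hash", None)
    if rest_hash is None:
        hou.session.rest_n_hash = key
    reuse_rest_N = (key == hou.session.rest_n_hash)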
So with the Troll model, 100 frames of animated output (cached in RAM, so no
read cost) took about 8 seconds for 140k polys with the regular export method.
Doing the attribReorient and python hash cost about 15 secs more, but yields
the much smaller file, at the ratio from the post above. If the reorient
operation were multithreaded and we used a faster hash function, this might be
more realistic. Not sure if it's ultimately worth it - some people might
treasure the disk space and be willing to suffer the read and write hit.
Maybe it's just an idea for the heap?
Original comment by jdiver...@gmail.com
on 16 Mar 2014 at 2:43
Original issue reported on code.google.com by
jdiver...@gmail.com
on 14 Mar 2014 at 2:05