Vargol / spatial-media

Specifications and tools for 360º video and spatial audio. Modified for Google's VR180

top-bottom mode not working? #4

Closed wonson closed 4 years ago

wonson commented 5 years ago

The result is the same as mono.

Vargol commented 5 years ago

Hi,

What settings are you using ?

I've tried injecting 360 and 180 equi-rectangular with top-bottom and both get the metadata I expect injected.

lucid_test_2_ou_360_injected.mp4
Processing: lucid_test_2_ou_360_injected_injected.mp4
Loaded file...
Track 0
Stereo Mode: top-bottom
Spherical Mode: equirectangular
[Yaw: 0.00, Pitch: 0.00, Roll: 0.00]
[Clip Top: 0, Bottom: 0, Left: 0 Right: 0]

lucid_test_2_ou_180_injected.mp4
Processing: lucid_test_2_ou_180_injected_injected.mp4
Loaded file...
Track 0
Stereo Mode: top-bottom
Spherical Mode: equirectangular
[Yaw: 0.00, Pitch: 0.00, Roll: 0.00]
[Clip Top: 0, Bottom: 0, Left: 1073741823 Right: 1073741823]
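
For anyone puzzled by the Left/Right value 1073741823 in the 180 output: it is consistent with a quarter of the frame width, if the clip values are 0.32 fixed-point fractions as the Spherical Video V2 RFC describes for projection bounds. The sketch below is only an illustration under that assumption. The scale constant and both helper names are hypothetical and are not taken from the spatialmedia code.

```python
# Assumption: clip bounds are 0.32 fixed-point fractions of the full projection,
# with 0xFFFFFFFF standing for (almost) 1.0. This scale is a guess that happens
# to reproduce the value in the log; it is not read from the injector's source.
FIXED_POINT_SCALE = 0xFFFFFFFF


def bound_to_fraction(raw: int) -> float:
    """Convert a raw 32-bit bound into the fraction of the frame it crops."""
    return raw / FIXED_POINT_SCALE


def horizontal_coverage_degrees(left: int, right: int) -> float:
    """Degrees of the 360-degree sphere left after cropping both sides."""
    visible = 1.0 - bound_to_fraction(left) - bound_to_fraction(right)
    return 360.0 * visible


if __name__ == "__main__":
    print(bound_to_fraction(1073741823))
    print(horizontal_coverage_degrees(1073741823, 1073741823))
    print(horizontal_coverage_degrees(0, 0))
```

Under that reading, cropping roughly a quarter from the left and a quarter from the right leaves about 180 degrees, while bounds of 0 leave the full 360.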

wonson commented 5 years ago

Hi, I was injecting VR180, top-bottom, equirectangular.

If I play it in VLC, it looks the same as a mono-injected VR video. (I don't know whether VLC just doesn't support it well; it works fine when playing LR video.)

Then I tried uploading it to Google Photos, and it was recognized as a VR video. However, it played as if it had never been injected :( As far as I know, Google Photos should work the same as YouTube and use the same spec.

Vargol commented 5 years ago

okay, let me knock up a quick video and give it a try.

Vargol commented 5 years ago

BTW, are you using a clone of the repository, or one of the GitHub releases?

wonson commented 5 years ago

master clone

Vargol commented 5 years ago

Hi.

I've tried a couple of command lines...

Davids-iMac:spatial-media $ python spatialmedia -i -m equi-mesh -s top-bottom -v 180x180 tb_180_3d_399_days.mp4 tb_180_3d_399_days_injected.mp4
Processing: tb_180_3d_399_days.mp4
Saved file settings
Track 0
Stereo Mode: top-bottom
Spherical Mode: equi-mesh
Mesh Projection: [Mesh count 2]
[Yaw: 0.00, Pitch: 0.00, Roll: 0.00]
[Clip Top: 0, Bottom: 0, Left: 0 Right: 0]
Track 1

spatial-media $ python spatialmedia -i -m equirectangular -s top-bottom -d 180 tb_180_3d_399_days.mp4 tb_180_3d_399_days_injected.mp4
Processing: tb_180_3d_399_days.mp4
Saved file settings
Track 0
Stereo Mode: top-bottom
Spherical Mode: equirectangular
[Yaw: 0.00, Pitch: 0.00, Roll: 0.00]
[Clip Top: 0, Bottom: 0, Left: 1073741823 Right: 1073741823]
Track 1

The uploads to YouTube seem to be working fine: https://youtu.be/jw5meT50emI and https://youtu.be/V9KNbRRtsGs, with the original video at https://youtu.be/aDPwyxAPonA

I tried uploading the videos to Google Photos, but it doesn't seem to like them. Maybe it is just slow to process. https://photos.app.goo.gl/4nuq8qySmuTWKSFU6

wonson commented 5 years ago

(Sigh) Maybe it's Google Photos' fault then.

I don't think it's still processing, because mine was uploaded half a day ago and it is only 1 min long :(

wonson commented 5 years ago

Maybe we should both file feedback with Google Photos about this issue.

wonson commented 5 years ago

@Vargol Hi, me again. I found that the injected st3d box differs from Google's doc:

injected box: ...♪st3d....☻

Google's doc (https://github.com/google/spatial-media/blob/master/docs/spherical-video-v2-rfc.md):

aligned(8) class Stereoscopic3D extends FullBox('st3d', 0, 0) {
    unsigned int(8) stereo_mode;
}

When I looked into your code, I found that it treats the first 4 bytes as the version and then 1 byte as the stereo mode. Why is that?

Vargol commented 5 years ago

It's due to it being a subclass of FullBox

From the mp4 spec @ http://l.web.umkc.edu/lizhu/teaching/2016sp.video-communication/ref/mp4.pdf

aligned(8) class FullBox(unsigned int(32) boxtype, unsigned int(8) v, bit(24) f) extends Box(boxtype) {
    unsigned int(8) version = v;
    bit(24) flags = f;
}

So the parent class Box is a 4-byte length (0x0000000D) followed by the 4-byte type 'st3d'. FullBox extends that with an 8-bit version and 24 bits of flags, here 0x00 and 0x000000. Those 4 bytes together are the "first 4" that the code reads before the stereo mode. Stereoscopic3D then extends FullBox with an 8-bit stereo mode, usually 0x01 or 0x02.

The whole thing gives you 13 bytes of data, usually 0000000D737433640000000001 or 0000000D737433640000000002.
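
To make the byte layout above concrete, here is a minimal Python sketch that builds and parses an st3d box with the standard struct module. It is an illustration of the layout described in this comment, not the code spatialmedia itself uses, and the function names build_st3d and parse_st3d are hypothetical.

```python
import struct


def build_st3d(stereo_mode: int) -> bytes:
    """Serialize an st3d box: Box header, FullBox fields, then stereo_mode."""
    # FullBox part: 8-bit version and 24-bit flags, both zero, packed as one
    # big-endian 32-bit zero. Stereoscopic3D adds a single stereo_mode byte.
    payload = struct.pack(">I", 0) + struct.pack(">B", stereo_mode)
    # Box part: 4-byte total size (header included) and 4-byte type.
    size = 8 + len(payload)
    return struct.pack(">I4s", size, b"st3d") + payload


def parse_st3d(box: bytes) -> dict:
    """Split a 13-byte st3d box back into its fields."""
    size, box_type = struct.unpack(">I4s", box[:8])
    version_and_flags = struct.unpack(">I", box[8:12])[0]
    return {
        "size": size,
        "type": box_type.decode("ascii"),
        "version": version_and_flags >> 24,
        "flags": version_and_flags & 0xFFFFFF,
        "stereo_mode": box[12],
    }


if __name__ == "__main__":
    for mode in (1, 2):
        raw = build_st3d(mode)
        print(raw.hex().upper(), parse_st3d(raw))
```

Running it prints the same two hex strings quoted above, which shows why a reader has to skip four bytes of version and flags before reaching the stereo mode.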