Abantech / Efficio


360 stereoscopic image capture for VR applications #3

Open GMelencio opened 7 years ago

GMelencio commented 7 years ago

I am trying to implement an approach that allows 360 stereoscopic image capture of a space for use in VR applications.

While this article claims that it is impossible to do 360 stereoscopic capture, it does not take into account that a video is simply a collection of images.

My plan therefore is to use a rig of 2 x 360 cameras that performs a 360-degree sweep (turn) at certain points in the space. The space is mapped out such that there are only discrete points where the camera harness has to do a 360-degree sweep. Here is an image of the sample space I'll be mapping out:

image

The approach I intend to implement is this: there will be an invisible 3D model that represents the walls and other "collidable" items in the space, so that a user "walking" around the space will not be able to go to "unmapped" areas. At the same time, the user has an inverted sphere around each camera (one for the left eye, one for the right eye). This is described in the approach at this link.

The 'enhancement' I intend to make to this approach is that I will selectively show in the spheres the frame corresponding to the direction the user is facing, so that the stereo effect is preserved :)

How would I do this for a VR application?

First let me elaborate on the setup. Referring back to my diagram above, recall that I said I will have two 360-degree cameras (side by side) do a "sweep". The speed of the "sweep" will be constant at each point - for simplicity's sake, let's assume a full revolution of the "sweep" takes 360ms. With this, for the red points in the diagram, I will have 31 video snippets of 360ms each (not too big).

To further simplify, let's assume the line of the "A" points is aligned toward true north - such that if the user is at A1, looking at A2, he/she is facing north.

As far as the display goes, if the user is facing north while at point A1, the "inverted spheres" enveloping each camera will show the frame at 1ms of the videos associated with point A1 (let's call these A1-LeftEye.mp4 and A1-RightEye.mp4). If the user, still standing at point A1, turns his/her head to look directly at B2, he/she will see the frame at 90ms of the videos for A1 (A1-LeftEye.mp4 and A1-RightEye.mp4). If the user moves to point C3 and looks south (at C2), the user will then see the frame at 180ms of the videos associated with point C3 (C3-LeftEye.mp4 and C3-RightEye.mp4).

The point of doing this is to preserve the stereo vision effect for the 360-degree view.

image

Now, what we just have to learn to do is move between frames in the videos depending on the camera orientation...
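To make that concrete, here is a rough TypeScript sketch of the lookup I have in mind (the file names and the 360ms sweep duration are just the placeholders used above, not anything that exists yet):

```typescript
// Sketch of the heading-to-frame lookup described above. Assumptions:
// each sweep starts facing true north, a full revolution takes
// SWEEP_DURATION_MS, and each grid point has a left-eye and a right-eye
// video named after the point (e.g. "A1").

const SWEEP_DURATION_MS = 360; // one full 360-degree sweep, as assumed above

interface StereoFrameRequest {
  leftVideo: string;  // e.g. "A1-LeftEye.mp4"
  rightVideo: string; // e.g. "A1-RightEye.mp4"
  timeMs: number;     // offset into both videos for the current heading
}

// Given the grid point the user stands on (e.g. "A1") and the heading of
// the user's head in degrees clockwise from true north, return which
// videos to sample and at what time offset.
function frameForHeading(point: string, headingDeg: number): StereoFrameRequest {
  const normalized = ((headingDeg % 360) + 360) % 360; // 0..359.999
  const timeMs = (normalized / 360) * SWEEP_DURATION_MS;
  return {
    leftVideo: `${point}-LeftEye.mp4`,
    rightVideo: `${point}-RightEye.mp4`,
    timeMs,
  };
}

// Example: standing at A1 and looking due east (toward B2 in the diagram)
// gives the frame at ~90ms of A1-LeftEye.mp4 and A1-RightEye.mp4.
console.log(frameForHeading("A1", 90));
```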

theo-armour commented 7 years ago

http://elevr.com/elevrant-panoramic-twist/

Um, no links, no references, no math, no demo = highly naive and ignorant waffle.


Did you look at the links I sent on the 14th?

Do you mean something like the following? http://jaanga.github.io/equirec/equirec-image-flickr/equirec-image-flickr-r4.html http://jaanga.github.io/equirec//equirec-video/ricoh-theta-video-viewer-r1.html


I am trying to implement an approach that allows 360 stereoscopic image capture of a space for use in VR applications.

So you want to be a camera designer when you grow up?

I think it will be fun designing two side by side 360 cameras that do not obstruct each other

And, anyway, wtf does this have to do with Efficio - a middle app?


It's a lot easier to keep track of issues when they are all in the same repository and you sort them on latest update


Things to remember:


Please figure out a way that enables you to build demos that you can share.

Then we will have real things we can really talk about.

GMelencio commented 7 years ago

@theo-armour :

It seems there's a disconnect between what I am attempting versus what you think I'm trying to implement:

I want to get 360 stereoscopic video capture. The links you sent me are merely 360 video capture OR are about generating a 3D map of the real world environment. I am well aware of SLAM (fwiw, I was at the NSF when they were collaborating with DARPA on the "autonomous land vehicle challenge").

To clarify:

I think it will be fun designing two side by side 360 cameras that do not obstruct each other And, anyway, wtf does this have to do with Efficio - a middle app?

Efficio is all about taking in input signals and directing them to the appropriate output - this is an easy demo that showcases Efficio's capabilities beyond gesture recognition (recall that we don't just do gestures; we do voice, gaze tracking, and a whole host of other things that are ALL in the YouTube videos that I've sent out multiple times). Given the device-agnostic nature of Efficio, achieving such a demo should be relatively low-hanging fruit.

OK, so specifically, how does Efficio play into this? Please read the following carefully. Input device(s): the accelerometer/gyroscope/compass (or whatever) of the VR goggles and the sensors that tell the location of the user - if you're not familiar with how the "lighthouse" sensors interact with the Vive, there's the link.

NOTE: I'm not talking about the display, JUST the sensor(s) that tell where the user is in the room and where they are looking.

So this data about the user's location and orientation is fed to Efficio, and Efficio then selectively sends back the right frame for the user to see. What Efficio sends back to the VR display is described in detail in the paragraph I shall repeat below for your benefit.

As far as the display goes, if the user is facing north while at point A1, then the "inverted spheres" enveloping each camera will show the frame at 1ms of the videos associated with point A1 (let's call these A1-LeftEye.mp4 and A1-RightEye.mp4). If the user, still standing at point A1, turns his/her head to look directly at B2, he/she will see the frame at 90ms of the videos for A1 (A1-LeftEye.mp4 and A1-RightEye.mp4). If the user moves to point C3 and looks south (at C2), the user will then see the frame at 180ms of the videos associated with point C3 (C3-LeftEye.mp4 and C3-RightEye.mp4).

No one has done this before. In a way I am applying something like time-division multiplexing: using a single video file to select the appropriate "frame", which in turn is the image displayed to the user - one for each eye - to achieve stereoscopic vision of a real environment.

There's a zillion peeps - many with PhDs - working on 3D capture. And more patents than you can shake an Oculus at.

Show me a demo where I can freely walk around, face wherever I want to face and all the time see stereoscopic 360 images of a real-world environment and I will stfu.

theo-armour commented 7 years ago

@GMelencio

Is this the beginning of a specification for what you are thinking about?

At build time, given:

  • A static space - like a stage set
  • A single 360 degree camera

Create and keep multiple 360 views of the space(s) in a grid-like sequence where the position of the camera is accurately determined. After each image is captured the camera moves to the next point in the grid pattern. Repeat until a sufficient collection of images has been gathered.
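In TypeScript-ish pseudo-code, just to pin the build-time step down - the grid spacing and naming here are assumptions borrowed from the diagram above, not working capture code:

```typescript
// Sketch of the build-time capture plan: enumerate the grid of points
// where the rig must stop and do one full sweep before moving on.
// The 2-foot spacing and "A1"-style names are assumptions from this thread.

interface CapturePoint { id: string; x: number; y: number; } // feet from origin

function planCaptureGrid(rows: number, cols: number, spacingFt = 2): CapturePoint[] {
  const points: CapturePoint[] = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      points.push({
        id: `${String.fromCharCode(65 + r)}${c + 1}`, // "A1", "A2", ..., "B1", ...
        x: c * spacingFt,
        y: r * spacingFt,
      });
    }
  }
  return points;
}

// At each point: capture one full sweep, then move on to the next point.
// for (const p of planCaptureGrid(3, 10)) { captureSweepAt(p); } // captureSweepAt is hypothetical
```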

At run time:

The theoretical position of each eye in the captured space is determined - based on the positions in actual space. The most appropriate frames for each eye - from the entire range of frames - are streamed to the appropriate eye.
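A rough rendering of that run-time step in the same TypeScript-ish pseudo-code (all names here are placeholders, not an existing API): for each rendered frame, snap each eye's theoretical position to the nearest captured grid point and pick the frame whose capture heading best matches that eye's viewing direction.

```typescript
// Sketch of the run-time selection step. Placeholder types, not a real API.

interface Vec2 { x: number; y: number; }

interface CapturedPoint {
  id: string;       // e.g. "A1"
  position: Vec2;   // where the rig stood, in room coordinates
  frames: string[]; // 360 pre-extracted frames, one per degree of sweep
}

// Snap the theoretical eye position to the closest captured grid point.
function nearestPoint(points: CapturedPoint[], eye: Vec2): CapturedPoint {
  return points.reduce((best, p) => {
    const d = (p.position.x - eye.x) ** 2 + (p.position.y - eye.y) ** 2;
    const bd = (best.position.x - eye.x) ** 2 + (best.position.y - eye.y) ** 2;
    return d < bd ? p : best;
  });
}

// Pick the frame to stream to one eye, given its position and heading.
function frameForEye(points: CapturedPoint[], eye: Vec2, headingDeg: number): string {
  const point = nearestPoint(points, eye);
  const index = Math.round(((headingDeg % 360) + 360) % 360) % 360;
  return point.frames[index];
}
```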

GMelencio commented 7 years ago

Yes!!! Precisely!



theo-armour commented 7 years ago

@GMelencio

So, in order to supply good images for each eyeball, how would you calculate the number of images you would need, say, per square meter?

GMelencio commented 7 years ago

Well, there are 2 stereoscopic cameras. As shown in my diagram, there will be a "360 stereo capture" at every 2 feet on a grid.

Here's the video of me having mapped out the grid: https://www.youtube.com/watch?v=riNFE-9nL1E

The robot then rides on a rail with the two 360 cameras suspended underneath; every 2 feet it will "see" the blue tape that marks the 2-foot mark and perform the 360 sweep.

Here it is riding on the rail when I initially constructed it (before the cameras were attached), but you can see in the previous video that I have the cameras attached now: https://www.youtube.com/watch?v=dAnC0iyb7vs

So to get the right frame, I just need to "cut" the video (using the audio cues emitted by the robot) into 360 frames (1 per degree of rotation).
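For illustration, here is a rough browser-side sketch of that cutting step in TypeScript. It assumes the start-of-sweep offset and the sweep duration have already been found from the audio cue (the cue detection itself isn't shown) and simply seeks the sweep video to one timestamp per degree, grabbing each frame:

```typescript
// Cut one sweep video into 360 per-degree frames in the browser.
// sweepStartMs (from the audio cue) and sweepDurationMs are assumed known.

async function extractSweepFrames(
  video: HTMLVideoElement,
  sweepStartMs: number,
  sweepDurationMs: number
): Promise<ImageBitmap[]> {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d")!;
  const frames: ImageBitmap[] = [];

  for (let deg = 0; deg < 360; deg++) {
    const tMs = sweepStartMs + (deg / 360) * sweepDurationMs;
    video.currentTime = tMs / 1000;                    // seek to this frame
    await new Promise<void>((r) => (video.onseeked = () => r()));
    ctx.drawImage(video, 0, 0);                        // grab the frame
    frames.push(await createImageBitmap(canvas));      // keep it in memory
  }
  return frames;
}
```

Note that the number of distinct frames in a sweep is bounded by the camera's frame rate times the sweep duration, so neighbouring degrees may land on the same underlying frame.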



theo-armour commented 7 years ago

In my message of 9/28 I said:

A single 360 degree camera

In your message of 9/30, you said:

Yes!!! Precisely!

In your message of 10/3, you said:

Well, there are 2 stereoscopic cameras

My comments:

Is this like a Gillette razor thing where there are more blades with each release? ;-)

360 degree cameras and stereoscopic cameras are very different devices.

And it seems to me that a 'stereoscopic' camera has two lenses by definition.

https://en.wikipedia.org/wiki/Stereo_camera

So do you actually mean one camera with two lenses or two cameras with a total of four lenses or something else entirely?

GMelencio commented 7 years ago

@Theo-Armour: you got me. I did catch that you said a single 360 was involved; however, since the majority of the content was correct (i.e., you understood the concept of me doing TDM to get the right frames), I chose not to belabor that point.

Refer to my original post. There were always meant to be 2x 360 cameras.

Also fwiw, here is a pic of the completed and mounted rig with the 2x 360 cameras.

Note I understand fully that using 2x 360 cameras is a bit overkill because I will only be using the position of their frames where they achieve stereoscopic vision. I wish I could somehow crop out the non-stereoscopic parts (where they obscure each other), but I'm trying to keep this simple and just capture the video for now.


theo-armour commented 7 years ago

no picture attached

theo-armour commented 7 years ago

image: https://cloud.githubusercontent.com/assets/547626/19142305/4dd5aa22-8b51-11e6-8c02-3784be2f3e40.png

Eyeballs - Locations of Stereoscopy R1

@GMelencio

Note I understand fully that using 2x 360 cameras is a bit overkill because I will only be using the position of their frames where they achieve stereoscopic vision.

Why not use just a single camera and make two passes? One pass in the position of the left eye and another pass in the position of the right eye?

A benefit of the one camera two passes idea is that there would be no second camera to obscure the view of the first camera

I will only be using the position of their frames where they achieve stereoscopic vision.

This is thought-provoking. The perception of depth is great when the eyes - or cameras - are facing straight out from the plane they reside in - and the viewing triangle is at its widest. But what happens to the images at the periphery when the view triangle has little width and is almost a line?

Here's a little demo that tries to show some aspects of the view triangle as the eyes sweep from side to side.

Eyeballs - Locations of Stereoscopy R1 http://jaanga.github.io/cookbook-threejs/objects/eyeballs/eyeballs-locations-of-stereoscopy.html

I feel quite certain that two images taken side by side would exhibit the same tendencies.

GMelencio commented 7 years ago

@Theo-Armour

Thanks very much for the input and the visuals that come along with it - you really go the extra mile!

Anyway, my responses to your questions:

Why not use just a single camera and make two passes? One pass in the position of the left eye and another pass in the position of the right eye?

That would be ideal, but the simplest answer is because I happen to have 2 cameras (c/o my friend), so I might as well use them. I thought about just using one camera but discovered other issues would arise by doing so. Leveling/balance is one of them: having one camera means it has to be mounted in such a way that its weight on one side of the contraption doesn't present an issue (i.e. I must introduce a counterbalance). Second, synchronicity: to achieve realism, we want the stereoscopic capture to happen at the same time as much as possible - this is why I use audio cues; my idea is that I can later use these audio signals to programmatically synchronize the video feeds. See this demo video https://www.youtube.com/watch?v=wWMb5l9oPNI where I start the drone that does the capture with the dual 360 cameras. Synchronicity is also important for capturing the right time of day and environment - it ensures that the environment does not change between pass #1 and pass #2. It's one thing if they're different between locations (think Google Street View), but it's another thing if the left eye sees video taken at 8:00AM and the right eye sees video taken at 12:00PM (very likely in the case of hardware hiccups).
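For example, here is a brute-force sketch of how the audio cue could be used to line up the two recordings (TypeScript; it assumes both tracks have already been decoded to mono PCM at the same sample rate, e.g. via the Web Audio API - just an illustration, not code I have running):

```typescript
// Find the lag (in samples) at which the right-eye audio track best
// matches the left-eye track, by brute-force cross-correlation.
// A positive lag means the matching content appears `lag` samples later
// in the right-eye track.

function bestLagSamples(left: Float32Array, right: Float32Array, maxLag: number): number {
  let bestLag = 0;
  let bestScore = -Infinity;
  for (let lag = -maxLag; lag <= maxLag; lag++) {
    let score = 0;
    for (let i = 0; i < left.length; i++) {
      const j = i + lag;
      if (j >= 0 && j < right.length) score += left[i] * right[j];
    }
    if (score > bestScore) { bestScore = score; bestLag = lag; }
  }
  return bestLag;
}

// Offset in milliseconds, given the shared sample rate:
// const offsetMs = (bestLagSamples(leftPcm, rightPcm, maxLag) / sampleRate) * 1000;
```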

A benefit of the one camera two passes idea is that there would be no second camera to obscure the view of the first camera

As it happens, 360 cameras don't really capture the entire sphere - there's a part of the camera that is "blind". In the case of the 360fly (the cameras I'm using) it's the base - see image below. This is why I mounted the cameras in such a way that each camera sits in the blind spot of the other. Even though that's not really part of the "stereo triangle", I just thought it useful to put each camera where it doesn't interfere with the visual range of the other (i.e. they're facing away from each other, as seen in the video).

[image: tempFileForShare_2016-10-06-20-55-23.jpg]

Here's a little demo that tries to show some aspects of the view triangle as the eyes sweep from side to side. What happens to the images at the periphery when the view triangle has little width and is almost a line?

Very cool demo! Note, however, that in my approach I will always ONLY use the images that are at the apex of the stereoscopic capture. That is, where the two cameras and a subject lying on a plane perpendicular to the orientation of the two cameras form an isosceles triangle - that is, the cameras are equidistant from the subject.

[image: pasted1]

So what does that mean? Well, for each "area" (A1, A2, A3, B1, B2, B3 and so on in my diagram) I will have 360 frames extracted from EACH camera's video to choose from - when the user moves his/her head left 5 degrees, they would get a different frame from when they are at 4 degrees. Efficio will manage which frame to send to the 3D application. The application will be a simple threejs or Unity app that basically wraps each camera ("left eye" camera and "right eye" camera) in a spherical skybox. What is displayed by the skybox will change based on the user's head orientation.

This is the part where I need help - do you think you can build a quick and dirty demo that simulates this? So I'm asking for a simple threejs app that has a skybox that changes based on some numerical input (preferably the direction the camera is facing relative to north), or at the very least a skybox that can display an array of images quickly and efficiently. Our target should be to have the ability to change the image of the skybox at no less than 30Hz. Do you think this is doable?
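Something in the spirit of the following is what I'm picturing - just a sketch, where the frame paths and the preload-everything strategy are my assumptions, not a finished design:

```typescript
// Minimal three.js sketch of one eye's "inverted sphere" whose texture is
// swapped according to a heading value. Frame file paths are placeholders;
// all frames are preloaded so the per-frame swap stays cheap.

import * as THREE from "three";

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(75, window.innerWidth / window.innerHeight, 0.1, 100);
const renderer = new THREE.WebGLRenderer();
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// Preload one texture per degree for the current grid point (placeholder paths).
const loader = new THREE.TextureLoader();
const frames: THREE.Texture[] = [];
for (let deg = 0; deg < 360; deg++) {
  frames.push(loader.load(`frames/A1-LeftEye/${deg}.jpg`));
}

// Inverted sphere: render the inside faces so the image surrounds the viewer.
const sphere = new THREE.Mesh(
  new THREE.SphereGeometry(10, 60, 40),
  new THREE.MeshBasicMaterial({ map: frames[0], side: THREE.BackSide })
);
scene.add(sphere);

// Swap the displayed frame to match a heading in degrees from north.
function showHeading(headingDeg: number) {
  const index = Math.round(((headingDeg % 360) + 360) % 360) % 360;
  (sphere.material as THREE.MeshBasicMaterial).map = frames[index];
}

// Drive the heading from the camera's yaw (set by the headset orientation).
function animate() {
  requestAnimationFrame(animate);
  const yaw = new THREE.Euler().setFromQuaternion(camera.quaternion, "YXZ").y;
  showHeading(THREE.MathUtils.radToDeg(yaw));
  renderer.render(scene, camera);
}
animate();
```

Whether 30Hz is reachable would mostly come down to having all the frames decoded and resident before they are needed, rather than loading them on demand.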


theo-armour commented 7 years ago

@GMelencio

Please learn how to use an online issue tracking system in good ways.

You are including the email history in your posts. That sucks.

You also need to learn how to get images into a post.


Well, for each "area" (A1, A2, A3, B1, B2, B3 and so on in my diagram) I will have 360 frames extracted from EACH camera's video to choose from

Assuming both cameras create equirectangular images that approach 360 x 180 degrees at about 30 fps, can you kindly describe the deltas in the 360 frames you ask for? Write a pseudo-code for loop if that helps - including positions, rotations, field of view and whatever.

If you wrote the pseudo-code in a document in a 'design' folder/repo/whatever then we can merge the edits we agree on.

do you think you can build a quick and dirty demo that simulates this?

Interesting phrasing. I certainly might describe a particular effort of mine using the 'quick and dirty' idiom, but I would never make a request to another person using such a turn of phrase.

Our target should be to have the ability to change the image of the skybox at no less than 30Hz. Do you think this is doable?

Depends on the size/quality of the images, the amount of pre-loading that can be done, and the streaming capabilities of the servers.