A microservices-based project for rendering novel perspectives of input videos
using neural radiance fields.
Learn more about NeRFs »
View Demo
·
Report Bug
·
Request Feature
This repository contains the backend for the NeRF-or-Nothing web application, which takes raw user video and renders novel, realistic views of the captured scene. Neural Radiance Fields (NeRFs) are a recent technique in novel view synthesis that has achieved state-of-the-art results.
NeRFs operate by taking a set of input images captured at known locations and projecting rays from each image into 3D space using a pinhole camera model. Assuming the input images all capture different perspectives of the same scene, these reprojected rays intersect in the center of the scene, forming a field of light rays that produced the input images (these are the initial radiance fields). A small neural network is then trained to predict the intensity and color of light within this intersecting region in order to model the radiance field that must have produced the initial images. The network is initialized randomly for each new scene and trained to model that scene alone.
Once training is complete, the network can predict the color and intensity of a ray when queried at a specific angle and location in the scene. Ray tracing is then used to query the network along every ray pointing towards a new virtual camera, producing a picture of the scene from a perspective never seen before.
Important to this project is that training a NeRF requires the camera location for each image. We obtain this data by running structure from motion (using COLMAP) on the input video. To learn more, please visit the learning resources in the wiki.
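The ray projection and rendering steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the code this project runs: the function names, the camera convention (looking down -z with the principal point at the image center), and the use of per-sample density values are all assumptions made for the example.

```python
import math

def pixel_to_ray(u, v, width, height, focal):
    """Return a unit ray direction in camera space for pixel (u, v).

    Assumes a pinhole camera looking down -z with the principal point at
    the image center (a common convention, assumed here for illustration).
    """
    x = (u + 0.5 - width / 2.0) / focal
    y = -(v + 0.5 - height / 2.0) / focal
    z = -1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)

def composite_ray(densities, colors, step):
    """Blend samples along one ray, front to back.

    densities: per-sample density values (>= 0), as a trained network
               might predict them
    colors:    per-sample (r, g, b) tuples
    step:      distance between consecutive samples
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this sample
        weight = transmittance * alpha
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return tuple(pixel)
```

In a real pipeline the densities and colors would come from querying the trained network at points along each ray; here they are passed in directly so the blending logic can be read on its own.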
Gaussian splatting is a novel approach to neural scene representation that offers significant improvements over traditional Neural Radiance Fields (NeRFs) in terms of rendering speed and visual quality. Like NeRFs, gaussian splatting starts with a set of input images capturing different perspectives of the same scene, along with their corresponding camera positions and orientations.
The key difference lies in how the scene is represented and rendered:
Scene Representation: Instead of using a neural network to model the entire scene, gaussian splatting represents the scene as a collection of 3D Gaussian primitives. Each Gaussian is defined by its position, covariance matrix (which determines its shape and orientation), and appearance attributes (color and opacity).
Initialization: The process begins by running structure from motion (using tools like COLMAP) on the input images to obtain initial camera parameters and a sparse point cloud. This point cloud is used to initialize the Gaussian primitives.
Training: The system then optimizes these Gaussians to best reproduce the input images. This involves adjusting the Gaussians' positions, shapes, and appearance attributes. The training process is typically faster than NeRF training and can be done end-to-end using gradient descent.
Rendering: To generate a new view, the Gaussians are projected onto the image plane of the virtual camera. Each Gaussian splat contributes to the final image based on its projected size, shape, and appearance. This process is highly parallelizable and can be efficiently implemented on GPUs, resulting in real-time or near-real-time rendering speeds.
View-dependent Effects: Gaussian splatting can model view-dependent effects by incorporating additional parameters for each Gaussian, allowing for realistic representation of specular highlights and reflections. To take advantage of this, use .ply files; for quick rendering without reflections, use .splat files.
The resulting representation is compact, efficient to render, and capable of producing high-quality novel views. Importantly, like NeRFs, gaussian splatting requires accurate camera positions for the input images, which are typically obtained through structure from motion techniques.
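The rendering step can be illustrated with a small CPU sketch. It assumes the Gaussians have already been projected to 2D (each with a screen-space center, a 2x2 covariance, an opacity, a color, and a depth) and blends them front to back at a single pixel. The function names and the dictionary layout are hypothetical, and real implementations run this in parallel on the GPU rather than in a Python loop.

```python
import math

def splat_alpha(pixel, center, cov, opacity):
    """Opacity contributed at `pixel` by one projected 2D Gaussian.

    cov is a symmetric 2x2 covariance given as ((a, b), (b, d)).
    """
    dx = pixel[0] - center[0]
    dy = pixel[1] - center[1]
    (a, b), (_, d) = cov
    det = a * d - b * b
    # Squared Mahalanobis distance, using the closed-form 2x2 inverse
    m = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return opacity * math.exp(-0.5 * m)

def render_pixel(pixel, splats):
    """Blend splats front to back at one pixel.

    Each splat is a dict with keys: depth, center, cov, opacity, color
    (a layout chosen for this example only).
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in sorted(splats, key=lambda s: s["depth"]):
        alpha = splat_alpha(pixel, s["center"], s["cov"], s["opacity"])
        for c in range(3):
            color[c] += transmittance * alpha * s["color"][c]
        transmittance *= 1.0 - alpha
    return tuple(color)
```

For example, a half-opaque red splat in front of a fully opaque green one, both centered on the pixel, yields an even mix of the two colors.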
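As outlined above, Gaussian splatting offers several advantages over traditional NeRFs: typically faster training, real-time or near-real-time rendering, and a compact scene representation.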
To learn more about gaussian splatting and its implementation details, please refer to the learning resources in the wiki.
Since running COLMAP and TensoRF takes upwards of 30 minutes per input video, this
project uses RabbitMQ to queue work orders so that asynchronous workers can complete
user requests. MongoDB is used to keep track of active and past user jobs. The worker
implementations are under the NeRF and colmap folders respectively, while the
central web server is under web-server. For more information on how these components
communicate and how data is formatted, see the READMEs within each of the
aforementioned folders.
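The queue-and-worker pattern described above can be sketched with the Python standard library alone. This is only an illustration of the idea: queue.Queue stands in for a RabbitMQ queue, a plain dictionary stands in for the MongoDB job records, and the work-order field names ("id", "video") are hypothetical rather than the format this project uses. See the folder READMEs for the actual message formats.

```python
import json
import queue
import threading

def make_work_order(job_id, video_path):
    """Serialize a hypothetical work order; field names are illustrative."""
    return json.dumps({"id": job_id, "video": video_path})

def worker(orders, results):
    """Consume work orders until a None sentinel arrives."""
    while True:
        body = orders.get()
        if body is None:
            break
        order = json.loads(body)
        # A real worker would run COLMAP or model training here.
        results[order["id"]] = "done"

def run_demo():
    orders = queue.Queue()  # stand-in for a RabbitMQ queue
    results = {}            # stand-in for the job records in MongoDB
    t = threading.Thread(target=worker, args=(orders, results))
    t.start()
    orders.put(make_work_order("job-1", "videos/example.mp4"))
    orders.put(None)        # tell the worker to stop
    t.join()
    return results
```

The web server only enqueues the order and returns, so a long-running job never blocks a user request; the job record is what lets the server report progress later.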
The project should be easy to install and run once you have completed the respective prerequisites.
The files ./docker-compose-go.yml and docker-compose-flask.yml handle the setup,
depending on whether you want to run V3 or V2 of the API, respectively.
Clone this repository
git clone https://github.com/NeRF-or-Nothing/backend.git
Compose the backend. See the in-depth instructions for details.
docker compose -f <chosen_compose_file>.yml up -d
Follow the frontend installation.
Once everything is running, the website should be available at localhost:5173
and a video can be uploaded to test the application.
Converting the training images from the nerf-synthetic dataset lego example to a video then running vidtonerf produces the following result:
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. Please go to the relevant repository and follow this process.
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Distributed under the MIT License. See LICENSE for more information.
Interested in the project?
Come join our discord server!
Or, inquire at: nerf@quicktechtime.com