KeyuWu-CS / MonoHair

Code of MonoHair: High-Fidelity Hair Modeling from a Monocular Video

Proper parameter settings for a custom dataset #12

Open 0mil opened 3 months ago

0mil commented 3 months ago

@KeyuWu-CS Thank you for your previous response. I have a few more questions about how to reproduce the quality of your results.

  1. From preprocessing through running the prepare_data.py script, there are many parameters to set, for example in COLMAP and instant-NGP. When using my own dataset, can you advise which parameters to adjust to achieve quality similar to your results? (e.g., parameter settings for short male hair or long straight female hair)

  2. In addition, why should I train the instant-NGP for around 2-3 minutes and set the key frame in the front view during the preprocessing section? As far as I understand, there is already an instant-NGP training section in your 4-step 3D Hair Reconstruction process.

Thanks a lot in advance for your answer! :)

KeyuWu-CS commented 3 months ago

In most cases, I recommend using COLMAP.exe to estimate the camera parameters, with all images set to share the same focal length. When executing scripts/colmap2nerf.py, I suggest setting COLMAP=False. When running COLMAP, use as many views as possible. The final quality is limited by the quality of Instant-NGP, because we need a good coarse initialization.

The key frame is used to generate 16 fixed views, which we use to infer the inner structure. DeepMVSHair is trained with 16 fixed views, one of which is the front view. We align the key frame with the front view of those 16 views, which gives us the remaining 15 camera poses. Instant-NGP is trained only once; when setting the key frame, we just need to load the checkpoint.

0mil commented 3 months ago

@KeyuWu-CS Thanks so much for the reply! I am currently following your instructions to the letter. When running scripts/colmap2nerf.py, where do I set COLMAP=False? The script doesn't contain a parameter named COLMAP; are you perhaps referring to run_colmap? To be precise, are you saying to execute data-folder$ python [path-to-instant-ngp]/scripts/colmap2nerf.py --colmap_matcher exhaustive --aabb_scale 32 instead of data-folder$ python [path-to-instant-ngp]/scripts/colmap2nerf.py --colmap_matcher exhaustive --run_colmap --aabb_scale 32?

KeyuWu-CS commented 3 months ago

Yes, it's run_colmap, but I set it to False. That is: data-folder$ python [path-to-instant-ngp]/scripts/colmap2nerf.py --colmap_matcher exhaustive --run_colmap= --aabb_scale 32

0mil commented 3 months ago

@KeyuWu-CS Oh, I see. But something seems off. I am trying to run the script on Ubuntu, and when I execute data-folder$ python [path-to-instant-ngp]/scripts/colmap2nerf.py --colmap_matcher exhaustive --run_colmap= --aabb_scale 32, I get the following error:

(MonoHair) root@gpu-server:/MonoHair/data/test_keyuwu/colmap$ python /MonoHair/submodules/instant-ngp/scripts/colmap2nerf.py --colmap_matcher exhaustive --run_colmap= --aabb_scale 32
usage: colmap2nerf.py [-h] [--video_in VIDEO_IN] [--video_fps VIDEO_FPS] [--time_slice TIME_SLICE]
                      [--run_colmap]
                      [--colmap_matcher {exhaustive,sequential,spatial,transitive,vocab_tree}]
                      [--colmap_db COLMAP_DB]
                      [--colmap_camera_model {SIMPLE_PINHOLE,PINHOLE,SIMPLE_RADIAL,RADIAL,OPENCV,SIMPLE_RADIAL_FISHEYE,RADIAL_FISHEYE,OPENCV_FISHEYE}]
                      [--colmap_camera_params COLMAP_CAMERA_PARAMS] [--images IMAGES] [--text TEXT]
                      [--aabb_scale {1,2,4,8,16,32,64,128}] [--skip_early SKIP_EARLY]
                      [--keep_colmap_coords] [--out OUT] [--vocab_path VOCAB_PATH] [--overwrite]
                      [--mask_categories [MASK_CATEGORIES ...]]
colmap2nerf.py: error: argument --run_colmap: ignored explicit argument ''

As far as I understand, in Python argparse an argument like --run_colmap that is declared with action=store_true is set to False simply by omitting the option. When I run the script that way (without --run_colmap), the colmap folder is generated as follows. Is this correct?

/my_data/colmap
├── colmap_text
│   ├── cameras.txt
│   ├── images.txt
│   └── points3D.txt
├── images
└── transforms.json
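
The argparse behaviour described above can be checked in isolation. The following is a minimal sketch, not part of MonoHair or instant-ngp: it declares only the --run_colmap flag the same way colmap2nerf.py does (action="store_true") and tries the three invocations discussed in this thread. The helper name build_parser is made up for illustration.

```python
import argparse


def build_parser():
    # Hypothetical helper: only the one flag under discussion,
    # declared as in colmap2nerf.py.
    parser = argparse.ArgumentParser(prog="colmap2nerf.py")
    parser.add_argument("--run_colmap", action="store_true",
                        help="run colmap first on the image folder")
    return parser


parser = build_parser()

# Flag omitted: store_true falls back to its default.
print(parser.parse_args([]).run_colmap)                # False

# Flag present without a value: the option is switched on.
print(parser.parse_args(["--run_colmap"]).run_colmap)  # True

# "--run_colmap=" passes an explicit (empty) value, which a store_true
# option does not accept, so argparse prints the usage text and exits.
try:
    parser.parse_args(["--run_colmap="])
except SystemExit as exc:
    print("argparse exited with code", exc.code)       # 2
```

In other words, the only way to get run_colmap=False from the command line is to leave the flag out.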

0mil commented 3 months ago

@KeyuWu-CS When I run scripts/colmap2nerf.py with run_colmap=false, the structure of the resulting transforms.json is different, as shown in the image below. (Left: transforms.json with run_colmap=false; right: transforms.json with run_colmap=true.)

image

With this transforms.json, after initializing instant-ngp and setting the viewpoint, the main 4-step process fails at the prepare_data.py stage with the following error. Could I get any advice on this?

Using configuration file: configs/reconstruct/test_keyuwu.yaml
Process ID: 6707
setting configurations...
loading configs/reconstruct/base.yaml...
loading configs/reconstruct/test_keyuwu.yaml...
* HairGenerate:
   * connect_dot_threshold: 0.8
   * connect_scalp: True
   * connect_segments: True
   * connect_threshold: 0.0025
   * connect_to_guide: None
   * dist_to_root: 6
   * generate_segments: True
   * grow_threshold: 0.8
   * out_ratio: 0.1
* PMVO:
   * conf_threshold: 0.1
   * filter_point: True
   * genrate_ori_only: None
   * infer_inner: True
   * num_sample_per_grid: 4
   * optimize: True
   * patch_size: 5
   * threshold: 0.05
   * visible_threshold: 1
* bbox_min: [-0.32, -0.32, -0.24]
* bust_to_origin: [0.006, -1.644, 0.01]
* camera_path: camera/calib_data/wky07-22/cam_params.json
* check_strands: True
* cpu: None
* data:
   * Conf_path: conf
   * Occ3D_path: ours/Occ3D.mat
   * Ori2D_path: best_ori
   * Ori3D_path: ours/Ori3D.mat
   * bust_path: Bust/bust_long.obj
   * case: test_keyuwu
   * conf_threshold: 0.4
   * depth_path: render_depth
   * frame_interval: 4
   * image_size: [1920, 1080]
   * mask_path: hair_mask
   * raw_points_path: ours/colmap_points.obj
   * root: data
   * scalp_path: ours/scalp_tsfm.obj
   * strands_path: ours/world_str_raw.dat
* device: cuda:0
* gpu: 0
* image_camera_path: ours/cam_params.json
* infer_inner:
   * render_data: True
   * run_mvs: True
* name: 10-16
* ngp:
   * marching_cubes_density_thresh: 2.8
* output_root: output
* prepare_data:
   * fit_bust: True
   * process_bust: True
   * process_camera: True
   * process_imgs: True
   * render_depth: True
   * run_ngp: True
   * select_images: True
* save_path: refine
* scalp_diffusion: None
* seed: 0
* segment:
   * CDGNET_ckpt: assets/CDGNet/LIP_epoch_149.pth
   * MODNET_ckpt: assets/MODNet/modnet_photographic_portrait_matting.ckpt
   * scene_path: None
* vsize: 0.005
* yaml: configs/reconstruct/test_keyuwu
existing options file found (identical)
distance: 2.254131284488828
distance: 2.2541312844888277
Traceback (most recent call last):
  File "/MonoHair/prepare_data.py", line 86, in <module>
    generate_mvs_pose_from_base_cam(data_folder, select_files,camera_path, image_size=args.data.image_size)
  File "/MonoHair/Utils/ingp_utils.py", line 308, in generate_mvs_pose_from_base_cam
    xforms, fov = load_transofrm_json(data_folder + '/transforms.json')
  File "/MonoHair/Utils/ingp_utils.py", line 60, in load_transofrm_json
    camera_angle_y = data["camera_angle_y"]
KeyError: 'camera_angle_y'

KeyuWu-CS commented 3 months ago

I just tested python E:\wukeyu\Instant-NGP\Instant-NGP-for-RTX-2000/scripts/colmap2nerf.py --colmap_matcher exhaustive --aabb_scale 4. The generated transforms.json looks like this: (image 1723541395874). It has camera_angle_y.
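
For anyone hitting the same KeyError, a quick check of the generated file may help before running prepare_data.py. This is a minimal sketch under stated assumptions: the helper names (load_transforms, missing_top_level_keys, fov_from_focal) are made up for illustration and are not part of MonoHair or instant-ngp, and the sample values are taken from the RADIAL example line in the colmap2nerf.py comments. The field-of-view relation is the one colmap2nerf.py itself uses, angle = 2 * atan(size / (2 * focal)).

```python
import json
import math


def load_transforms(path):
    # Read a transforms.json file into a dict.
    with open(path, "r") as f:
        return json.load(f)


def missing_top_level_keys(data, required=("camera_angle_x", "camera_angle_y")):
    # Report which of the required keys are absent at the top level,
    # which is where the traceback shows camera_angle_y being read.
    return [key for key in required if key not in data]


def fov_from_focal(size_px, focal_px):
    # Same relation as in colmap2nerf.py: angle = 2 * atan(size / (2 * focal)).
    return 2.0 * math.atan(size_px / (2.0 * focal_px))


# Illustrative in-memory stand-in for a transforms.json without camera_angle_y.
sample = {
    "camera_angle_x": fov_from_focal(1920, 1665.1),
    "fl_y": 1665.1,
    "h": 1080,
    "frames": [],
}
print(missing_top_level_keys(sample))

# The value the key would hold for these intrinsics, in radians.
print(fov_from_focal(sample["h"], sample["fl_y"]))
```

To check a real file, pass load_transforms("transforms.json") to missing_top_level_keys. An empty list means both keys are present; a non-empty list suggests the file came from a different colmap2nerf.py version than the one MonoHair expects.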

0mil commented 3 months ago

@KeyuWu-CS I suppose that there is something I missed before running the scripts/colmap2nerf.py step. If there is something wrong in the following COLMAP process, please let me know.

  1. I installed COLMAP.bat for Windows.

  2. In the COLMAP.bat program, to obtain the model in .txt format, I ran the following steps with the default configuration: 1) Feature Extraction, 2) Feature Matching, 3) Start Reconstruction. (The following image shows the COLMAP process for the ksyusha1 dataset.)

    image
  3. I then exported the result as text and downloaded it to the colmap_text directory. After that, I placed it in the colmap folder and ran colmap2nerf.py.
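
The export in the steps above can be sanity-checked before running colmap2nerf.py. The sketch below is only an illustration with made-up helper names (missing_colmap_text_files, parse_cameras_txt); it is not part of COLMAP, instant-ngp, or MonoHair. It verifies that the three text files are present and lists the cameras in cameras.txt, which may be worth a look because the maintainer recommends that all images share the same focal length. The sample line is the OPENCV example from the comments in colmap2nerf.py.

```python
import os


def missing_colmap_text_files(folder):
    # The three files a COLMAP text export is expected to contain.
    expected = ("cameras.txt", "images.txt", "points3D.txt")
    return [name for name in expected
            if not os.path.isfile(os.path.join(folder, name))]


def parse_cameras_txt(text):
    # One camera per non-comment line:
    # CAMERA_ID MODEL WIDTH HEIGHT PARAMS...
    cameras = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        els = line.split()
        cameras.append({
            "camera_id": int(els[0]),
            "model": els[1],
            "width": int(els[2]),
            "height": int(els[3]),
            "params": [float(v) for v in els[4:]],
        })
    return cameras


sample = """# Camera list with one line of data per camera:
1 OPENCV 3840 2160 3178.27 3182.09 1920 1080 0.159668 -0.231286 -0.00123982 0.00272224
"""
cams = parse_cameras_txt(sample)
print(len(cams), cams[0]["model"])  # 1 OPENCV
```

For a real export, read colmap_text/cameras.txt and pass its contents to parse_cameras_txt. More than one entry would mean the images do not share a single camera; whether that is related to the missing camera_angle_y key is not confirmed in this thread.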

Is there something I might be misunderstanding?

KeyuWu-CS commented 3 months ago

I think the steps are correct. I'm not sure whether this is a version problem, so I will paste my colmap2nerf.py here:

#!/usr/bin/env python3

# Copyright (c) 2020-2022, NVIDIA CORPORATION.  All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.

import argparse
from glob import glob
import os
from pathlib import Path, PurePosixPath

import numpy as np
import json
import sys
import math
import cv2
import os
import shutil

ROOT_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
SCRIPTS_FOLDER = os.path.join(ROOT_DIR, "scripts")

def parse_args():
    parser = argparse.ArgumentParser(description="Convert a text colmap export to nerf format transforms.json; optionally convert video to images, and optionally run colmap in the first place.")

    parser.add_argument("--video_in", default="", help="Run ffmpeg first to convert a provided video file into a set of images. Uses the video_fps parameter also.")
    parser.add_argument("--video_fps", default=2)
    parser.add_argument("--time_slice", default="", help="Time (in seconds) in the format t1,t2 within which the images should be generated from the video. E.g.: \"--time_slice '10,300'\" will generate images only from 10th second to 300th second of the video.")
    parser.add_argument("--run_colmap", action="store_true", help="run colmap first on the image folder")
    parser.add_argument("--colmap_matcher", default="sequential", choices=["exhaustive","sequential","spatial","transitive","vocab_tree"], help="Select which matcher colmap should use. Sequential for videos, exhaustive for ad-hoc images.")
    parser.add_argument("--colmap_db", default="colmap.db", help="colmap database filename")
    parser.add_argument("--colmap_camera_model", default="OPENCV", choices=["SIMPLE_PINHOLE", "PINHOLE", "SIMPLE_RADIAL", "RADIAL", "OPENCV", "SIMPLE_RADIAL_FISHEYE", "RADIAL_FISHEYE", "OPENCV_FISHEYE"], help="Camera model")
    parser.add_argument("--colmap_camera_params", default="", help="Intrinsic parameters, depending on the chosen model. Format: fx,fy,cx,cy,dist")
    parser.add_argument("--images", default="images", help="Input path to the images.")
    parser.add_argument("--text", default="colmap_text", help="Input path to the colmap text files (set automatically if --run_colmap is used).")
    parser.add_argument("--aabb_scale", default=32, choices=["1", "2", "4", "8", "16", "32", "64", "128"], help="Large scene scale factor. 1=scene fits in unit cube; power of 2 up to 128")
    parser.add_argument("--skip_early", default=0, help="Skip this many images from the start.")
    parser.add_argument("--keep_colmap_coords", action="store_true", help="Keep transforms.json in COLMAP's original frame of reference (this will avoid reorienting and repositioning the scene for preview and rendering).")
    parser.add_argument("--out", default="transforms.json", help="Output path.")
    parser.add_argument("--vocab_path", default="", help="Vocabulary tree path.")
    parser.add_argument("--overwrite", action="store_true", help="Do not ask for confirmation for overwriting existing images and COLMAP data.")
    parser.add_argument("--mask_categories", nargs="*", type=str, default=[], help="Object categories that should be masked out from the training images. See `scripts/category2id.json` for supported categories.")
    args = parser.parse_args()
    return args

def do_system(arg):
    print(f"==== running: {arg}")
    err = os.system(arg)
    if err:
        print("FATAL: command failed")
        sys.exit(err)

def run_ffmpeg(args):
    ffmpeg_binary = "ffmpeg"

    # On Windows, if FFmpeg isn't found, try automatically downloading it from the internet
    if os.name == "nt" and os.system(f"where {ffmpeg_binary} >nul 2>nul") != 0:
        ffmpeg_glob = os.path.join(ROOT_DIR, "external", "ffmpeg", "*", "bin", "ffmpeg.exe")
        candidates = glob(ffmpeg_glob)
        if not candidates:
            print("FFmpeg not found. Attempting to download FFmpeg from the internet.")
            do_system(os.path.join(SCRIPTS_FOLDER, "download_ffmpeg.bat"))
            candidates = glob(ffmpeg_glob)

        if candidates:
            ffmpeg_binary = candidates[0]

    if not os.path.isabs(args.images):
        args.images = os.path.join(os.path.dirname(args.video_in), args.images)

    images = "\"" + args.images + "\""
    video =  "\"" + args.video_in + "\""
    fps = float(args.video_fps) or 1.0
    print(f"running ffmpeg with input video file={video}, output image folder={images}, fps={fps}.")
    if not args.overwrite and (input(f"warning! folder '{images}' will be deleted/replaced. continue? (Y/n)").lower().strip()+"y")[:1] != "y":
        sys.exit(1)
    try:
        # Passing Images' Path Without Double Quotes
        shutil.rmtree(args.images)
    except:
        pass
    do_system(f"mkdir {images}")

    time_slice_value = ""
    time_slice = args.time_slice
    if time_slice:
        start, end = time_slice.split(",")
        time_slice_value = f",select='between(t\,{start}\,{end})'"
    do_system(f"{ffmpeg_binary} -i {video} -qscale:v 1 -qmin 1 -vf \"fps={fps}{time_slice_value}\" {images}/%04d.jpg")

def run_colmap(args):
    colmap_binary = "colmap"

    # On Windows, if COLMAP isn't found, try automatically downloading it from the internet
    if os.name == "nt" and os.system(f"where {colmap_binary} >nul 2>nul") != 0:
        colmap_glob = os.path.join(ROOT_DIR, "external", "colmap", "*", "COLMAP.bat")
        candidates = glob(colmap_glob)
        if not candidates:
            print("COLMAP not found. Attempting to download COLMAP from the internet.")
            do_system(os.path.join(SCRIPTS_FOLDER, "download_colmap.bat"))
            candidates = glob(colmap_glob)

        if candidates:
            colmap_binary = candidates[0]

    db = args.colmap_db
    images = "\"" + args.images + "\""
    db_noext=str(Path(db).with_suffix(""))

    if args.text=="text":
        args.text=db_noext+"_text"
    text=args.text
    sparse=db_noext+"_sparse"
    print(f"running colmap with:\n\tdb={db}\n\timages={images}\n\tsparse={sparse}\n\ttext={text}")
    if not args.overwrite and (input(f"warning! folders '{sparse}' and '{text}' will be deleted/replaced. continue? (Y/n)").lower().strip()+"y")[:1] != "y":
        sys.exit(1)
    if os.path.exists(db):
        os.remove(db)
    do_system(f"{colmap_binary} feature_extractor --ImageReader.camera_model {args.colmap_camera_model} --ImageReader.camera_params \"{args.colmap_camera_params}\" --SiftExtraction.estimate_affine_shape=true --SiftExtraction.domain_size_pooling=true --ImageReader.single_camera 1 --database_path {db} --image_path {images}")
    match_cmd = f"{colmap_binary} {args.colmap_matcher}_matcher --SiftMatching.guided_matching=true --database_path {db}"
    if args.vocab_path:
        match_cmd += f" --VocabTreeMatching.vocab_tree_path {args.vocab_path}"
    do_system(match_cmd)
    try:
        shutil.rmtree(sparse)
    except:
        pass
    do_system(f"mkdir {sparse}")
    do_system(f"{colmap_binary} mapper --database_path {db} --image_path {images} --output_path {sparse}")
    do_system(f"{colmap_binary} bundle_adjuster --input_path {sparse}/0 --output_path {sparse}/0 --BundleAdjustment.refine_principal_point 1")
    try:
        shutil.rmtree(text)
    except:
        pass
    do_system(f"mkdir {text}")
    do_system(f"{colmap_binary} model_converter --input_path {sparse}/0 --output_path {text} --output_type TXT")

def variance_of_laplacian(image):
    return cv2.Laplacian(image, cv2.CV_64F).var()

def sharpness(imagePath):
    image = cv2.imread(imagePath)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    fm = variance_of_laplacian(gray)
    return fm

def qvec2rotmat(qvec):
    return np.array([
        [
            1 - 2 * qvec[2]**2 - 2 * qvec[3]**2,
            2 * qvec[1] * qvec[2] - 2 * qvec[0] * qvec[3],
            2 * qvec[3] * qvec[1] + 2 * qvec[0] * qvec[2]
        ], [
            2 * qvec[1] * qvec[2] + 2 * qvec[0] * qvec[3],
            1 - 2 * qvec[1]**2 - 2 * qvec[3]**2,
            2 * qvec[2] * qvec[3] - 2 * qvec[0] * qvec[1]
        ], [
            2 * qvec[3] * qvec[1] - 2 * qvec[0] * qvec[2],
            2 * qvec[2] * qvec[3] + 2 * qvec[0] * qvec[1],
            1 - 2 * qvec[1]**2 - 2 * qvec[2]**2
        ]
    ])

def rotmat(a, b):
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    v = np.cross(a, b)
    c = np.dot(a, b)
    # handle exception for the opposite direction input
    if c < -1 + 1e-10:
        return rotmat(a + np.random.uniform(-1e-2, 1e-2, 3), b)
    s = np.linalg.norm(v)
    kmat = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + kmat + kmat.dot(kmat) * ((1 - c) / (s ** 2 + 1e-10))

def closest_point_2_lines(oa, da, ob, db): # returns point closest to both rays of form o+t*d, and a weight factor that goes to 0 if the lines are parallel
    da = da / np.linalg.norm(da)
    db = db / np.linalg.norm(db)
    c = np.cross(da, db)
    denom = np.linalg.norm(c)**2
    t = ob - oa
    ta = np.linalg.det([t, db, c]) / (denom + 1e-10)
    tb = np.linalg.det([t, da, c]) / (denom + 1e-10)
    if ta > 0:
        ta = 0
    if tb > 0:
        tb = 0
    return (oa+ta*da+ob+tb*db) * 0.5, denom

if __name__ == "__main__":
    args = parse_args()
    if args.video_in != "":
        run_ffmpeg(args)
    if args.run_colmap:
        run_colmap(args)
    AABB_SCALE = int(args.aabb_scale)
    SKIP_EARLY = int(args.skip_early)
    IMAGE_FOLDER = args.images
    TEXT_FOLDER = args.text
    OUT_PATH = args.out
    print(f"outputting to {OUT_PATH}...")
    with open(os.path.join(TEXT_FOLDER,"cameras.txt"), "r") as f:
        angle_x = math.pi / 2
        for line in f:
            # 1 SIMPLE_RADIAL 2048 1536 1580.46 1024 768 0.0045691
            # 1 OPENCV 3840 2160 3178.27 3182.09 1920 1080 0.159668 -0.231286 -0.00123982 0.00272224
            # 1 RADIAL 1920 1080 1665.1 960 540 0.0672856 -0.0761443
            if line[0] == "#":
                continue
            els = line.split(" ")
            w = float(els[2])
            h = float(els[3])
            fl_x = float(els[4])
            fl_y = float(els[4])
            k1 = 0
            k2 = 0
            k3 = 0
            k4 = 0
            p1 = 0
            p2 = 0
            cx = w / 2
            cy = h / 2
            is_fisheye = False
            if els[1] == "SIMPLE_PINHOLE":
                cx = float(els[5])
                cy = float(els[6])
            elif els[1] == "PINHOLE":
                fl_y = float(els[5])
                cx = float(els[6])
                cy = float(els[7])
            elif els[1] == "SIMPLE_RADIAL":
                cx = float(els[5])
                cy = float(els[6])
                k1 = float(els[7])
            elif els[1] == "RADIAL":
                cx = float(els[5])
                cy = float(els[6])
                k1 = float(els[7])
                k2 = float(els[8])
            elif els[1] == "OPENCV":
                fl_y = float(els[5])
                cx = float(els[6])
                cy = float(els[7])
                k1 = float(els[8])
                k2 = float(els[9])
                p1 = float(els[10])
                p2 = float(els[11])
            elif els[1] == "SIMPLE_RADIAL_FISHEYE":
                is_fisheye = True
                cx = float(els[5])
                cy = float(els[6])
                k1 = float(els[7])
            elif els[1] == "RADIAL_FISHEYE":
                is_fisheye = True
                cx = float(els[5])
                cy = float(els[6])
                k1 = float(els[7])
                k2 = float(els[8])
            elif els[1] == "OPENCV_FISHEYE":
                is_fisheye = True
                fl_y = float(els[5])
                cx = float(els[6])
                cy = float(els[7])
                k1 = float(els[8])
                k2 = float(els[9])
                k3 = float(els[10])
                k4 = float(els[11])
            else:
                print("Unknown camera model ", els[1])
            # fl = 0.5 * w / tan(0.5 * angle_x);
            angle_x = math.atan(w / (fl_x * 2)) * 2
            angle_y = math.atan(h / (fl_y * 2)) * 2
            fovx = angle_x * 180 / math.pi
            fovy = angle_y * 180 / math.pi

    print(f"camera:\n\tres={w,h}\n\tcenter={cx,cy}\n\tfocal={fl_x,fl_y}\n\tfov={fovx,fovy}\n\tk={k1,k2} p={p1,p2} ")

    with open(os.path.join(TEXT_FOLDER,"images.txt"), "r") as f:
        i = 0
        bottom = np.array([0.0, 0.0, 0.0, 1.0]).reshape([1, 4])
        out = {
            "camera_angle_x": angle_x,
            "camera_angle_y": angle_y,
            "fl_x": fl_x,
            "fl_y": fl_y,
            "k1": k1,
            "k2": k2,
            "k3": k3,
            "k4": k4,
            "p1": p1,
            "p2": p2,
            "is_fisheye": is_fisheye,
            "cx": cx,
            "cy": cy,
            "w": w,
            "h": h,
            "aabb_scale": AABB_SCALE,
            "frames": [],
        }

        up = np.zeros(3)
        for line in f:
            line = line.strip()
            if line[0] == "#":
                continue
            i = i + 1
            if i < SKIP_EARLY*2:
                continue
            if  i % 2 == 1:
                elems=line.split(" ") # 1-4 is quat, 5-7 is trans, 9ff is filename (9, if filename contains no spaces)
                #name = str(PurePosixPath(Path(IMAGE_FOLDER, elems[9])))
                # why is this requiring a relative path while using ^
                image_rel = os.path.relpath(IMAGE_FOLDER)
                name = str(f"./{image_rel}/{'_'.join(elems[9:])}")
                b = sharpness(name)
                print(name, "sharpness=",b)
                image_id = int(elems[0])
                qvec = np.array(tuple(map(float, elems[1:5])))
                tvec = np.array(tuple(map(float, elems[5:8])))
                R = qvec2rotmat(-qvec)
                t = tvec.reshape([3,1])
                m = np.concatenate([np.concatenate([R, t], 1), bottom], 0)
                c2w = np.linalg.inv(m)
                if not args.keep_colmap_coords:
                    c2w[0:3,2] *= -1 # flip the y and z axis
                    c2w[0:3,1] *= -1
                    c2w = c2w[[1,0,2,3],:]
                    c2w[2,:] *= -1 # flip whole world upside down

                    up += c2w[0:3,1]

                frame = {"file_path":name,"sharpness":b,"transform_matrix": c2w}
                out["frames"].append(frame)
    nframes = len(out["frames"])

    if args.keep_colmap_coords:
        flip_mat = np.array([
            [1, 0, 0, 0],
            [0, -1, 0, 0],
            [0, 0, -1, 0],
            [0, 0, 0, 1]
        ])

        for f in out["frames"]:
            f["transform_matrix"] = np.matmul(f["transform_matrix"], flip_mat) # flip cameras (it just works)
    else:
        # don't keep colmap coords - reorient the scene to be easier to work with

        up = up / np.linalg.norm(up)
        print("up vector was", up)
        R = rotmat(up,[0,0,1]) # rotate up vector to [0,0,1]
        R = np.pad(R,[0,1])
        R[-1, -1] = 1

        for f in out["frames"]:
            f["transform_matrix"] = np.matmul(R, f["transform_matrix"]) # rotate up to be the z axis

        # find a central point they are all looking at
        print("computing center of attention...")
        totw = 0.0
        totp = np.array([0.0, 0.0, 0.0])
        for f in out["frames"]:
            mf = f["transform_matrix"][0:3,:]
            for g in out["frames"]:
                mg = g["transform_matrix"][0:3,:]
                p, w = closest_point_2_lines(mf[:,3], mf[:,2], mg[:,3], mg[:,2])
                if w > 0.00001:
                    totp += p*w
                    totw += w
        if totw > 0.0:
            totp /= totw
        print(totp) # the cameras are looking at totp
        for f in out["frames"]:
            f["transform_matrix"][0:3,3] -= totp

        avglen = 0.
        for f in out["frames"]:
            avglen += np.linalg.norm(f["transform_matrix"][0:3,3])
        avglen /= nframes
        print("avg camera distance from origin", avglen)
        for f in out["frames"]:
            f["transform_matrix"][0:3,3] *= 4.0 / avglen # scale to "nerf sized"

    for f in out["frames"]:
        f["transform_matrix"] = f["transform_matrix"].tolist()
    print(nframes,"frames")
    print(f"writing {OUT_PATH}")
    with open(OUT_PATH, "w") as outfile:
        json.dump(out, outfile, indent=2)

    if len(args.mask_categories) > 0:
        # Check if detectron2 is installed. If not, install it.
        try:
            import detectron2
        except ModuleNotFoundError:
            try:
                import torch
            except ModuleNotFoundError:
                print("PyTorch is not installed. For automatic masking, install PyTorch from https://pytorch.org/")
                sys.exit(1)

            input("Detectron2 is not installed. Press enter to install it.")
            import subprocess
            package = 'git+https://github.com/facebookresearch/detectron2.git'
            subprocess.check_call([sys.executable, "-m", "pip", "install", package])
            import detectron2

        import torch
        from pathlib import Path
        from detectron2.config import get_cfg
        from detectron2 import model_zoo
        from detectron2.engine import DefaultPredictor

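        # SCRIPTS_FOLDER is a plain str, so join with os.path.join rather than the "/" operator
        category2id = json.load(open(os.path.join(SCRIPTS_FOLDER, "category2id.json"), "r"))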
        mask_ids = [category2id[c] for c in args.mask_categories]

        cfg = get_cfg()
        # Add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
        cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
        cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
        # Find a model from detectron2's model zoo.
        cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
        predictor = DefaultPredictor(cfg)

        for frame in out['frames']:
            img = cv2.imread(frame['file_path'])
            outputs = predictor(img)

            output_mask = np.zeros((img.shape[0], img.shape[1]))
            for i in range(len(outputs['instances'])):
                if outputs['instances'][i].pred_classes.cpu().numpy()[0] in mask_ids:
                    pred_mask = outputs['instances'][i].pred_masks.cpu().numpy()[0]
                    output_mask = np.logical_or(output_mask, pred_mask)

            rgb_path = Path(frame['file_path'])
            mask_name = str(rgb_path.parents[0] / Path('dynamic_mask_' + rgb_path.name.replace('.jpg', '.png')))
            cv2.imwrite(mask_name, (output_mask*255).astype(np.uint8))

0mil commented 3 months ago

@KeyuWu-CS

Thank you for sharing the colmap2nerf.py script. Your guess was right! After reviewing the script, I identified an earlier version of the instant-ngp submodule that aligns with the provided colmap2nerf.py. If you don’t mind, I plan to create a PR that fixes the submodule's version to prevent the same issue from occurring again. I'll submit the PR shortly and link it to this Issue for your review.

Thanks again for your guidance!