NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

Terrible results on inference #367

Open issue, opened by joansaurina 5 months ago


Hey. To understand how DOPE works, I did my first training run with a YCB object for which the authors have already released trained weights.

I generated my data using blenderproc_data_gen: https://github.com/NVlabs/Deep_Object_Pose/tree/master/data_generation/blenderproc_data_gen

I used this command: python ../Deep_Object_Pose/data_generation/blenderproc_data_gen/run_blenderproc_datagen.py --nb_runs 1 --nb_frames 1250 --path_single_obj /joansaurina_working_dir/DOPE/objects/YOLO/006_mustard_bottle/006_mustard_bottle.obj --nb_objects 5 --distractors_folder /joansaurina_working_dir/DOPE/objects/distractors --nb_distractors 10 --backgrounds_folder /joansaurina_working_dir/DOPE/hdr_maps --outf /joansaurina_working_dir/DOPE/data --width 1920 --height 1080 --nb_workers 4 --run_id 0 --focal-length 1400 --scale 45
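The `--focal-length`, `--width` and `--height` options set up the virtual camera, and the annotation JSON further down records it as `fx`, `fy`, `cx`, `cy`. As a minimal sketch of the standard pinhole model (an illustration, not the generator's own code), the intrinsic matrix K built from those four numbers maps a 3D point in the camera frame to pixel coordinates. The principal point at the image centre used below is an assumption for the example.

```python
import numpy as np


def intrinsic_matrix(fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Build a 3x3 pinhole intrinsic matrix K (all values in pixels)."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])


def project(K: np.ndarray, points_cam: np.ndarray) -> np.ndarray:
    """Project Nx3 camera-frame points (Z > 0, camera looking along +Z) to Nx2 pixels."""
    uvw = points_cam @ K.T            # each row: (fx*X + cx*Z, fy*Y + cy*Z, Z)
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide by Z


# Hypothetical example: fx = fy = 1400 px as in --focal-length 1400, with the
# principal point ASSUMED to be the centre of a 1920x1080 image.
K = intrinsic_matrix(1400.0, 1400.0, 1920 / 2, 1080 / 2)
pts = np.array([[0.0, 0.0, 1.0],      # on the optical axis, 1 m away
                [0.1, -0.05, 2.0]])   # 10 cm right, 5 cm up (Y points down), 2 m away
print(project(K, pts))  # approximately [[960, 540], [1030, 505]]
```

A point on the optical axis lands exactly on (cx, cy), so the principal point determines where the whole projected cuboid sits in the image.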

I used the 2,000 HDRI backgrounds you suggested, 19 different distractor models (10 per frame), and 5 instances of my object in each generated frame.

I ran this command 16 times to get 20,000 frames.
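For reference, the totals implied by these settings (the instance count is the number of object copies placed, not all of which are necessarily visible in every frame):

```python
# Dataset size implied by the generation settings described above.
frames_per_run = 1250    # --nb_frames
invocations = 16         # number of times the command was run (--nb_runs 1 each)
objects_per_frame = 5    # --nb_objects: copies of the mustard bottle per frame

total_frames = frames_per_run * invocations
total_instances = total_frames * objects_per_frame  # placed, not necessarily visible
print(total_frames, total_instances)  # 20000 100000
```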

Here is an example:

Image: 8_000768

JSON:

{
    "camera_data": {
        "width": 1920,
        "height": 1080,
        "camera_look_at": {
            "at": [
                -0.0,
                1.0,
                -7.549790126404332e-08
            ],
            "eye": [
                -0.0,
                25.0,
                -0.0
            ],
            "up": [
                1.0,
                0.0,
                0.0
            ]
        },
        "intrinsics": {
            "fx": 1400.0,
            "fy": 1400.0,
            "cx": 0.0,
            "cy": 0.0
        }
    },
    "objects": [
        {
            "class": "YOLO",
            "name": "YOLO_000",
            "visibility": 12686,
            "projected_cuboid": [
                [
                    1108.5007070466388,
                    336.1911830647591
                ],
                [
                    1078.1812421446348,
                    255.0891076541963
                ],
                [
                    1070.65097473424,
                    260.3795655918342
                ],
                [
                    1102.3030211876526,
                    344.74086262955416
                ],
                [
                    1271.915725354665,
                    274.8893983561234
                ],
                [
                    1242.479980673107,
                    190.12746977436785
                ],
                [
                    1241.7358770953597,
                    192.8215642437463
                ],
                [
                    1272.431572823686,
                    281.15140336160914
                ],
                [
                    1172.1530411267827,
                    267.6971772741837
                ]
            ]
        },
        {
            "class": "YOLO",
            "name": "YOLO_001",
            "visibility": 5529,
            "projected_cuboid": [
                [
                    748.1693890698944,
                    638.5536937565826
                ],
                [
                    745.0209861280417,
                    665.676889828529
                ],
                [
                    699.0399901702065,
                    676.1333755771077
                ],
                [
                    704.212205769928,
                    648.5309865606014
                ],
                [
                    779.6081026244624,
                    750.2393697928406
                ],
                [
                    777.958855780897,
                    781.4639214974293
                ],
                [
                    733.4724180476219,
                    791.6701335832453
                ],
                [
                    737.018795518966,
                    759.9888882095954
                ],
                [
                    740.8693019466269,
                    714.607689879471
                ]
            ]
        },
        {
            "class": "YOLO",
            "name": "YOLO_002",
            "visibility": 8625,
            "projected_cuboid": [
                [
                    1213.4024691517561,
                    585.3415755365863
                ],
                [
                    1264.7648127608948,
                    655.7706765991895
                ],
                [
                    1225.5796425497592,
                    675.4338844384231
                ],
                [
                    1173.6905775199327,
                    602.8048238146062
                ],
                [
                    1133.1093510845074,
                    667.7523397893364
                ],
                [
                    1178.6786526367275,
                    733.7684595626247
                ],
                [
                    1140.811816076352,
                    753.6014774967125
                ],
                [
                    1094.8718810168536,
                    685.6717368723343
                ],
                [
                    1176.3009197595618,
                    671.340277236763
                ]
            ]
        },
        {
            "class": "YOLO",
            "name": "YOLO_003",
            "visibility": 27234,
            "projected_cuboid": [
                [
                    1430.9540075880088,
                    864.5553795539274
                ],
                [
                    1305.2081599969474,
                    847.3324003244275
                ],
                [
                    1271.157892747688,
                    798.2218052184412
                ],
                [
                    1390.1255943181886,
                    815.3908448705647
                ],
                [
                    1345.8993641184693,
                    1084.8623923481816
                ],
                [
                    1225.4689264126082,
                    1073.044845804331
                ],
                [
                    1197.777335108545,
                    1013.6043049700863
                ],
                [
                    1311.9601373875057,
                    1025.8670266515817
                ],
                [
                    1308.8122056669117,
                    942.6296651322
                ]
            ]
        },
        {
            "class": "YOLO",
            "name": "YOLO_004",
            "visibility": 5668,
            "projected_cuboid": [
                [
                    1008.1704729786785,
                    497.9610101418672
                ],
                [
                    946.2807933359325,
                    493.9192684526973
                ],
                [
                    959.8096341396215,
                    494.87678731312127
                ],
                [
                    1020.0562785360796,
                    498.81225153944183
                ],
                [
                    999.611418222307,
                    626.0421877030922
                ],
                [
                    938.012328074337,
                    620.2197847797333
                ],
                [
                    951.7453982041766,
                    617.50125134809
                ],
                [
                    1011.7172969897706,
                    623.1145557441488
                ],
                [
                    979.3033981224396,
                    559.1178154569229
                ]
            ]
        }
    ]
}
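Each entry under `objects` carries a `class`, a `name`, a `visibility` value and a `projected_cuboid` of nine (u, v) pixel points. A minimal sketch for inspecting such a file, assuming the first eight points are the box corners and the ninth is the projected centroid (the values above are consistent with that, but it is an inference, not something documented here). It computes a 2D box around the corners and flags objects whose corners fall outside the 1920x1080 image; for instance, `YOLO_003` has corner v-coordinates above 1080.

```python
import json


def summarize_annotation(ann: dict) -> list[dict]:
    """Summarize each object in one annotation dict shaped like the JSON above.

    ASSUMPTION: projected_cuboid holds 9 (u, v) points, 8 corners then the centroid.
    """
    width = ann["camera_data"]["width"]
    height = ann["camera_data"]["height"]
    summary = []
    for obj in ann["objects"]:
        corners = obj["projected_cuboid"][:8]
        us = [p[0] for p in corners]
        vs = [p[1] for p in corners]
        inside = all(0 <= u < width and 0 <= v < height for u, v in corners)
        summary.append({
            "name": obj["name"],
            "class": obj["class"],
            "visibility": obj["visibility"],
            "bbox": (min(us), min(vs), max(us), max(vs)),  # 2D box around the corners
            "all_corners_in_image": inside,
        })
    return summary


# Usage with a trimmed copy of the file above: one object, coordinates rounded.
example = json.loads("""
{"camera_data": {"width": 1920, "height": 1080},
 "objects": [{"class": "YOLO", "name": "YOLO_003", "visibility": 27234,
   "projected_cuboid": [[1430.95, 864.56], [1305.21, 847.33], [1271.16, 798.22],
     [1390.13, 815.39], [1345.90, 1084.86], [1225.47, 1073.04], [1197.78, 1013.60],
     [1311.96, 1025.87], [1308.81, 942.63]]}]}
""")
for row in summarize_annotation(example):
    print(row)
```

To check a whole run, the same function can be applied to every JSON file in the output folder.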

I then trained the network:

python -m torch.distributed.run ./Deep_Object_Pose/train/train.py --data /joansaurina_working_dir/DOPE/data/mustard/train --object mustard --namefile mustard --gpuids 2 --outf /joansaurina_working_dir/DOPE/outputs
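Since the training command selects the object with `--object mustard`, it can help to confirm which `class` labels the annotation files under `--data` actually contain. A minimal sketch that counts them, assuming every annotation is a `*.json` file with an `objects` list like the example above (how `train.py` uses `--object` is not shown here, so check the script itself). `count_classes` is a hypothetical helper, not part of DOPE.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_classes(data_dir: str) -> Counter:
    """Count `class` labels across every *.json annotation under data_dir (recursively)."""
    counts = Counter()
    for path in Path(data_dir).rglob("*.json"):
        with open(path) as f:
            ann = json.load(f)
        # Files without an "objects" list (e.g. settings files) contribute nothing.
        for obj in ann.get("objects", []):
            counts[obj.get("class")] += 1
    return counts


# Self-contained demo with two fake annotation files; for real use, pass the
# folder given to --data, e.g. count_classes("/joansaurina_working_dir/DOPE/data/mustard/train").
with tempfile.TemporaryDirectory() as tmp:
    for i, labels in enumerate([["YOLO"] * 5, ["YOLO", "YOLO"]]):
        ann = {"objects": [{"class": c} for c in labels]}
        Path(tmp, f"{i:06d}.json").write_text(json.dumps(ann))
    demo_counts = count_classes(tmp)
print(demo_counts)  # Counter({'YOLO': 7})
```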

I got these loss values:

Train Epoch: 1 [0/21954 (0%)] Loss: 0.032226417213678 Local Rank: 0
Train Epoch: 1 [3200/21954 (15%)] Loss: 0.000054956450185 Local Rank: 0
Train Epoch: 1 [6400/21954 (29%)] Loss: 0.000005583947768 Local Rank: 0
Train Epoch: 1 [9600/21954 (44%)] Loss: 0.000002404336783 Local Rank: 0
Train Epoch: 1 [12800/21954 (58%)] Loss: 0.000001227574558 Local Rank: 0
Train Epoch: 1 [16000/21954 (73%)] Loss: 0.000000854489485 Local Rank: 0
Train Epoch: 1 [19200/21954 (87%)] Loss: 0.000000541734153 Local Rank: 0
Train Epoch: 2 [0/21954 (0%)] Loss: 0.000000461598574 Local Rank: 0
Train Epoch: 2 [3200/21954 (15%)] Loss: 0.000000814163343 Local Rank: 0
Train Epoch: 2 [6400/21954 (29%)] Loss: 0.000001324953246 Local Rank: 0
Train Epoch: 2 [9600/21954 (44%)] Loss: 0.000000222948032 Local Rank: 0
Train Epoch: 2 [12800/21954 (58%)] Loss: 0.000000372285456 Local Rank: 0
Train Epoch: 2 [16000/21954 (73%)] Loss: 0.000000558363013 Local Rank: 0
Train Epoch: 2 [19200/21954 (87%)] Loss: 0.000000236218071 Local Rank: 0
Train Epoch: 3 [0/21954 (0%)] Loss: 0.000001860549105 Local Rank: 0
Train Epoch: 3 [3200/21954 (15%)] Loss: 0.000000799212899 Local Rank: 0
Train Epoch: 3 [6400/21954 (29%)] Loss: 0.000000125715403 Local Rank: 0
Train Epoch: 3 [9600/21954 (44%)] Loss: 0.000000180747477 Local Rank: 0
Train Epoch: 3 [12800/21954 (58%)] Loss: 0.000000805953277 Local Rank: 0
Train Epoch: 3 [16000/21954 (73%)] Loss: 0.000001442992357 Local Rank: 0
Train Epoch: 3 [19200/21954 (87%)] Loss: 0.000001061197509 Local Rank: 0
Train Epoch: 4 [0/21954 (0%)] Loss: 0.000000982815891 Local Rank: 0
Train Epoch: 4 [3200/21954 (15%)] Loss: 0.000000206120603 Local Rank: 0
Train Epoch: 4 [6400/21954 (29%)] Loss: 0.000000127674738 Local Rank: 0
Train Epoch: 4 [9600/21954 (44%)] Loss: 0.000007593555438 Local Rank: 0
Train Epoch: 4 [12800/21954 (58%)] Loss: 0.000000755591827 Local Rank: 0
Train Epoch: 4 [16000/21954 (73%)] Loss: 0.000000437721752 Local Rank: 0
Train Epoch: 4 [19200/21954 (87%)] Loss: 0.000000293232802 Local Rank: 0
Train Epoch: 5 [0/21954 (0%)] Loss: 0.000000230907332 Local Rank: 0
Train Epoch: 5 [3200/21954 (15%)] Loss: 0.000000182625854 Local Rank: 0
Train Epoch: 5 [6400/21954 (29%)] Loss: 0.000000148762140 Local Rank: 0
Train Epoch: 5 [9600/21954 (44%)] Loss: 0.000000122769222 Local Rank: 0
Train Epoch: 5 [12800/21954 (58%)] Loss: 0.000000107598140 Local Rank: 0
Train Epoch: 5 [16000/21954 (73%)] Loss: 0.000000093323258 Local Rank: 0
Train Epoch: 5 [19200/21954 (87%)] Loss: 0.000000082392404 Local Rank: 0
Train Epoch: 6 [0/21954 (0%)] Loss: 0.000000075647705 Local Rank: 0
Train Epoch: 6 [3200/21954 (15%)] Loss: 0.000000064887971 Local Rank: 0
Train Epoch: 6 [6400/21954 (29%)] Loss: 0.000000060798961 Local Rank: 0
Train Epoch: 6 [9600/21954 (44%)] Loss: 0.000000054425744 Local Rank: 0
Train Epoch: 6 [12800/21954 (58%)] Loss: 0.000000050276164 Local Rank: 0
Train Epoch: 6 [16000/21954 (73%)] Loss: 0.000000046842217 Local Rank: 0
Train Epoch: 6 [19200/21954 (87%)] Loss: 0.000000041724803 Local Rank: 0
Train Epoch: 7 [0/21954 (0%)] Loss: 0.000000040864649 Local Rank: 0
Train Epoch: 7 [3200/21954 (15%)] Loss: 0.000000037598987 Local Rank: 0
Train Epoch: 7 [6400/21954 (29%)] Loss: 0.000000041400455 Local Rank: 0
Train Epoch: 7 [9600/21954 (44%)] Loss: 0.000000036804060 Local Rank: 0
Train Epoch: 7 [12800/21954 (58%)] Loss: 0.000000030431892 Local Rank: 0
Train Epoch: 7 [16000/21954 (73%)] Loss: 0.000000033451677 Local Rank: 0
Train Epoch: 7 [19200/21954 (87%)] Loss: 0.000000038179891 Local Rank: 0
Train Epoch: 8 [0/21954 (0%)] Loss: 0.000000137247866 Local Rank: 0
Train Epoch: 8 [3200/21954 (15%)] Loss: 0.000000024037080 Local Rank: 0
Train Epoch: 8 [6400/21954 (29%)] Loss: 0.000000040983824 Local Rank: 0
Train Epoch: 8 [9600/21954 (44%)] Loss: 0.000000023996826 Local Rank: 0
Train Epoch: 8 [12800/21954 (58%)] Loss: 0.000000025145226 Local Rank: 0
Train Epoch: 8 [16000/21954 (73%)] Loss: 0.000000057790899 Local Rank: 0
Train Epoch: 8 [19200/21954 (87%)] Loss: 0.000000078414146 Local Rank: 0
Train Epoch: 9 [0/21954 (0%)] Loss: 0.000000022625041 Local Rank: 0
Train Epoch: 9 [3200/21954 (15%)] Loss: 0.000000614076896 Local Rank: 0
Train Epoch: 9 [6400/21954 (29%)] Loss: 0.000000046579181 Local Rank: 0
Train Epoch: 9 [9600/21954 (44%)] Loss: 0.000000060488205 Local Rank: 0
Train Epoch: 9 [12800/21954 (58%)] Loss: 0.000000481881898 Local Rank: 0
Train Epoch: 9 [16000/21954 (73%)] Loss: 0.000000044869907 Local Rank: 0
Train Epoch: 9 [19200/21954 (87%)] Loss: 0.000000022641267 Local Rank: 0
Train Epoch: 10 [0/21954 (0%)] Loss: 0.000000040946539 Local Rank: 0
Train Epoch: 10 [3200/21954 (15%)] Loss: 0.000000019352248 Local Rank: 0
Train Epoch: 10 [6400/21954 (29%)] Loss: 0.000000093074377 Local Rank: 0
Train Epoch: 10 [9600/21954 (44%)] Loss: 0.000000191221389 Local Rank: 0
Train Epoch: 10 [12800/21954 (58%)] Loss: 0.000000054558832 Local Rank: 0
Train Epoch: 10 [16000/21954 (73%)] Loss: 0.000000014376070 Local Rank: 0
Train Epoch: 10 [19200/21954 (87%)] Loss: 0.000000157801111 Local Rank: 0
Train Epoch: 11 [0/21954 (0%)] Loss: 0.000000021281508 Local Rank: 0
Train Epoch: 11 [3200/21954 (15%)] Loss: 0.000000104082659 Local Rank: 0
Train Epoch: 11 [6400/21954 (29%)] Loss: 0.000000019813221 Local Rank: 0
Train Epoch: 11 [9600/21954 (44%)] Loss: 0.000000033571567 Local Rank: 0
Train Epoch: 11 [12800/21954 (58%)] Loss: 0.000000018261460 Local Rank: 0
Train Epoch: 11 [16000/21954 (73%)] Loss: 0.000000010644818 Local Rank: 0
Train Epoch: 11 [19200/21954 (87%)] Loss: 0.000000016095139 Local Rank: 0
Train Epoch: 12 [0/21954 (0%)] Loss: 0.000000011582900 Local Rank: 0
Train Epoch: 12 [3200/21954 (15%)] Loss: 0.000000432212261 Local Rank: 0
Train Epoch: 12 [6400/21954 (29%)] Loss: 0.000000479986852 Local Rank: 0
Train Epoch: 12 [9600/21954 (44%)] Loss: 0.000001059239480 Local Rank: 0
Train Epoch: 12 [12800/21954 (58%)] Loss: 0.000000036008135 Local Rank: 0
Train Epoch: 12 [16000/21954 (73%)] Loss: 0.000000030287978 Local Rank: 0
Train Epoch: 12 [19200/21954 (87%)] Loss: 0.000000200769165 Local Rank: 0
Train Epoch: 13 [0/21954 (0%)] Loss: 0.000000947358330 Local Rank: 0
Train Epoch: 13 [3200/21954 (15%)] Loss: 0.000000174554174 Local Rank: 0
Train Epoch: 13 [6400/21954 (29%)] Loss: 0.000000019825865 Local Rank: 0
Train Epoch: 13 [9600/21954 (44%)] Loss: 0.000000457419105 Local Rank: 0
Train Epoch: 13 [12800/21954 (58%)] Loss: 0.000000013466329 Local Rank: 0
Train Epoch: 13 [16000/21954 (73%)] Loss: 0.000000009109310 Local Rank: 0
Train Epoch: 13 [19200/21954 (87%)] Loss: 0.000000029324962 Local Rank: 0
Train Epoch: 14 [0/21954 (0%)] Loss: 0.000000016672255 Local Rank: 0
Train Epoch: 14 [3200/21954 (15%)] Loss: 0.000000008403266 Local Rank: 0
Train Epoch: 14 [6400/21954 (29%)] Loss: 0.000000013834858 Local Rank: 0
Train Epoch: 14 [9600/21954 (44%)] Loss: 0.000000041245951 Local Rank: 0
Train Epoch: 14 [12800/21954 (58%)] Loss: 0.000000126654840 Local Rank: 0
Train Epoch: 14 [16000/21954 (73%)] Loss: 0.000000068792460 Local Rank: 0
Train Epoch: 14 [19200/21954 (87%)] Loss: 0.000000023356789 Local Rank: 0
Train Epoch: 15 [0/21954 (0%)] Loss: 0.000000015392022 Local Rank: 0
Train Epoch: 15 [3200/21954 (15%)] Loss: 0.000000026289984 Local Rank: 0
Train Epoch: 15 [6400/21954 (29%)] Loss: 0.000000082555040 Local Rank: 0
Train Epoch: 15 [9600/21954 (44%)] Loss: 0.000000066952595 Local Rank: 0
Train Epoch: 15 [12800/21954 (58%)] Loss: 0.000000223104365 Local Rank: 0
Train Epoch: 15 [16000/21954 (73%)] Loss: 0.000000068002940 Local Rank: 0
Train Epoch: 15 [19200/21954 (87%)] Loss: 0.000000020225711 Local Rank: 0
Train Epoch: 16 [0/21954 (0%)] Loss: 0.000000064043562 Local Rank: 0
Train Epoch: 16 [3200/21954 (15%)] Loss: 0.000000237655058 Local Rank: 0
Train Epoch: 16 [6400/21954 (29%)] Loss: 0.000000117875743 Local Rank: 0
Train Epoch: 16 [9600/21954 (44%)] Loss: 0.000000038143284 Local Rank: 0
Train Epoch: 16 [12800/21954 (58%)] Loss: 0.000000048299949 Local Rank: 0
Train Epoch: 16 [16000/21954 (73%)] Loss: 0.000000017390846 Local Rank: 0
Train Epoch: 16 [19200/21954 (87%)] Loss: 0.000000030965509 Local Rank: 0
Train Epoch: 17 [0/21954 (0%)] Loss: 0.000000114598549 Local Rank: 0
Train Epoch: 17 [3200/21954 (15%)] Loss: 0.000000777719833 Local Rank: 0
Train Epoch: 17 [6400/21954 (29%)] Loss: 0.000000008485914 Local Rank: 0
Train Epoch: 17 [9600/21954 (44%)] Loss: 0.000000020658439 Local Rank: 0
Train Epoch: 17 [12800/21954 (58%)] Loss: 0.000000051189982 Local Rank: 0
Train Epoch: 17 [16000/21954 (73%)] Loss: 0.000000059993901 Local Rank: 0
Train Epoch: 17 [19200/21954 (87%)] Loss: 0.000000087362594 Local Rank: 0
Train Epoch: 18 [0/21954 (0%)] Loss: 0.000000038836877 Local Rank: 0
Train Epoch: 18 [3200/21954 (15%)] Loss: 0.000000061083057 Local Rank: 0
Train Epoch: 18 [6400/21954 (29%)] Loss: 0.000000156934902 Local Rank: 0
Train Epoch: 18 [9600/21954 (44%)] Loss: 0.000000038107849 Local Rank: 0
Train Epoch: 18 [12800/21954 (58%)] Loss: 0.000000015710810 Local Rank: 0
Train Epoch: 18 [16000/21954 (73%)] Loss: 0.000000010882834 Local Rank: 0
Train Epoch: 18 [19200/21954 (87%)] Loss: 0.000000012338301 Local Rank: 0
Train Epoch: 19 [0/21954 (0%)] Loss: 0.000000021178728 Local Rank: 0
Train Epoch: 19 [3200/21954 (15%)] Loss: 0.000000032514269 Local Rank: 0
Train Epoch: 19 [6400/21954 (29%)] Loss: 0.000000032664754 Local Rank: 0
Train Epoch: 19 [9600/21954 (44%)] Loss: 0.000000047819096 Local Rank: 0
Train Epoch: 19 [12800/21954 (58%)] Loss: 0.000000194597476 Local Rank: 0
Train Epoch: 19 [16000/21954 (73%)] Loss: 0.000000022240888 Local Rank: 0
Train Epoch: 19 [19200/21954 (87%)] Loss: 0.000000026454614 Local Rank: 0
Train Epoch: 20 [0/21954 (0%)] Loss: 0.000000037096356 Local Rank: 0
Train Epoch: 20 [3200/21954 (15%)] Loss: 0.000000066227003 Local Rank: 0
Train Epoch: 20 [6400/21954 (29%)] Loss: 0.000000025235991 Local Rank: 0
Train Epoch: 20 [9600/21954 (44%)] Loss: 0.000000065564294 Local Rank: 0
Train Epoch: 20 [12800/21954 (58%)] Loss: 0.000000042552507 Local Rank: 0
Train Epoch: 20 [16000/21954 (73%)] Loss: 0.000000014690205 Local Rank: 0
Train Epoch: 20 [19200/21954 (87%)] Loss: 0.000000019013266 Local Rank: 0
Train Epoch: 21 [0/21954 (0%)] Loss: 0.000000015840312 Local Rank: 0
Train Epoch: 21 [3200/21954 (15%)] Loss: 0.000000007192205 Local Rank: 0
Train Epoch: 21 [6400/21954 (29%)] Loss: 0.000000080767734 Local Rank: 0
Train Epoch: 21 [9600/21954 (44%)] Loss: 0.000000019668551 Local Rank: 0
Train Epoch: 21 [12800/21954 (58%)] Loss: 0.000000012353518 Local Rank: 0
Train Epoch: 21 [16000/21954 (73%)] Loss: 0.000000014589110 Local Rank: 0
Train Epoch: 21 [19200/21954 (87%)] Loss: 0.000000008434803 Local Rank: 0
Train Epoch: 22 [0/21954 (0%)] Loss: 0.000000010440480 Local Rank: 0
Train Epoch: 22 [3200/21954 (15%)] Loss: 0.000000069318745 Local Rank: 0
Train Epoch: 22 [6400/21954 (29%)] Loss: 0.000000055334809 Local Rank: 0
Train Epoch: 22 [9600/21954 (44%)] Loss: 0.000000020559881 Local Rank: 0
Train Epoch: 22 [12800/21954 (58%)] Loss: 0.000000099122367 Local Rank: 0
Train Epoch: 22 [16000/21954 (73%)] Loss: 0.000000011309816 Local Rank: 0
Train Epoch: 22 [19200/21954 (87%)] Loss: 0.000000025341269 Local Rank: 0
Train Epoch: 23 [0/21954 (0%)] Loss: 0.000000049346887 Local Rank: 0
Train Epoch: 23 [3200/21954 (15%)] Loss: 0.000000028185640 Local Rank: 0
Train Epoch: 23 [6400/21954 (29%)] Loss: 0.000000021219892 Local Rank: 0
Train Epoch: 23 [9600/21954 (44%)] Loss: 0.000000016983169 Local Rank: 0
Train Epoch: 23 [12800/21954 (58%)] Loss: 0.000000278465279 Local Rank: 0
Train Epoch: 23 [16000/21954 (73%)] Loss: 0.000000022677003 Local Rank: 0
Train Epoch: 23 [19200/21954 (87%)] Loss: 0.000000058087821 Local Rank: 0
Train Epoch: 24 [0/21954 (0%)] Loss: 0.000000029027028 Local Rank: 0
Train Epoch: 24 [3200/21954 (15%)] Loss: 0.000000019193587 Local Rank: 0
Train Epoch: 24 [6400/21954 (29%)] Loss: 0.000000028608230 Local Rank: 0
Train Epoch: 24 [9600/21954 (44%)] Loss: 0.000000011797447 Local Rank: 0
Train Epoch: 24 [12800/21954 (58%)] Loss: 0.000000013848087 Local Rank: 0
Train Epoch: 24 [16000/21954 (73%)] Loss: 0.000000011021527 Local Rank: 0
Train Epoch: 24 [19200/21954 (87%)] Loss: 0.000000058001294 Local Rank: 0
Train Epoch: 25 [0/21954 (0%)] Loss: 0.000000010887187 Local Rank: 0
Train Epoch: 25 [3200/21954 (15%)] Loss: 0.000000012451672 Local Rank: 0
Train Epoch: 25 [6400/21954 (29%)] Loss: 0.000000014544160 Local Rank: 0
Train Epoch: 25 [9600/21954 (44%)] Loss: 0.000000016954232 Local Rank: 0
Train Epoch: 25 [12800/21954 (58%)] Loss: 0.000000024964510 Local Rank: 0
Train Epoch: 25 [16000/21954 (73%)] Loss: 0.000000010586804 Local Rank: 0
Train Epoch: 25 [19200/21954 (87%)] Loss: 0.000000053758171 Local Rank: 0
Train Epoch: 26 [0/21954 (0%)] Loss: 0.000000010967806 Local Rank: 0
Train Epoch: 26 [3200/21954 (15%)] Loss: 0.000000022923089 Local Rank: 0
Train Epoch: 26 [6400/21954 (29%)] Loss: 0.000000030484355 Local Rank: 0
Train Epoch: 26 [9600/21954 (44%)] Loss: 0.000000030351149 Local Rank: 0
Train Epoch: 26 [12800/21954 (58%)] Loss: 0.000000021354811 Local Rank: 0
Train Epoch: 26 [16000/21954 (73%)] Loss: 0.000000021473376 Local Rank: 0
Train Epoch: 26 [19200/21954 (87%)] Loss: 0.000000019664878 Local Rank: 0
Train Epoch: 27 [0/21954 (0%)] Loss: 0.000000187000595 Local Rank: 0
Train Epoch: 27 [3200/21954 (15%)] Loss: 0.000000036314706 Local Rank: 0
Train Epoch: 27 [6400/21954 (29%)] Loss: 0.000000056437692 Local Rank: 0
Train Epoch: 27 [9600/21954 (44%)] Loss: 0.000000016312969 Local Rank: 0
Train Epoch: 27 [12800/21954 (58%)] Loss: 0.000000044095511 Local Rank: 0
Train Epoch: 27 [16000/21954 (73%)] Loss: 0.000000020743155 Local Rank: 0
Train Epoch: 27 [19200/21954 (87%)] Loss: 0.000000013474663 Local Rank: 0
Train Epoch: 28 [0/21954 (0%)] Loss: 0.000000044292761 Local Rank: 0
Train Epoch: 28 [3200/21954 (15%)] Loss: 0.000000109195184 Local Rank: 0
Train Epoch: 28 [6400/21954 (29%)] Loss: 0.000000016419662 Local Rank: 0
Train Epoch: 28 [9600/21954 (44%)] Loss: 0.000000009680882 Local Rank: 0
Train Epoch: 28 [12800/21954 (58%)] Loss: 0.000000013456868 Local Rank: 0
Train Epoch: 28 [16000/21954 (73%)] Loss: 0.000000066093151 Local Rank: 0
Train Epoch: 28 [19200/21954 (87%)] Loss: 0.000000007159178 Local Rank: 0
Train Epoch: 29 [0/21954 (0%)] Loss: 0.000000025287040 Local Rank: 0
Train Epoch: 29 [3200/21954 (15%)] Loss: 0.000000011249996 Local Rank: 0
Train Epoch: 29 [6400/21954 (29%)] Loss: 0.000000029144541 Local Rank: 0
Train Epoch: 29 [9600/21954 (44%)] Loss: 0.000000030823536 Local Rank: 0
Train Epoch: 29 [12800/21954 (58%)] Loss: 0.000000013946186 Local Rank: 0
Train Epoch: 29 [16000/21954 (73%)] Loss: 0.000000091640842 Local Rank: 0
Train Epoch: 29 [19200/21954 (87%)] Loss: 0.000000019633866 Local Rank: 0
Train Epoch: 30 [0/21954 (0%)] Loss: 0.000000021500888 Local Rank: 0
Train Epoch: 30 [3200/21954 (15%)] Loss: 0.000000022184068 Local Rank: 0
Train Epoch: 30 [6400/21954 (29%)] Loss: 0.000000157694643 Local Rank: 0
Train Epoch: 30 [9600/21954 (44%)] Loss: 0.000000026855130 Local Rank: 0
Train Epoch: 30 [12800/21954 (58%)] Loss: 0.000000009808918 Local Rank: 0
Train Epoch: 30 [16000/21954 (73%)] Loss: 0.000000041021014 Local Rank: 0
Train Epoch: 30 [19200/21954 (87%)] Loss: 0.000000020091651 Local Rank: 0
Train Epoch: 31 [0/21954 (0%)] Loss: 0.000000007249267 Local Rank: 0
Train Epoch: 31 [3200/21954 (15%)] Loss: 0.000000007327598 Local Rank: 0
Train Epoch: 31 [6400/21954 (29%)] Loss: 0.000000030033306 Local Rank: 0
Train Epoch: 31 [9600/21954 (44%)] Loss: 0.000000029361036 Local Rank: 0
Train Epoch: 31 [12800/21954 (58%)] Loss: 0.000000042759485 Local Rank: 0
Train Epoch: 31 [16000/21954 (73%)] Loss: 0.000000011636001 Local Rank: 0
Train Epoch: 31 [19200/21954 (87%)] Loss: 0.000000006599022 Local Rank: 0
Train Epoch: 32 [0/21954 (0%)] Loss: 0.000000049785978 Local Rank: 0
Train Epoch: 32 [3200/21954 (15%)] Loss: 0.000000006707165 Local Rank: 0
Train Epoch: 32 [6400/21954 (29%)] Loss: 0.000000007372277 Local Rank: 0
Train Epoch: 32 [9600/21954 (44%)] Loss: 0.000000014988961 Local Rank: 0
Train Epoch: 32 [12800/21954 (58%)] Loss: 0.000000011427483 Local Rank: 0
Train Epoch: 32 [16000/21954 (73%)] Loss: 0.000000022587560 Local Rank: 0
Train Epoch: 32 [19200/21954 (87%)] Loss: 0.000000014578165 Local Rank: 0
Train Epoch: 33 [0/21954 (0%)] Loss: 0.000000015279580 Local Rank: 0
Train Epoch: 33 [3200/21954 (15%)] Loss: 0.000000004317229 Local Rank: 0
Train Epoch: 33 [6400/21954 (29%)] Loss: 0.000000031787476 Local Rank: 0
Train Epoch: 33 [9600/21954 (44%)] Loss: 0.000000005673207 Local Rank: 0
Train Epoch: 33 [12800/21954 (58%)] Loss: 0.000000017157701 Local Rank: 0
Train Epoch: 33 [16000/21954 (73%)] Loss: 0.000000006732584 Local Rank: 0
Train Epoch: 33 [19200/21954 (87%)] Loss: 0.000000016443989 Local Rank: 0
Train Epoch: 34 [0/21954 (0%)] Loss: 0.000000012904260 Local Rank: 0
Train Epoch: 34 [3200/21954 (15%)] Loss: 0.000000009646755 Local Rank: 0
Train Epoch: 34 [6400/21954 (29%)] Loss: 0.000000017857362 Local Rank: 0
Train Epoch: 34 [9600/21954 (44%)] Loss: 0.000000029539651 Local Rank: 0
Train Epoch: 34 [12800/21954 (58%)] Loss: 0.000000019424224 Local Rank: 0
Train Epoch: 34 [16000/21954 (73%)] Loss: 0.000000018462277 Local Rank: 0
Train Epoch: 34 [19200/21954 (87%)] Loss: 0.000000019469015 Local Rank: 0
Train Epoch: 35 [0/21954 (0%)] Loss: 0.000000019323245 Local Rank: 0
Train Epoch: 35 [3200/21954 (15%)] Loss: 0.000000026136629 Local Rank: 0
Train Epoch: 35 [6400/21954 (29%)] Loss: 0.000000013433817 Local Rank: 0
Train Epoch: 35 [9600/21954 (44%)] Loss: 0.000000032893745 Local Rank: 0
Train Epoch: 35 [12800/21954 (58%)] Loss: 0.000000022598265 Local Rank: 0
Train Epoch: 35 [16000/21954 (73%)] Loss: 0.000000022211307 Local Rank: 0
Train Epoch: 35 [19200/21954 (87%)] Loss: 0.000000017635895 Local Rank: 0
Train Epoch: 36 [0/21954 (0%)] Loss: 0.000000039769763 Local Rank: 0
Train Epoch: 36 [3200/21954 (15%)] Loss: 0.000000011590498 Local Rank: 0
Train Epoch: 36 [6400/21954 (29%)] Loss: 0.000000008379004 Local Rank: 0
Train Epoch: 36 [9600/21954 (44%)] Loss: 0.000000009727076 Local Rank: 0
Train Epoch: 36 [12800/21954 (58%)] Loss: 0.000000015385467 Local Rank: 0
Train Epoch: 36 [16000/21954 (73%)] Loss: 0.000000008652863 Local Rank: 0
Train Epoch: 36 [19200/21954 (87%)] Loss: 0.000000023911392 Local Rank: 0
Train Epoch: 37 [0/21954 (0%)] Loss: 0.000000010615045 Local Rank: 0
Train Epoch: 37 [3200/21954 (15%)] Loss: 0.000000018407707 Local Rank: 0
Train Epoch: 37 [6400/21954 (29%)] Loss: 0.000000018355911 Local Rank: 0
Train Epoch: 37 [9600/21954 (44%)] Loss: 0.000000010970202 Local Rank: 0
Train Epoch: 37 [12800/21954 (58%)] Loss: 0.000000019862586 Local Rank: 0
Train Epoch: 37 [16000/21954 (73%)] Loss: 0.000000013529062 Local Rank: 0
Train Epoch: 37 [19200/21954 (87%)] Loss: 0.000000013295010 Local Rank: 0
Train Epoch: 38 [0/21954 (0%)] Loss: 0.000000013958021 Local Rank: 0
Train Epoch: 38 [3200/21954 (15%)] Loss: 0.000000014583954 Local Rank: 0
Train Epoch: 38 [6400/21954 (29%)] Loss: 0.000000012989634 Local Rank: 0
Train Epoch: 38 [9600/21954 (44%)] Loss: 0.000000011866096 Local Rank: 0
Train Epoch: 38 [12800/21954 (58%)] Loss: 0.000000023150763 Local Rank: 0
Train Epoch: 38 [16000/21954 (73%)] Loss: 0.000000007781955 Local Rank: 0
Train Epoch: 38 [19200/21954 (87%)] Loss: 0.000000015456870 Local Rank: 0
Train Epoch: 39 [0/21954 (0%)] Loss: 0.000000011201415 Local Rank: 0
Train Epoch: 39 [3200/21954 (15%)] Loss: 0.000000009827847 Local Rank: 0
Train Epoch: 39 [6400/21954 (29%)] Loss: 0.000000015327053 Local Rank: 0
Train Epoch: 39 [9600/21954 (44%)] Loss: 0.000000039688686 Local Rank: 0
Train Epoch: 39 [12800/21954 (58%)] Loss: 0.000000019237461 Local Rank: 0
Train Epoch: 39 [16000/21954 (73%)] Loss: 0.000000011011694 Local Rank: 0
Train Epoch: 39 [19200/21954 (87%)] Loss: 0.000000014264638 Local Rank: 0
Train Epoch: 40 [0/21954 (0%)] Loss: 0.000000030926468 Local Rank: 0
Train Epoch: 40 [3200/21954 (15%)] Loss: 0.000000007323429 Local Rank: 0
Train Epoch: 40 [6400/21954 (29%)] Loss: 0.000000006704386 Local Rank: 0
Train Epoch: 40 [9600/21954 (44%)] Loss: 0.000000009038962 Local Rank: 0
Train Epoch: 40 [12800/21954 (58%)] Loss: 0.000000009649733 Local Rank: 0
Train Epoch: 40 [16000/21954 (73%)] Loss: 0.000000009782566 Local Rank: 0
Train Epoch: 40 [19200/21954 (87%)] Loss: 0.000000016422479 Local Rank: 0
Train Epoch: 41 [0/21954 (0%)] Loss: 0.000000019963490 Local Rank: 0
Train Epoch: 41 [3200/21954 (15%)] Loss: 0.000000009757932 Local Rank: 0
Train Epoch: 41 [6400/21954 (29%)] Loss: 0.000000010693229 Local Rank: 0
Train Epoch: 41 [9600/21954 (44%)] Loss: 0.000000027821745 Local Rank: 0
Train Epoch: 41 [12800/21954 (58%)] Loss: 0.000000007995355 Local Rank: 0
Train Epoch: 41 [16000/21954 (73%)] Loss: 0.000000008349005 Local Rank: 0
Train Epoch: 41 [19200/21954 (87%)] Loss: 0.000000005064072 Local Rank: 0
Train Epoch: 42 [0/21954 (0%)] Loss: 0.000000026273964 Local Rank: 0
Train Epoch: 42 [3200/21954 (15%)] Loss: 0.000000006836946 Local Rank: 0
Train Epoch: 42 [6400/21954 (29%)] Loss: 0.000000006001378 Local Rank: 0
Train Epoch: 42 [9600/21954 (44%)] Loss: 0.000000006310722 Local Rank: 0
Train Epoch: 42 [12800/21954 (58%)] Loss: 0.000000013901271 Local Rank: 0
Train Epoch: 42 [16000/21954 (73%)] Loss: 0.000000011100768 Local Rank: 0
Train Epoch: 42 [19200/21954 (87%)] Loss: 0.000000007251665 Local Rank: 0
Train Epoch: 43 [0/21954 (0%)] Loss: 0.000000012155632 Local Rank: 0
Train Epoch: 43 [3200/21954 (15%)] Loss: 0.000000010087867 Local Rank: 0
Train Epoch: 43 [6400/21954 (29%)] Loss: 0.000000022912102 Local Rank: 0
Train Epoch: 43 [9600/21954 (44%)] Loss: 0.000000006398962 Local Rank: 0
Train Epoch: 43 [12800/21954 (58%)] Loss: 0.000000007570744 Local Rank: 0
Train Epoch: 43 [16000/21954 (73%)] Loss: 0.000000028547678 Local Rank: 0
Train Epoch: 43 [19200/21954 (87%)] Loss: 0.000000014391338 Local Rank: 0
Train Epoch: 44 [0/21954 (0%)] Loss: 0.000000006731553 Local Rank: 0
Train Epoch: 44 [3200/21954 (15%)] Loss: 0.000000009906681 Local Rank: 0
Train Epoch: 44 [6400/21954 (29%)] Loss: 0.000000015218992 Local Rank: 0
Train Epoch: 44 [9600/21954 (44%)] Loss: 0.000000020577733 Local Rank: 0
Train Epoch: 44 [12800/21954 (58%)] Loss: 0.000000008413754 Local Rank: 0
Train Epoch: 44 [16000/21954 (73%)] Loss: 0.000000006593781 Local Rank: 0
Train Epoch: 44 [19200/21954 (87%)] Loss: 0.000000004786202 Local Rank: 0
Train Epoch: 45 [0/21954 (0%)] Loss: 0.000000005672906 Local Rank: 0
Train Epoch: 45 [3200/21954 (15%)] Loss: 0.000000008732899 Local Rank: 0
Train Epoch: 45 [6400/21954 (29%)] Loss: 0.000000013123611 Local Rank: 0
Train Epoch: 45 [9600/21954 (44%)] Loss: 0.000000028685154 Local Rank: 0
Train Epoch: 45 [12800/21954 (58%)] Loss: 0.000000017706125 Local Rank: 0
Train Epoch: 45 [16000/21954 (73%)] Loss: 0.000000027039253 Local Rank: 0
Train Epoch: 45 [19200/21954 (87%)] Loss: 0.000000009853413 Local Rank: 0
Train Epoch: 46 [0/21954 (0%)] Loss: 0.000000008321767 Local Rank: 0
Train Epoch: 46 [3200/21954 (15%)] Loss: 0.000000008611090 Local Rank: 0
Train Epoch: 46 [6400/21954 (29%)] Loss: 0.000000008987664 Local Rank: 0
Train Epoch: 46 [9600/21954 (44%)] Loss: 0.000000011627986 Local Rank: 0
Train Epoch: 46 [12800/21954 (58%)] Loss: 0.000000017624304 Local Rank: 0
Train Epoch: 46 [16000/21954 (73%)] Loss: 0.000000007787125 Local Rank: 0
Train Epoch: 46 [19200/21954 (87%)] Loss: 0.000000009117018 Local Rank: 0
Train Epoch: 47 [0/21954 (0%)] Loss: 0.000000020352275 Local Rank: 0
Train Epoch: 47 [3200/21954 (15%)] Loss: 0.000000008937186 Local Rank: 0
Train Epoch: 47 [6400/21954 (29%)] Loss: 0.000000011379610 Local Rank: 0
Train Epoch: 47 [9600/21954 (44%)] Loss: 0.000000010561430 Local Rank: 0
Train Epoch: 47 [12800/21954 (58%)] Loss: 0.000000014728766 Local Rank: 0
Train Epoch: 47 [16000/21954 (73%)] Loss: 0.000000008338232 Local Rank: 0
Train Epoch: 47 [19200/21954 (87%)] Loss: 0.000000006868773 Local Rank: 0
Train Epoch: 48 [0/21954 (0%)] Loss: 0.000000006604123 Local Rank: 0
Train Epoch: 48 [3200/21954 (15%)] Loss: 0.000000009185202 Local Rank: 0
Train Epoch: 48 [6400/21954 (29%)] Loss: 0.000000004450263 Local Rank: 0
Train Epoch: 48 [9600/21954 (44%)] Loss: 0.000000006833139 Local Rank: 0
Train Epoch: 48 [12800/21954 (58%)] Loss: 0.000000018508160 Local Rank: 0
Train Epoch: 48 [16000/21954 (73%)] Loss: 0.000000016448059 Local Rank: 0
Train Epoch: 48 [19200/21954 (87%)] Loss: 0.000000006402178 Local Rank: 0
Train Epoch: 49 [0/21954 (0%)] Loss: 0.000000008051963 Local Rank: 0
Train Epoch: 49 [3200/21954 (15%)] Loss: 0.000000013908252 Local Rank: 0
Train Epoch: 49 [6400/21954 (29%)] Loss: 0.000000008676697 Local Rank: 0
Train Epoch: 49 [9600/21954 (44%)] Loss: 0.000000009819944 Local Rank: 0
Train Epoch: 49 [12800/21954 (58%)] Loss: 0.000000007234369 Local Rank: 0
Train Epoch: 49 [16000/21954 (73%)] Loss: 0.000000041287961 Local Rank: 0
Train Epoch: 49 [19200/21954 (87%)] Loss: 0.000000005146559 Local Rank: 0
Train Epoch: 50 [0/21954 (0%)] Loss: 0.000000015207702 Local Rank: 0
Train Epoch: 50 [3200/21954 (15%)] Loss: 0.000000010120303 Local Rank: 0
Train Epoch: 50 [6400/21954 (29%)] Loss: 0.000000005482596 Local Rank: 0
Train Epoch: 50 [9600/21954 (44%)] Loss: 0.000000012620689 Local Rank: 0
Train Epoch: 50 [12800/21954 (58%)] Loss: 0.000000012743361 Local Rank: 0
Train Epoch: 50 [16000/21954 (73%)] Loss: 0.000000012274777 Local Rank: 0
Train Epoch: 50 [19200/21954 (87%)] Loss: 0.000000008007879 Local Rank: 0
Train Epoch: 51 [0/21954 (0%)] Loss: 0.000000006334987 Local Rank: 0
Train Epoch: 51 [3200/21954 (15%)] Loss: 0.000000010136299 Local Rank: 0
Train Epoch: 51 [6400/21954 (29%)] Loss: 0.000000003553023 Local Rank: 0
Train Epoch: 51 [9600/21954 (44%)] Loss: 0.000000010449689 Local Rank: 0
Train Epoch: 51 [12800/21954 (58%)] Loss: 0.000000010018343 Local Rank: 0
Train Epoch: 51 [16000/21954 (73%)] Loss: 0.000000014135829 Local Rank: 0
Train Epoch: 51 [19200/21954 (87%)] Loss: 0.000000008712250 Local Rank: 0
Train Epoch: 52 [0/21954 (0%)] Loss: 0.000000010065902 Local Rank: 0
Train Epoch: 52 [3200/21954 (15%)] Loss: 0.000000006660399 Local Rank: 0
Train Epoch: 52 [6400/21954 (29%)] Loss: 0.000000006505513 Local Rank: 0
Train Epoch: 52 [9600/21954 (44%)] Loss: 0.000000009913525 Local Rank: 0
Train Epoch: 52 [12800/21954 (58%)] Loss: 0.000000009257180 Local Rank: 0
Train Epoch: 52 [16000/21954 (73%)] Loss: 0.000000010755775 Local Rank: 0
Train Epoch: 52 [19200/21954 (87%)] Loss: 0.000000006325999 Local Rank: 0
Train Epoch: 53 [0/21954 (0%)] Loss: 0.000000035536004 Local Rank: 0
Train Epoch: 53 [3200/21954 (15%)] Loss: 0.000000021122503 Local Rank: 0
Train Epoch: 53 [6400/21954 (29%)] Loss: 0.000000007899352 Local Rank: 0
Train Epoch: 53 [9600/21954 (44%)] Loss: 0.000000007886018 Local Rank: 0
Train Epoch: 53 [12800/21954 (58%)] Loss: 0.000000007920454 Local Rank: 0
Train Epoch: 53 [16000/21954 (73%)] Loss: 0.000000014844571 Local Rank: 0
Train Epoch: 53 [19200/21954 (87%)] Loss: 0.000000008930600 Local Rank: 0
Train Epoch: 54 [0/21954 (0%)] Loss: 0.000000005660567 Local Rank: 0
Train Epoch: 54 [3200/21954 (15%)] Loss: 0.000000006730198 Local Rank: 0
Train Epoch: 54 [6400/21954 (29%)] Loss: 0.000000007977213 Local Rank: 0
Train Epoch: 54 [9600/21954 (44%)] Loss: 0.000000008456022 Local Rank: 0
Train Epoch: 54 [12800/21954 (58%)] Loss: 0.000000013306240 Local Rank: 0
Train Epoch: 54 [16000/21954 (73%)] Loss: 0.000000006157055 Local Rank: 0
Train Epoch: 54 [19200/21954 (87%)] Loss: 0.000000010907463 Local Rank: 0
Train Epoch: 55 [0/21954 (0%)] Loss: 0.000000016609061 Local Rank: 0
Train Epoch: 55 [3200/21954 (15%)] Loss: 0.000000007411313 Local Rank: 0
Train Epoch: 55 [6400/21954 (29%)] Loss: 0.000000006454592 Local Rank: 0
Train Epoch: 55 [9600/21954 (44%)] Loss: 0.000000006518717 Local Rank: 0
Train Epoch: 55 [12800/21954 (58%)] Loss: 0.000000009766346 Local Rank: 0
Train Epoch: 55 [16000/21954 (73%)] Loss: 0.000000005829686 Local Rank: 0
Train Epoch: 55 [19200/21954 (87%)] Loss: 0.000000012888748 Local Rank: 0 Train Epoch: 56 [0/21954 (0%)] Loss: 0.000000010045416 Local Rank: 0 Train Epoch: 56 [3200/21954 (15%)] Loss: 0.000000013699600 Local Rank: 0 Train Epoch: 56 [6400/21954 (29%)] Loss: 0.000000006701123 Local Rank: 0 Train Epoch: 56 [9600/21954 (44%)] Loss: 0.000000010160383 Local Rank: 0 Train Epoch: 56 [12800/21954 (58%)] Loss: 0.000000009344611 Local Rank: 0 Train Epoch: 56 [16000/21954 (73%)] Loss: 0.000000018317909 Local Rank: 0 Train Epoch: 56 [19200/21954 (87%)] Loss: 0.000000011199674 Local Rank: 0 Train Epoch: 57 [0/21954 (0%)] Loss: 0.000000010277432 Local Rank: 0 Train Epoch: 57 [3200/21954 (15%)] Loss: 0.000000012257399 Local Rank: 0 Train Epoch: 57 [6400/21954 (29%)] Loss: 0.000000007362419 Local Rank: 0 Train Epoch: 57 [9600/21954 (44%)] Loss: 0.000000011940708 Local Rank: 0 Train Epoch: 57 [12800/21954 (58%)] Loss: 0.000000005524560 Local Rank: 0 Train Epoch: 57 [16000/21954 (73%)] Loss: 0.000000006665691 Local Rank: 0 Train Epoch: 57 [19200/21954 (87%)] Loss: 0.000000013095502 Local Rank: 0 Train Epoch: 58 [0/21954 (0%)] Loss: 0.000000006900216 Local Rank: 0 Train Epoch: 58 [3200/21954 (15%)] Loss: 0.000000005206885 Local Rank: 0 Train Epoch: 58 [6400/21954 (29%)] Loss: 0.000000004738580 Local Rank: 0 Train Epoch: 58 [9600/21954 (44%)] Loss: 0.000000012049407 Local Rank: 0 Train Epoch: 58 [12800/21954 (58%)] Loss: 0.000000004357003 Local Rank: 0 Train Epoch: 58 [16000/21954 (73%)] Loss: 0.000000008204098 Local Rank: 0 Train Epoch: 58 [19200/21954 (87%)] Loss: 0.000000008960082 Local Rank: 0 Train Epoch: 59 [0/21954 (0%)] Loss: 0.000000021830797 Local Rank: 0 Train Epoch: 59 [3200/21954 (15%)] Loss: 0.000000009546346 Local Rank: 0 Train Epoch: 59 [6400/21954 (29%)] Loss: 0.000000006347834 Local Rank: 0 Train Epoch: 59 [9600/21954 (44%)] Loss: 0.000000008012685 Local Rank: 0 Train Epoch: 59 [12800/21954 (58%)] Loss: 0.000000005179124 Local Rank: 0 Train Epoch: 59 [16000/21954 
(73%)] Loss: 0.000000014042721 Local Rank: 0 Train Epoch: 59 [19200/21954 (87%)] Loss: 0.000000008878821 Local Rank: 0 Train Epoch: 60 [0/21954 (0%)] Loss: 0.000000004442226 Local Rank: 0 Train Epoch: 60 [3200/21954 (15%)] Loss: 0.000000009028856 Local Rank: 0 Train Epoch: 60 [6400/21954 (29%)] Loss: 0.000000007277878 Local Rank: 0 Train Epoch: 60 [9600/21954 (44%)] Loss: 0.000000006779598 Local Rank: 0 Train Epoch: 60 [12800/21954 (58%)] Loss: 0.000000008119192 Local Rank: 0 Train Epoch: 60 [16000/21954 (73%)] Loss: 0.000000025056913 Local Rank: 0 Train Epoch: 60 [19200/21954 (87%)] Loss: 0.000000011838531 Local Rank: 0

After that, I ran inference:

python ./Deep_Object_Pose/inference/inference.py --weights ./weights/mustard_60.pth --data ./weights --object mustard --exts png --outf ./weights --config config_pose.yaml --camera camera_info.yaml

The output has an empty camera entry and no objects are detected, both in my test images and in every other image I try.

I checked my belief maps and they look very bad:

image

In the original weights you provided I could see the following:

image

I would appreciate some help,

Thanks

Joan

mintar commented 5 months ago

You specified --object mustard, but your json has "class": "YOLO". Try --object YOLO instead.

In short, your dataset doesn't contain a single object of class mustard; that's why the training didn't work.
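A mismatch like this can be caught before training by counting the `class` labels in the generated JSON files. A minimal sketch, assuming every `*.json` file under the data folder is a per-frame annotation with an `objects` list like the example above; `count_classes` and `check_object_name` are hypothetical helpers, not part of DOPE.

```python
import json
from collections import Counter
from pathlib import Path


def count_classes(data_dir):
    """Count how often each "class" label appears across all per-frame JSON files.

    Assumes each JSON file is a dict with an "objects" list whose entries
    carry a "class" field, as in the generated data shown in this thread.
    """
    counts = Counter()
    for path in sorted(Path(data_dir).rglob("*.json")):
        with open(path) as f:
            frame = json.load(f)
        for obj in frame.get("objects", []):
            counts[obj.get("class")] += 1
    return counts


def check_object_name(data_dir, object_name):
    """Return how many annotations match the name you intend to pass to --object."""
    counts = count_classes(data_dir)
    if counts[object_name] == 0:
        # Nothing to learn: every ground-truth belief map would be empty.
        print(f"No objects of class {object_name!r}; classes found: {dict(counts)}")
    return counts[object_name]
```

For example, `check_object_name("/joansaurina_working_dir/DOPE/data", "mustard")` would print the classes that are actually present if the dataset contains no `mustard` annotations.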

You can also see it in the loss value. If the loss immediately drops under 0.0001, you know something's wrong. :)

Train Epoch: 1 [0/21954 (0%)] Loss: 0.032226417213678 Local Rank: 0
Train Epoch: 1 [3200/21954 (15%)] Loss: 0.000054956450185 Local Rank: 0
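This rule of thumb can be turned into a quick automated check. A minimal sketch, assuming the `train.py` log format shown in this thread; the regex, the function names and the defaults (1e-4, first five logged batches) are illustrative assumptions, not part of DOPE.

```python
import re

# Matches entries like "Train Epoch: 1 [3200/21954 (15%)] Loss: 0.000054956450185 Local Rank: 0",
# with any amount of spaces or tabs between the fields.
LOG_RE = re.compile(r"Train Epoch:\s*(\d+)\s*\[(\d+)/\d+.*?\]\s*Loss:\s*([0-9.eE+-]+)")


def parse_losses(log_text):
    """Return (epoch, sample_offset, loss) tuples for every entry found in the log."""
    return [(int(e), int(s), float(l)) for e, s, l in LOG_RE.findall(log_text)]


def loss_collapsed(log_text, threshold=1e-4, within=5):
    """True if the loss drops below `threshold` within the first `within` logged batches.

    Both defaults are rules of thumb; a near-zero loss this early usually means
    the ground-truth belief maps are empty (e.g. wrong --object name).
    """
    losses = [loss for _, _, loss in parse_losses(log_text)][:within]
    return any(loss < threshold for loss in losses)
```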
joansaurina commented 5 months ago

Hey @mintar, thanks for your help. After this debugging I can finally see some decent results, but not the ones I expected:

17_000351

On the test images it only detects 2 of the 4 objects (one of them is heavily occluded), and the points are not very accurate.

As for the training loss, it does not seem to go down:

Train Epoch: 2 [3200/21954 (15%)]       Loss: 0.002900564344600         Local Rank: 0
Train Epoch: 2 [6400/21954 (29%)]       Loss: 0.005925509147346         Local Rank: 0
Train Epoch: 2 [9600/21954 (44%)]       Loss: 0.004374170210212         Local Rank: 0
Train Epoch: 2 [12800/21954 (58%)]      Loss: 0.004807741846889         Local Rank: 0
Train Epoch: 2 [16000/21954 (73%)]      Loss: 0.004936175886542         Local Rank: 0
Train Epoch: 2 [19200/21954 (87%)]      Loss: 0.005042205099016         Local Rank: 0
Train Epoch: 3 [0/21954 (0%)]   Loss: 0.005912763066590         Local Rank: 0
Train Epoch: 3 [3200/21954 (15%)]       Loss: 0.007230440154672         Local Rank: 0
Train Epoch: 3 [6400/21954 (29%)]       Loss: 0.005241629201919         Local Rank: 0
Train Epoch: 3 [9600/21954 (44%)]       Loss: 0.003206571796909         Local Rank: 0
Train Epoch: 3 [12800/21954 (58%)]      Loss: 0.005741830915213         Local Rank: 0
Train Epoch: 3 [16000/21954 (73%)]      Loss: 0.004557413049042         Local Rank: 0
Train Epoch: 3 [19200/21954 (87%)]      Loss: 0.002986140083522         Local Rank: 0
Train Epoch: 4 [0/21954 (0%)]   Loss: 0.003827466396615         Local Rank: 0
Train Epoch: 4 [3200/21954 (15%)]       Loss: 0.003044116543606         Local Rank: 0
Train Epoch: 4 [6400/21954 (29%)]       Loss: 0.004852563608438         Local Rank: 0
Train Epoch: 4 [9600/21954 (44%)]       Loss: 0.005033444147557         Local Rank: 0
Train Epoch: 4 [12800/21954 (58%)]      Loss: 0.004395484458655         Local Rank: 0
Train Epoch: 4 [16000/21954 (73%)]      Loss: 0.003537489799783         Local Rank: 0
Train Epoch: 4 [19200/21954 (87%)]      Loss: 0.004236795473844         Local Rank: 0
Train Epoch: 5 [0/21954 (0%)]   Loss: 0.004057863261551         Local Rank: 0
Train Epoch: 5 [3200/21954 (15%)]       Loss: 0.006070987787098         Local Rank: 0
Train Epoch: 5 [6400/21954 (29%)]       Loss: 0.003305417951196         Local Rank: 0
Train Epoch: 5 [9600/21954 (44%)]       Loss: 0.006066087633371         Local Rank: 0
Train Epoch: 5 [12800/21954 (58%)]      Loss: 0.003837038995698         Local Rank: 0
Train Epoch: 5 [16000/21954 (73%)]      Loss: 0.005226328969002         Local Rank: 0
Train Epoch: 5 [19200/21954 (87%)]      Loss: 0.004632766358554         Local Rank: 0
Train Epoch: 6 [0/21954 (0%)]   Loss: 0.004065050743520         Local Rank: 0
Train Epoch: 6 [3200/21954 (15%)]       Loss: 0.005211489275098         Local Rank: 0
Train Epoch: 6 [6400/21954 (29%)]       Loss: 0.006589157972485         Local Rank: 0
Train Epoch: 6 [9600/21954 (44%)]       Loss: 0.004496794659644         Local Rank: 0
Train Epoch: 6 [12800/21954 (58%)]      Loss: 0.004006389062852         Local Rank: 0
Train Epoch: 6 [16000/21954 (73%)]      Loss: 0.005213705822825         Local Rank: 0
Train Epoch: 6 [19200/21954 (87%)]      Loss: 0.002471640473232         Local Rank: 0
Train Epoch: 7 [0/21954 (0%)]   Loss: 0.004492865875363         Local Rank: 0
Train Epoch: 7 [3200/21954 (15%)]       Loss: 0.004473573062569         Local Rank: 0
Train Epoch: 7 [6400/21954 (29%)]       Loss: 0.004640904255211         Local Rank: 0
Train Epoch: 7 [9600/21954 (44%)]       Loss: 0.003035692032427         Local Rank: 0
Train Epoch: 7 [12800/21954 (58%)]      Loss: 0.003556061536074         Local Rank: 0
Train Epoch: 7 [16000/21954 (73%)]      Loss: 0.005741178989410         Local Rank: 0
Train Epoch: 7 [19200/21954 (87%)]      Loss: 0.004475364461541         Local Rank: 0
Train Epoch: 8 [0/21954 (0%)]   Loss: 0.005986265838146         Local Rank: 0
Train Epoch: 8 [3200/21954 (15%)]       Loss: 0.003541904035956         Local Rank: 0
Train Epoch: 8 [6400/21954 (29%)]       Loss: 0.004665576852858         Local Rank: 0
Train Epoch: 8 [9600/21954 (44%)]       Loss: 0.004690160043538         Local Rank: 0
Train Epoch: 8 [12800/21954 (58%)]      Loss: 0.004745676647872         Local Rank: 0
Train Epoch: 8 [16000/21954 (73%)]      Loss: 0.004688395652920         Local Rank: 0
Train Epoch: 8 [19200/21954 (87%)]      Loss: 0.003886136692017         Local Rank: 0
Train Epoch: 9 [0/21954 (0%)]   Loss: 0.006378377787769         Local Rank: 0
Train Epoch: 9 [3200/21954 (15%)]       Loss: 0.003509570378810         Local Rank: 0
Train Epoch: 9 [6400/21954 (29%)]       Loss: 0.005827599205077         Local Rank: 0
Train Epoch: 9 [9600/21954 (44%)]       Loss: 0.004743957892060         Local Rank: 0
Train Epoch: 9 [12800/21954 (58%)]      Loss: 0.004661967977881         Local Rank: 0
Train Epoch: 9 [16000/21954 (73%)]      Loss: 0.002835759660229         Local Rank: 0
Train Epoch: 9 [19200/21954 (87%)]      Loss: 0.004931921139359         Local Rank: 0
Train Epoch: 10 [0/21954 (0%)]  Loss: 0.004626110196114         Local Rank: 0
Train Epoch: 10 [3200/21954 (15%)]      Loss: 0.003018319839612         Local Rank: 0
Train Epoch: 10 [6400/21954 (29%)]      Loss: 0.003086642362177         Local Rank: 0
Train Epoch: 10 [9600/21954 (44%)]      Loss: 0.002127424115315         Local Rank: 0
Train Epoch: 10 [12800/21954 (58%)]     Loss: 0.004808292724192         Local Rank: 0
Train Epoch: 10 [16000/21954 (73%)]     Loss: 0.002561214379966         Local Rank: 0
Train Epoch: 10 [19200/21954 (87%)]     Loss: 0.002986410865560         Local Rank: 0
Train Epoch: 11 [0/21954 (0%)]  Loss: 0.003162091132253         Local Rank: 0
Train Epoch: 11 [3200/21954 (15%)]      Loss: 0.004577652085572         Local Rank: 0
Train Epoch: 11 [6400/21954 (29%)]      Loss: 0.003777351928875         Local Rank: 0
Train Epoch: 11 [9600/21954 (44%)]      Loss: 0.003447749186307         Local Rank: 0
Train Epoch: 11 [12800/21954 (58%)]     Loss: 0.002563769463450         Local Rank: 0
Train Epoch: 11 [16000/21954 (73%)]     Loss: 0.003418329870328         Local Rank: 0
Train Epoch: 11 [19200/21954 (87%)]     Loss: 0.005205884575844         Local Rank: 0
Train Epoch: 12 [0/21954 (0%)]  Loss: 0.004046686459333         Local Rank: 0
Train Epoch: 12 [3200/21954 (15%)]      Loss: 0.002304858528078         Local Rank: 0
Train Epoch: 12 [6400/21954 (29%)]      Loss: 0.002095520962030         Local Rank: 0
Train Epoch: 12 [9600/21954 (44%)]      Loss: 0.003379971953109         Local Rank: 0
Train Epoch: 12 [12800/21954 (58%)]     Loss: 0.004246129654348         Local Rank: 0
Train Epoch: 12 [16000/21954 (73%)]     Loss: 0.003332189284265         Local Rank: 0
Train Epoch: 12 [19200/21954 (87%)]     Loss: 0.004258678294718         Local Rank: 0
Train Epoch: 13 [0/21954 (0%)]  Loss: 0.001947797602043         Local Rank: 0
Train Epoch: 13 [3200/21954 (15%)]      Loss: 0.002848205156624         Local Rank: 0
Train Epoch: 13 [6400/21954 (29%)]      Loss: 0.003474295837805         Local Rank: 0
Train Epoch: 13 [9600/21954 (44%)]      Loss: 0.003261760808527         Local Rank: 0
Train Epoch: 13 [12800/21954 (58%)]     Loss: 0.003850978566334         Local Rank: 0
Train Epoch: 13 [16000/21954 (73%)]     Loss: 0.002882187021896         Local Rank: 0
Train Epoch: 13 [19200/21954 (87%)]     Loss: 0.004315676167607         Local Rank: 0
Train Epoch: 14 [0/21954 (0%)]  Loss: 0.003254068084061         Local Rank: 0
Train Epoch: 14 [3200/21954 (15%)]      Loss: 0.003293361514807         Local Rank: 0
Train Epoch: 14 [6400/21954 (29%)]      Loss: 0.003517171833664         Local Rank: 0
Train Epoch: 14 [9600/21954 (44%)]      Loss: 0.003427451942116         Local Rank: 0
Train Epoch: 14 [12800/21954 (58%)]     Loss: 0.004974686540663         Local Rank: 0
Train Epoch: 14 [16000/21954 (73%)]     Loss: 0.003587142331526         Local Rank: 0
Train Epoch: 14 [19200/21954 (87%)]     Loss: 0.002705564722419         Local Rank: 0
Train Epoch: 15 [0/21954 (0%)]  Loss: 0.003272420726717         Local Rank: 0
Train Epoch: 15 [3200/21954 (15%)]      Loss: 0.002344383625314         Local Rank: 0
Train Epoch: 15 [6400/21954 (29%)]      Loss: 0.003033963032067         Local Rank: 0
Train Epoch: 15 [9600/21954 (44%)]      Loss: 0.002688614651561         Local Rank: 0
Train Epoch: 15 [12800/21954 (58%)]     Loss: 0.003859403077513         Local Rank: 0
Train Epoch: 15 [16000/21954 (73%)]     Loss: 0.003240988822654         Local Rank: 0
Train Epoch: 15 [19200/21954 (87%)]     Loss: 0.005000061821193         Local Rank: 0
Train Epoch: 16 [0/21954 (0%)]  Loss: 0.004636402241886         Local Rank: 0
Train Epoch: 16 [3200/21954 (15%)]      Loss: 0.001435754471458         Local Rank: 0
Train Epoch: 16 [6400/21954 (29%)]      Loss: 0.002325414214283         Local Rank: 0
Train Epoch: 16 [9600/21954 (44%)]      Loss: 0.002821840345860         Local Rank: 0
Train Epoch: 16 [12800/21954 (58%)]     Loss: 0.002409463282675         Local Rank: 0
Train Epoch: 16 [16000/21954 (73%)]     Loss: 0.003217164892703         Local Rank: 0
Train Epoch: 16 [19200/21954 (87%)]     Loss: 0.002470192266628         Local Rank: 0
Train Epoch: 17 [0/21954 (0%)]  Loss: 0.003651474835351         Local Rank: 0
Train Epoch: 17 [3200/21954 (15%)]      Loss: 0.002779579255730         Local Rank: 0
Train Epoch: 17 [6400/21954 (29%)]      Loss: 0.004767744801939         Local Rank: 0
Train Epoch: 17 [9600/21954 (44%)]      Loss: 0.003743056906387         Local Rank: 0
Train Epoch: 17 [12800/21954 (58%)]     Loss: 0.001882364507765         Local Rank: 0
Train Epoch: 17 [16000/21954 (73%)]     Loss: 0.002958428580314         Local Rank: 0
Train Epoch: 17 [19200/21954 (87%)]     Loss: 0.003548386972398         Local Rank: 0
Train Epoch: 18 [0/21954 (0%)]  Loss: 0.003319771261886         Local Rank: 0
Train Epoch: 18 [3200/21954 (15%)]      Loss: 0.002014442812651         Local Rank: 0
Train Epoch: 18 [6400/21954 (29%)]      Loss: 0.002859186846763         Local Rank: 0
Train Epoch: 18 [9600/21954 (44%)]      Loss: 0.004046123009175         Local Rank: 0
Train Epoch: 18 [12800/21954 (58%)]     Loss: 0.002578746527433         Local Rank: 0
Train Epoch: 18 [16000/21954 (73%)]     Loss: 0.002782811876386         Local Rank: 0
Train Epoch: 18 [19200/21954 (87%)]     Loss: 0.004630780313164         Local Rank: 0
Train Epoch: 19 [0/21954 (0%)]  Loss: 0.003868054365739         Local Rank: 0
Train Epoch: 19 [3200/21954 (15%)]      Loss: 0.004026650451124         Local Rank: 0
Train Epoch: 19 [6400/21954 (29%)]      Loss: 0.005982798058540         Local Rank: 0
Train Epoch: 19 [9600/21954 (44%)]      Loss: 0.003184202127159         Local Rank: 0
Train Epoch: 19 [12800/21954 (58%)]     Loss: 0.002951164497063         Local Rank: 0
Train Epoch: 19 [16000/21954 (73%)]     Loss: 0.003345077391714         Local Rank: 0
Train Epoch: 19 [19200/21954 (87%)]     Loss: 0.002535319654271         Local Rank: 0
Train Epoch: 20 [0/21954 (0%)]  Loss: 0.002576967701316         Local Rank: 0
Train Epoch: 20 [3200/21954 (15%)]      Loss: 0.002919774502516         Local Rank: 0
Train Epoch: 20 [6400/21954 (29%)]      Loss: 0.003344805911183         Local Rank: 0
Train Epoch: 20 [9600/21954 (44%)]      Loss: 0.003253316041082         Local Rank: 0
Train Epoch: 20 [12800/21954 (58%)]     Loss: 0.003641423536465         Local Rank: 0
Train Epoch: 20 [16000/21954 (73%)]     Loss: 0.002654241397977         Local Rank: 0
Train Epoch: 20 [19200/21954 (87%)]     Loss: 0.002449749270454         Local Rank: 0
Train Epoch: 21 [0/21954 (0%)]  Loss: 0.003434887155890         Local Rank: 0
Train Epoch: 21 [3200/21954 (15%)]      Loss: 0.004279752727598         Local Rank: 0
Train Epoch: 21 [6400/21954 (29%)]      Loss: 0.003950941376388         Local Rank: 0
Train Epoch: 21 [9600/21954 (44%)]      Loss: 0.002134890528396         Local Rank: 0
Train Epoch: 21 [12800/21954 (58%)]     Loss: 0.003095703665167         Local Rank: 0
Train Epoch: 21 [16000/21954 (73%)]     Loss: 0.002471466083080         Local Rank: 0
Train Epoch: 21 [19200/21954 (87%)]     Loss: 0.001527963322587         Local Rank: 0
Train Epoch: 22 [0/21954 (0%)]  Loss: 0.003000585362315         Local Rank: 0
Train Epoch: 22 [3200/21954 (15%)]      Loss: 0.004471320658922         Local Rank: 0
Train Epoch: 22 [6400/21954 (29%)]      Loss: 0.003453930141404         Local Rank: 0
Train Epoch: 22 [9600/21954 (44%)]      Loss: 0.001872317749076         Local Rank: 0
Train Epoch: 22 [12800/21954 (58%)]     Loss: 0.003840844612569         Local Rank: 0
Train Epoch: 22 [16000/21954 (73%)]     Loss: 0.002468876773492         Local Rank: 0
Train Epoch: 22 [19200/21954 (87%)]     Loss: 0.004340017680079         Local Rank: 0
Train Epoch: 23 [0/21954 (0%)]  Loss: 0.002840435598046         Local Rank: 0
Train Epoch: 23 [3200/21954 (15%)]      Loss: 0.003147333860397         Local Rank: 0
Train Epoch: 23 [6400/21954 (29%)]      Loss: 0.003390608355403         Local Rank: 0
Train Epoch: 23 [9600/21954 (44%)]      Loss: 0.003638479625806         Local Rank: 0

Is this the training behavior I should expect? And what does "Local Rank" mean?

Thanks,

Joan

mintar commented 5 months ago

"Local Rank" is only relevant if you're training on multiple GPUs in parallel.
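For context, a sketch of where such a value typically comes from. Assumption: the launcher is PyTorch's `torchrun`, which exports a `LOCAL_RANK` environment variable for each worker process; I have not checked whether DOPE's `train.py` reads it this way or through a command-line flag. In a single-GPU run there is only one process, so the value is always 0, which matches the logs above.

```python
import os


def local_rank():
    """Index of this process's GPU on the current machine.

    torchrun sets LOCAL_RANK for every worker it spawns; a plain
    single-process run has no such variable, so fall back to 0.
    """
    return int(os.environ.get("LOCAL_RANK", 0))


def is_main_process():
    # On a single machine, a common convention is to let only local rank 0
    # print logs and save checkpoints.
    return local_rank() == 0
```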

The loss doesn't look too good, looks like nothing much is happening.
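Because individual batch losses are noisy, averaging them per epoch makes a plateau easier to spot than reading single lines. A minimal sketch, assuming the `train.py` log format shown in this thread; the regex and function names are illustrative, not part of DOPE.

```python
import re
from collections import defaultdict
from statistics import mean

# Captures the epoch number and the loss value of each log entry.
LOG_RE = re.compile(r"Train Epoch:\s*(\d+)\s*\[\d+/\d+.*?\]\s*Loss:\s*([0-9.eE+-]+)")


def epoch_means(log_text):
    """Average the logged batch losses per epoch, returned as {epoch: mean_loss}."""
    per_epoch = defaultdict(list)
    for epoch, loss in LOG_RE.findall(log_text):
        per_epoch[int(epoch)].append(float(loss))
    return {e: mean(v) for e, v in sorted(per_epoch.items())}


def relative_improvement(log_text):
    """Fractional drop of the mean loss from the first to the last logged epoch."""
    means = list(epoch_means(log_text).values())
    return (means[0] - means[-1]) / means[0]
```

A value close to 0 after many epochs suggests the network is not learning, regardless of how much the individual batch losses jump around.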

What do the belief maps look like?

joansaurina commented 5 months ago

Hey @mintar,

The ground truth looks like this: gt

And guess:

guess

Are they weird?

I got them from here in train.py:

image

Thanks,

Joan

mintar commented 5 months ago

Are they weird?

Yes. The images are normalized, so "all grey" means "all black" (there are no bright peaks in the image). There should be bright spots similar to the ground truth. In other words, training is still not working.
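To see why a flat grey image means "no peaks": min-max normalization stretches whatever range a map has to [0, 1], so a map that is close to zero everywhere is still rendered with full contrast, as uniform noise. A minimal numpy sketch, assuming the visualization uses min-max scaling (I have not checked exactly how `train.py` writes its images); the 0.1 threshold and the synthetic maps are arbitrary illustrations.

```python
import numpy as np


def minmax_normalize(belief):
    """Stretch a map to [0, 1], as image-saving utilities often do for display."""
    lo, hi = float(belief.min()), float(belief.max())
    return (belief - lo) / max(hi - lo, 1e-12)


def has_peak(belief, min_peak=0.1):
    """Judge a belief map by its raw maximum, not by the normalized picture.

    `min_peak` is an illustrative threshold, not a DOPE default.
    """
    return float(belief.max()) >= min_peak


rng = np.random.default_rng(0)
# No keypoint: tiny values everywhere. After min-max scaling this still spans
# the whole [0, 1] range, so it is displayed as mid-grey noise.
flat = 1e-3 * rng.random((50, 50))

# One keypoint: a Gaussian blob of height 1.0 centred at (x=25, y=20).
yy, xx = np.mgrid[0:50, 0:50]
peaked = np.exp(-((xx - 25) ** 2 + (yy - 20) ** 2) / (2 * 2.0 ** 2))
```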

joansaurina commented 5 months ago

Hey @mintar. I have checked my data again using debug.py: https://github.com/NVlabs/Deep_Object_Pose/blob/master/common/debug.py

My data looks like this:

0_000000

0_000001

Are the points in the order they should be? Should the objects be bigger? Does the model work better with fewer objects?

Also, for the mustard object I am assuming that the dimensions in https://github.com/NVlabs/Deep_Object_Pose/blob/master/config/config_pose.yaml are correct. Should they change depending on the scale I give the object in the synthetic data, or do they only matter for the real-world size?

"mustard": [9.6024150848388672,19.130100250244141,5.824894905090332]
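The scale question can be reasoned about with plain pinhole geometry: if the cuboid dimensions and the translation are multiplied by the same factor, the projected corners do not change. The dimensions therefore only enter the pose-recovery step, where a uniform scale error would show up as a proportionally scaled distance. A minimal numpy sketch; the intrinsics, pose and scale factor below are hypothetical, and the corner order is arbitrary (not claimed to match DOPE's).

```python
import numpy as np


def cuboid_corners(dims):
    """8 corners of an axis-aligned box centred at the origin, in the units of `dims`."""
    w, h, d = np.asarray(dims, dtype=float) / 2.0
    return np.array([[sx * w, sy * h, sz * d]
                     for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])


def project(points, K, R, t):
    """Pinhole projection of Nx3 object points with rotation R and translation t."""
    cam = points @ R.T + t          # object frame -> camera frame
    uv = cam @ K.T                  # apply the intrinsics
    return uv[:, :2] / uv[:, 2:3]   # perspective divide


# Hypothetical camera (principal point at the centre of a 1920x1080 image) and pose.
K = np.array([[1400.0, 0.0, 960.0], [0.0, 1400.0, 540.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 50.0])
dims = [9.6024150848388672, 19.130100250244141, 5.824894905090332]  # config entry above
s = 2.0  # hypothetical scale error

a = project(cuboid_corners(dims), K, R, t)
# Box twice as large, twice as far away: the same 2D corners (up to float error).
b = project(cuboid_corners(np.multiply(dims, s)), K, R, s * t)
```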

Thanks,

Joan

mintar commented 5 months ago

There are multiple steps:

  1. Training: ground truth belief maps → weights
  2. Network inference: weights → estimated belief maps
  3. solvePnP: estimated belief maps + object dimensions + camera intrinsics → object pose
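Step 3 can be sketched in a few lines of numpy to show that it only needs 2D keypoints, object dimensions and camera intrinsics. This is a toy under stated assumptions: it swaps in the Direct Linear Transform (DLT) instead of OpenCV's `cv2.solvePnP`, expects noise-free correspondences, and the keypoint order of the hypothetical `cuboid_keypoints` helper (8 corners plus centroid) is arbitrary, not DOPE's ordering.

```python
import numpy as np


def cuboid_keypoints(dims):
    """8 box corners plus the centroid, centred at the origin (order is arbitrary)."""
    w, h, d = np.asarray(dims, dtype=float) / 2.0
    corners = [[sx * w, sy * h, sz * d]
               for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
    return np.array(corners + [[0.0, 0.0, 0.0]])


def dlt_pose(obj_pts, img_pts, K):
    """Recover R, t from >= 6 non-coplanar 3D-2D correspondences via DLT.

    Noise-free toy version; real code would use cv2.solvePnP and refine the result.
    """
    n = len(obj_pts)
    # Normalized image coordinates remove the intrinsics: x ~ [R | t] X.
    homog = np.column_stack([img_pts, np.ones(n)])
    norm = homog @ np.linalg.inv(K).T
    rows = []
    for X, (x, y, _) in zip(obj_pts, norm):
        Xh = np.append(X, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -x * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -y * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    P = Vt[-1].reshape(3, 4)        # [R | t] up to an unknown scale and sign
    if np.linalg.det(P[:, :3]) < 0:
        P = -P                      # fix the sign so the rotation part is proper
    U, S, Vt2 = np.linalg.svd(P[:, :3])
    R = U @ Vt2                     # closest rotation matrix
    t = P[:, 3] / S.mean()          # undo the arbitrary scale
    return R, t
```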

In your case, we know that step 2 already produces garbage belief maps, so we don't need to worry about step 3 for now. The object dimensions, order of points and camera intrinsics are not necessary in step 2 yet. The only input so far are the ground truth belief maps, and they look fine. I would guess you still have not trained on the correct object. Did you retrain with --object YOLO?

joansaurina commented 5 months ago

Hey. Yes, I have done that. Now the loss does not go down :-( :

Train Epoch: 1 [3200/21954 (15%)]       Loss: 0.005810703150928         Local Rank: 0
Train Epoch: 1 [6400/21954 (29%)]       Loss: 0.005156743340194         Local Rank: 0
Train Epoch: 1 [9600/21954 (44%)]       Loss: 0.004541679285467         Local Rank: 0
Train Epoch: 1 [12800/21954 (58%)]      Loss: 0.004682376980782         Local Rank: 0
Train Epoch: 1 [16000/21954 (73%)]      Loss: 0.004043444525450         Local Rank: 0
Train Epoch: 1 [19200/21954 (87%)]      Loss: 0.005571687594056         Local Rank: 0
Train Epoch: 2 [0/21954 (0%)]   Loss: 0.005374191328883         Local Rank: 0
Train Epoch: 2 [3200/21954 (15%)]       Loss: 0.004383022896945         Local Rank: 0
Train Epoch: 2 [6400/21954 (29%)]       Loss: 0.003983714617789         Local Rank: 0
Train Epoch: 2 [9600/21954 (44%)]       Loss: 0.004592544864863         Local Rank: 0
Train Epoch: 2 [12800/21954 (58%)]      Loss: 0.005417421460152         Local Rank: 0
Train Epoch: 2 [16000/21954 (73%)]      Loss: 0.002559388289228         Local Rank: 0
Train Epoch: 2 [19200/21954 (87%)]      Loss: 0.006360759027302         Local Rank: 0
Train Epoch: 3 [0/21954 (0%)]   Loss: 0.003786348039284         Local Rank: 0
Train Epoch: 3 [3200/21954 (15%)]       Loss: 0.003547714324668         Local Rank: 0
Train Epoch: 3 [6400/21954 (29%)]       Loss: 0.004431971348822         Local Rank: 0
Train Epoch: 3 [9600/21954 (44%)]       Loss: 0.004382948391140         Local Rank: 0
Train Epoch: 3 [12800/21954 (58%)]      Loss: 0.003552622860298         Local Rank: 0
Train Epoch: 3 [16000/21954 (73%)]      Loss: 0.004630828741938         Local Rank: 0
Train Epoch: 3 [19200/21954 (87%)]      Loss: 0.003929464612156         Local Rank: 0
Train Epoch: 4 [0/21954 (0%)]   Loss: 0.005270931404084         Local Rank: 0
Train Epoch: 4 [3200/21954 (15%)]       Loss: 0.004158607684076         Local Rank: 0
Train Epoch: 4 [6400/21954 (29%)]       Loss: 0.003922347910702         Local Rank: 0
Train Epoch: 4 [9600/21954 (44%)]       Loss: 0.004678339231759         Local Rank: 0
Train Epoch: 4 [12800/21954 (58%)]      Loss: 0.005026136524975         Local Rank: 0
Train Epoch: 4 [16000/21954 (73%)]      Loss: 0.006477310787886         Local Rank: 0
Train Epoch: 4 [19200/21954 (87%)]      Loss: 0.006686189211905         Local Rank: 0
Train Epoch: 5 [0/21954 (0%)]   Loss: 0.005249368026853         Local Rank: 0
Train Epoch: 5 [3200/21954 (15%)]       Loss: 0.002798494417220         Local Rank: 0
Train Epoch: 5 [6400/21954 (29%)]       Loss: 0.005916357040405         Local Rank: 0
Train Epoch: 5 [9600/21954 (44%)]       Loss: 0.004298934713006         Local Rank: 0
Train Epoch: 5 [12800/21954 (58%)]      Loss: 0.007957908324897         Local Rank: 0
Train Epoch: 5 [16000/21954 (73%)]      Loss: 0.003513279138133         Local Rank: 0
Train Epoch: 5 [19200/21954 (87%)]      Loss: 0.003558254102245         Local Rank: 0
Train Epoch: 6 [0/21954 (0%)]   Loss: 0.003569872118533         Local Rank: 0
Train Epoch: 6 [3200/21954 (15%)]       Loss: 0.006309617776424         Local Rank: 0
Train Epoch: 6 [6400/21954 (29%)]       Loss: 0.007266306318343         Local Rank: 0
Train Epoch: 6 [9600/21954 (44%)]       Loss: 0.004512981977314         Local Rank: 0
Train Epoch: 6 [12800/21954 (58%)]      Loss: 0.004497035872191         Local Rank: 0
Train Epoch: 6 [16000/21954 (73%)]      Loss: 0.004380561411381         Local Rank: 0
Train Epoch: 6 [19200/21954 (87%)]      Loss: 0.003936089575291         Local Rank: 0
Train Epoch: 7 [0/21954 (0%)]   Loss: 0.003840330056846         Local Rank: 0
Train Epoch: 7 [3200/21954 (15%)]       Loss: 0.003853686619550         Local Rank: 0
Train Epoch: 7 [6400/21954 (29%)]       Loss: 0.004030112177134         Local Rank: 0
Train Epoch: 7 [9600/21954 (44%)]       Loss: 0.004573566839099         Local Rank: 0
Train Epoch: 7 [12800/21954 (58%)]      Loss: 0.005053386092186         Local Rank: 0
Train Epoch: 7 [16000/21954 (73%)]      Loss: 0.003217826131731         Local Rank: 0
Train Epoch: 7 [19200/21954 (87%)]      Loss: 0.005438614636660         Local Rank: 0
Train Epoch: 8 [0/21954 (0%)]   Loss: 0.003710283432156         Local Rank: 0
Train Epoch: 8 [3200/21954 (15%)]       Loss: 0.003625022713095         Local Rank: 0
Train Epoch: 8 [6400/21954 (29%)]       Loss: 0.004089147783816         Local Rank: 0
Train Epoch: 8 [9600/21954 (44%)]       Loss: 0.003993968479335         Local Rank: 0
Train Epoch: 8 [12800/21954 (58%)]      Loss: 0.004369455389678         Local Rank: 0
Train Epoch: 8 [16000/21954 (73%)]      Loss: 0.004364163614810         Local Rank: 0
Train Epoch: 8 [19200/21954 (87%)]      Loss: 0.004877035971731         Local Rank: 0
Train Epoch: 9 [0/21954 (0%)]   Loss: 0.005411149002612         Local Rank: 0
Train Epoch: 9 [3200/21954 (15%)]       Loss: 0.004677983932197         Local Rank: 0
Train Epoch: 9 [6400/21954 (29%)]       Loss: 0.003850261913612         Local Rank: 0
Train Epoch: 9 [9600/21954 (44%)]       Loss: 0.003391833975911         Local Rank: 0
Train Epoch: 9 [12800/21954 (58%)]      Loss: 0.006555408239365         Local Rank: 0
Train Epoch: 9 [16000/21954 (73%)]      Loss: 0.003267457475886         Local Rank: 0
Train Epoch: 9 [19200/21954 (87%)]      Loss: 0.002305631292984         Local Rank: 0
Train Epoch: 10 [0/21954 (0%)]  Loss: 0.004892299883068         Local Rank: 0
Train Epoch: 10 [3200/21954 (15%)]      Loss: 0.003546775085852         Local Rank: 0
Train Epoch: 10 [6400/21954 (29%)]      Loss: 0.004254724830389         Local Rank: 0
Train Epoch: 10 [9600/21954 (44%)]      Loss: 0.003957318142056         Local Rank: 0
Train Epoch: 10 [12800/21954 (58%)]     Loss: 0.004525344353169         Local Rank: 0
Train Epoch: 10 [16000/21954 (73%)]     Loss: 0.002752105239779         Local Rank: 0
Train Epoch: 10 [19200/21954 (87%)]     Loss: 0.005274944007397         Local Rank: 0
Train Epoch: 11 [0/21954 (0%)]  Loss: 0.003887929720804         Local Rank: 0
Train Epoch: 11 [3200/21954 (15%)]      Loss: 0.003796992823482         Local Rank: 0
Train Epoch: 11 [6400/21954 (29%)]      Loss: 0.004625726956874         Local Rank: 0
Train Epoch: 11 [9600/21954 (44%)]      Loss: 0.005269356071949         Local Rank: 0
Train Epoch: 11 [12800/21954 (58%)]     Loss: 0.004049479961395         Local Rank: 0
Train Epoch: 11 [16000/21954 (73%)]     Loss: 0.005351062864065         Local Rank: 0
Train Epoch: 11 [19200/21954 (87%)]     Loss: 0.003871265333146         Local Rank: 0
Train Epoch: 12 [0/21954 (0%)]  Loss: 0.003826813073829         Local Rank: 0
Train Epoch: 12 [3200/21954 (15%)]      Loss: 0.003849777160212         Local Rank: 0
Train Epoch: 12 [6400/21954 (29%)]      Loss: 0.004157819319516         Local Rank: 0
Train Epoch: 12 [9600/21954 (44%)]      Loss: 0.003834706963971         Local Rank: 0
Train Epoch: 12 [12800/21954 (58%)]     Loss: 0.006087101064622         Local Rank: 0
Train Epoch: 12 [16000/21954 (73%)]     Loss: 0.004985778126866         Local Rank: 0
Train Epoch: 12 [19200/21954 (87%)]     Loss: 0.004507347010076         Local Rank: 0
Train Epoch: 13 [0/21954 (0%)]  Loss: 0.004202453419566         Local Rank: 0
Train Epoch: 13 [3200/21954 (15%)]      Loss: 0.003596150781959         Local Rank: 0
Train Epoch: 13 [6400/21954 (29%)]      Loss: 0.003398499684408         Local Rank: 0
Train Epoch: 13 [9600/21954 (44%)]      Loss: 0.005491898860782         Local Rank: 0
Train Epoch: 13 [12800/21954 (58%)]     Loss: 0.004952728282660         Local Rank: 0
Train Epoch: 13 [16000/21954 (73%)]     Loss: 0.005556703079492         Local Rank: 0
Train Epoch: 13 [19200/21954 (87%)]     Loss: 0.003459394443780         Local Rank: 0
Train Epoch: 14 [0/21954 (0%)]  Loss: 0.003263936145231         Local Rank: 0
Train Epoch: 14 [3200/21954 (15%)]      Loss: 0.002903734799474         Local Rank: 0
Train Epoch: 14 [6400/21954 (29%)]      Loss: 0.003904050681740         Local Rank: 0
Train Epoch: 14 [9600/21954 (44%)]      Loss: 0.004332394339144         Local Rank: 0
Train Epoch: 14 [12800/21954 (58%)]     Loss: 0.004053364507854         Local Rank: 0
Train Epoch: 14 [16000/21954 (73%)]     Loss: 0.002627491019666         Local Rank: 0
Train Epoch: 14 [19200/21954 (87%)]     Loss: 0.004573819227517         Local Rank: 0
Train Epoch: 15 [0/21954 (0%)]  Loss: 0.005188755691051         Local Rank: 0
Train Epoch: 15 [3200/21954 (15%)]      Loss: 0.002678437624127         Local Rank: 0
Train Epoch: 15 [6400/21954 (29%)]      Loss: 0.004402271471918         Local Rank: 0
Train Epoch: 15 [9600/21954 (44%)]      Loss: 0.002853166777641         Local Rank: 0
Train Epoch: 15 [12800/21954 (58%)]     Loss: 0.003138819243759         Local Rank: 0
Train Epoch: 15 [16000/21954 (73%)]     Loss: 0.002094447147101         Local Rank: 0
Train Epoch: 15 [19200/21954 (87%)]     Loss: 0.002206683158875         Local Rank: 0
Train Epoch: 16 [0/21954 (0%)]  Loss: 0.004009365569800         Local Rank: 0
Train Epoch: 16 [3200/21954 (15%)]      Loss: 0.003115003230050         Local Rank: 0
Train Epoch: 16 [6400/21954 (29%)]      Loss: 0.003000474767759         Local Rank: 0
Train Epoch: 16 [9600/21954 (44%)]      Loss: 0.003342849435285         Local Rank: 0
Train Epoch: 16 [12800/21954 (58%)]     Loss: 0.002824738621712         Local Rank: 0
Train Epoch: 16 [16000/21954 (73%)]     Loss: 0.003055247711018         Local Rank: 0
Train Epoch: 16 [19200/21954 (87%)]     Loss: 0.002787153003737         Local Rank: 0

Well, I preferred changing the JSON instead of running on YOLO, so my .json now looks like:

{
    "camera_data": {
        "width": 1920,
        "height": 1080,
        "camera_look_at": {
            "at": [
                -0.0,
                1.0,
                -7.549790126404332e-08
            ],
            "eye": [
                -0.0,
                25.0,
                -0.0
            ],
            "up": [
                1.0,
                0.0,
                0.0
            ]
        },
        "intrinsics": {
            "fx": 1400.0,
            "fy": 1400.0,
            "cx": 0.0,
            "cy": 0.0
        }
    },
    "objects": [
        {
            "class": "mustard",
            "name": "mustard_000",
            "visibility": 12914,
            "projected_cuboid": [
                [
                    750.9035681927193,
                    390.83822261522187
                ],
                [
                    761.1644572794883,
                    442.94067550197883
                ],
                [
                    599.8493668936467,
                    379.6746688876557
                ],
                [
                    579.875981257254,
                    324.84771093683037
                ],
                [
                    718.7154844479179,
                    444.08253438598456
                ],
                [
                    731.1318636098214,
                    494.0153711489393
                ],
                [
                    567.2783000747321,
                    428.3917933960647
                ],
                [
                    544.835173374655,
                    375.47882240304136
                ],
                [
                    656.1971221978729,
                    410.2068242618363
                ]
            ]
        },
        {
            "class": "mustard",
            "name": "mustard_001",
            "visibility": 4799,
            "projected_cuboid": [
                [
                    1109.4795912902357,
                    720.7869478499749
                ],
                [
                    1109.0823834696737,
                    662.4888808563237
                ],
                [
                    1036.855268626012,
                    689.683947717378
                ],
                [
                    1037.8529097158666,
                    751.7025568381521
                ],
                [
                    1075.8742002166698,
                    713.6672942347798
                ],
                [
                    1075.2372250801825,
                    656.4038049131801
                ],
                [
                    1002.1887397728411,
                    682.6520519180447
                ],
                [
                    1003.4365781606759,
                    743.5015496986963
                ],
                [
                    1057.261512202467,
                    702.2268312310014
                ]
            ]
        },
        {
            "class": "mustard",
            "name": "mustard_002",
            "visibility": 17625,
            "projected_cuboid": [
                [
                    720.036291267466,
                    519.3983543748533
                ],
                [
                    742.9193011950515,
                    453.9921753368337
                ],
                [
                    724.4828476240957,
                    255.2848973143968
                ],
                [
                    704.4848871133012,
                    328.7943499485361
                ],
                [
                    801.0228093128334,
                    521.8432839307596
                ],
                [
                    829.3912792017384,
                    458.03777581896
                ],
                [
                    804.6461218956292,
                    262.9991322721118
                ],
                [
                    779.9077671976531,
                    334.6375664009463
                ],
                [
                    762.9658328890948,
                    389.30588755545114
                ]
            ]
        },
        {
            "class": "mustard",
            "name": "mustard_003",
            "visibility": 4217,
            "projected_cuboid": [
                [
                    848.1003907509416,
                    388.9164682185301
                ],
                [
                    875.4166981644412,
                    398.1605023143868
                ],
                [
                    861.3643567212698,
                    499.5454468877687
                ],
                [
                    833.5224451135997,
                    493.74937688416634
                ],
                [
                    879.6058973652712,
                    391.18040321508244
                ],
                [
                    906.1953453549099,
                    400.4587122910186
                ],
                [
                    892.0260112722663,
                    503.0537778683801
                ],
                [
                    864.9016822656603,
                    497.3076813937357
                ],
                [
                    870.2838683998851,
                    446.5585041440224
                ]
            ]
        },
        {
            "class": "mustard",
            "name": "mustard_004",
            "visibility": 4790,
            "projected_cuboid": [
                [
                    1123.4262027423658,
                    524.2055873532208
                ],
                [
                    1166.6502577851543,
                    558.6422538702473
                ],
                [
                    1106.0838701908192,
                    648.7360988053608
                ],
                [
                    1065.0182073658102,
                    613.6816375747065
                ],
                [
                    1105.6994265760309,
                    524.7275036199951
                ],
                [
                    1149.676440468896,
                    560.0290647847079
                ],
                [
                    1088.1110407512347,
                    652.2617814972203
                ],
                [
                    1046.3811895920194,
                    616.3140102268816
                ],
                [
                    1105.8948850486595,
                    587.7758404873681
                ]
            ]
        }
    ]
}
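For anyone doing the same relabeling (changing "class": "YOLO" to "mustard", presumably so the annotations match --object mustard in the training command), a short script along these lines could rewrite every annotation file at once. It is a hypothetical helper, not part of the DOPE repo. It assumes each annotation is a .json file with an "objects" list, as in the BlenderProc output above, and it edits files in place, so keep a backup.

```python
import json
from pathlib import Path


def relabel_class(data_dir, old_class, new_class):
    """Rename an object class in every JSON annotation under data_dir.

    Hypothetical helper: searches data_dir recursively, rewrites "class"
    and the matching prefix of "name" (e.g. "YOLO_000" -> "mustard_000"),
    and returns the number of files that were changed.
    """
    changed = 0
    for path in sorted(Path(data_dir).rglob("*.json")):
        data = json.loads(path.read_text())
        modified = False
        for obj in data.get("objects", []):
            if obj.get("class") == old_class:
                obj["class"] = new_class
                name = obj.get("name", "")
                if name.startswith(old_class):
                    # Keep the per-instance suffix such as "_000".
                    obj["name"] = new_class + name[len(old_class):]
                modified = True
        if modified:
            path.write_text(json.dumps(data, indent=4))
            changed += 1
    return changed
```

For example, relabel_class("/DOPE/data/mustard/train2", "YOLO", "mustard") would process the directory passed to --data in the training command. Running it a second time changes nothing, since no "YOLO" entries remain.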

What could I check to see if I have done it right?

Here is my training command:

python -m torch.distributed.run ./Deep_Object_Pose/train/train.py --data /DOPE/data/mustard/train2 --outf /DOPE/outputs --object mustard --namefile mustard --epochs 100 --gpuid 4 --exts png

Thanks

Joan

mintar commented 5 months ago

What could I check to see if I have done it right?

You could show your belief maps again and check whether they have clear peaks.
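Besides looking at the images, one rough numerical check for "clear peaks" is to count the local maxima in each belief map. DOPE predicts 9 belief maps (8 cuboid corners plus the centroid), and with a well-trained model each map should show roughly one sharp peak per visible object. The helpers below are hypothetical: they assume the maps have already been converted to 2D lists of floats (e.g. with .tolist() on the network output) and use an arbitrary threshold of 0.1. DOPE's own inference code does its own smoothing and peak extraction, so treat this only as a diagnostic.

```python
def find_peaks(belief, threshold=0.1):
    """Return (row, col, value) for each strict local maximum >= threshold.

    belief is a 2D list of floats; each cell is compared against its
    8 neighbours (fewer at the borders).
    """
    rows, cols = len(belief), len(belief[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = belief[r][c]
            if v < threshold:
                continue
            neighbours = [
                belief[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                peaks.append((r, c, v))
    return peaks


def summarize_beliefs(beliefs, threshold=0.1):
    """Return (map_index, peak_count, highest_peak) for each belief map."""
    summary = []
    for i, belief in enumerate(beliefs):
        peaks = find_peaks(belief, threshold)
        highest = max((p[2] for p in peaks), default=0.0)
        summary.append((i, len(peaks), highest))
    return summary
```

Maps with zero peaks, or with many weak peaks scattered over the background, would suggest the network has not learned that keypoint yet.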

RenanMoreiraPinto commented 5 months ago

Got the same problem: the training doesn't seem to work. I downloaded the canned meat dataset that NVISII provides, to test and see if I get any changes. They have a 60,000-image dataset to test. Did you try using the train or the train2 version?

mintar commented 5 months ago

There have been quite a few changes to the repo recently, some of which broke stuff. Maybe it's worth checking out an older version.

joansaurina commented 5 months ago

I am using the current train version they have on this GitHub, not train2. @RenanMoreiraPinto

Do you know which version was 100% working well? @mintar

Thanks

Joan

mintar commented 5 months ago

Do you know which version was 100% working well?

scripts/train2/train.py in a4fe3cd4b5a739defdba4818d0a183490237ad0b is working very well.

RenanMoreiraPinto commented 5 months ago

Started testing with train2. I just had to change two things:

On line 140, change #parser.add_argument("--local_rank", type=int)
to parser.add_argument('--local_rank', '--local-rank', type=int, default=0), and add local_rank = opt.local_rank on line 155.

On line 259, change transform.Scale to transform.Resize.
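The local_rank change above, written as a standalone sketch: argparse takes dest from the first long option, so both spellings end up in opt.local_rank. Newer PyTorch launchers may pass the hyphenated --local-rank form, which is presumably why the old --local_rank-only argument failed. Falling back to the LOCAL_RANK environment variable (which torchrun sets) is an extra assumption not in the original change, and only this one argument of the real parser is shown. The second change is needed because transforms.Scale has been removed from recent torchvision releases in favor of transforms.Resize.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser()
    # Accept both spellings; dest is derived from the first long option,
    # so the value is always available as opt.local_rank.
    # Assumption: fall back to LOCAL_RANK (set by torchrun), else 0.
    parser.add_argument(
        "--local_rank", "--local-rank",
        type=int,
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    return parser
```

In the training script this would be used as opt = build_parser().parse_args() followed by local_rank = opt.local_rank.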

We shall see how it goes... Fingers crossed

joansaurina commented 5 months ago

Same here, the loss does not seem to be going down...

TontonTremblay commented 5 months ago

@nv-jeff did you test your changes to make sure the training script worked correctly? Following this thread I would think they broke a few things.

I would recommend using the DOPE repo from before @nv-jeff's changes, using NVISII to generate data and training with those scripts. Sorry about this, @joansaurina and @RenanMoreiraPinto.

joansaurina commented 5 months ago

Could you point us to the correct version? @TontonTremblay

And yeah, I'll try switching from BlenderProc to NVISII.

TontonTremblay commented 5 months ago

https://github.com/NVlabs/Deep_Object_Pose/tree/128631a23c827d2091cfd103c03c8c3a93fc6134

joansaurina commented 5 months ago

It seems I cannot use NVISII due to my NVIDIA drivers...

TontonTremblay commented 5 months ago

so sorry! what error are you getting?

RenanMoreiraPinto commented 5 months ago

To use NVISII I had to downgrade the NVIDIA drivers to 450 and use Ubuntu 20.04, but it didn't go well: in the middle of image creation the PC shut down. I'm using only 2 GPUs for training, so it takes longer to get results. Testing train2 at the moment. I'm using the shiny meat dataset.

On Sat., Jun 22, 2024, 14:49, Jonathan Tremblay < @.***> wrote:

so sorry! what error are you getting?


RenanMoreiraPinto commented 5 months ago

I got some results using train2 with 20,000 photos and 60 epochs: [images: 000101, 000001, Screenshot from 2024-06-23 20-52-56]. I will increase the dataset size and train for longer.

joansaurina commented 5 months ago

Hey @RenanMoreiraPinto

Are these results with data generated with nvsii and this version of the code https://github.com/NVlabs/Deep_Object_Pose/tree/128631a23c827d2091cfd103c03c8c3a93fc6134 ?

Could you describe how you generated the data with NVISII? And which drivers and versions did you mention?

And which objects are you training on? If you are interested, we could share the datasets we create (I want to do 20 YCB objects).

Thanks;

Joan

joansaurina commented 5 months ago

Hey @TontonTremblay I am having same problem you discussed here:

https://github.com/owl-project/NVISII/issues/167

Do you think the training could go well with BlenderProc and more images?

Also, the JSON in the meat can dataset you have available (made with NVISII) has more fields than the JSON BlenderProc generates:

Meat can json:

{
    "camera_data": {
        "camera_look_at": {
            "at": [
                0,
                0,
                0
            ],
            "eye": [
                0,
                0,
                1
            ],
            "up": [
                1,
                0,
                0
            ]
        },
        "camera_view_matrix": [
            [
                5.960464477539063e-08,
                0.9999999403953552,
                0.0,
                0.0
            ],
            [
                -0.9999999403953552,
                5.960464477539063e-08,
                0.0,
                0.0
            ],
            [
                0.0,
                0.0,
                1.0,
                0.0
            ],
            [
                0.0,
                0.0,
                -1.0,
                1.0
            ]
        ],
        "height": 512,
        "intrinsics": {
            "cx": 256.0,
            "cy": 256.0,
            "fx": 618.038818359375,
            "fy": 618.038818359375
        },
        "location_world": [
            0.0,
            0.0,
            1.0
        ],
        "quaternion_world_xyzw": [
            0.0,
            -0.0,
            -0.7071068286895752,
            0.7071068286895752
        ],
        "width": 512
    },
    "objects": [
        {
            "bounding_box_minx_maxx_miny_maxy": [
                151,
                202,
                336,
                240
            ],
            "class": "010",
            "local_cuboid": [
                [
                    5.082290172576904,
                    -4.177199840545654,
                    2.880038022994995
                ],
                [
                    5.082290172576904,
                    4.17710018157959,
                    2.880038022994995
                ],
                [
                    5.082290172576904,
                    4.17710018157959,
                    -2.8800549507141113
                ],
                [
                    5.082290172576904,
                    -4.177199840545654,
                    -2.8800549507141113
                ],
                [
                    -5.082388877868652,
                    -4.177199840545654,
                    2.880038022994995
                ],
                [
                    -5.082388877868652,
                    4.17710018157959,
                    2.880038022994995
                ],
                [
                    -5.082388877868652,
                    4.17710018157959,
                    -2.8800549507141113
                ],
                [
                    -5.082388877868652,
                    -4.177199840545654,
                    -2.8800549507141113
                ],
                [
                    -4.9591064453125e-05,
                    -5.0067901611328125e-05,
                    -8.58306884765625e-06
                ],
                [
                    -4.9591064453125e-05,
                    -5.0067901611328125e-05,
                    -8.58306884765625e-06
                ]
            ],
            "location": [
                -0.0988817885518074,
                -0.03944795951247215,
                -0.776299238204956
            ],
            "location_world": [
                -0.03944797068834305,
                0.098881796002388,
                0.22370076179504395
            ],
            "name": "010_potted_meat_can_0",
            "projected_cuboid": [
                [
                    193.97714233398438,
                    263.0047912597656
                ],
                [
                    201.8032989501953,
                    236.60040283203125
                ],
                [
                    154.7750701904297,
                    236.51812744140625
                ],
                [
                    151.06594848632812,
                    263.3470153808594
                ],
                [
                    198.42910766601562,
                    335.243408203125
                ],
                [
                    207.14007568359375,
                    314.8186340332031
                ],
                [
                    157.59410095214844,
                    316.1653137207031
                ],
                [
                    153.43124389648438,
                    336.80145263671875
                ],
                [
                    177.27670288085938,
                    287.4064025878906
                ]
            ],
            "provenance": "visii",
            "quaternion_xyzw": [
                0.42702290415763855,
                0.6565284729003906,
                0.34189745783805847,
                0.5193535685539246
            ],
            "quaternion_xyzw_world": [
                0.766186535358429,
                0.1622849702835083,
                -0.12548041343688965,
                0.6089964509010315
            ],
            "segmentation_id": 516,
            "visibility": 1
        },
     ...
}

And the ones generated with BlenderProc:

{
    "camera_data": {
        "width": 1920,
        "height": 1080,
        "camera_look_at": {
            "at": [
                -0.0,
                1.0,
                -7.549790126404332e-08
            ],
            "eye": [
                -0.0,
                25.0,
                -0.0
            ],
            "up": [
                1.0,
                0.0,
                0.0
            ]
        },
        "intrinsics": {
            "fx": 1400.0,
            "fy": 1400.0,
            "cx": 0.0,
            "cy": 0.0
        }
    },
    "objects": [
        {
            "class": "mustard",
            "name": "mustard_000",
            "visibility": 21097,
            "projected_cuboid": [
                [
                    904.7635453410297,
                    770.3589510152592
                ],
                [
                    883.8644226635671,
                    889.134557533943
                ],
                [
                    943.268177674897,
                    886.1306841357716
                ],
                [
                    963.2902662966745,
                    771.9897929002564
                ],
                [
                    1080.5672941899984,
                    832.2437187043636
                ],
                [
                    1056.9675715295327,
                    967.6840891381368
                ],
                [
                    1116.6414598612605,
                    960.8037931729342
                ],
                [
                    1139.1275588514086,
                    831.357113856285
                ],
                [
                    1006.1201942170317,
                    861.4626103134999
                ]
            ]
        },
        {
            "class": "mustard",
            "name": "mustard_001",
            "visibility": 20507,
            "projected_cuboid": [
                [
                    824.8679652073276,
                    612.1368223345802
                ],
                [
                    781.0730850601043,
                    655.3216662265288
                ],
                [
                    697.2111498154177,
                    641.1983676071956
                ],
                [
                    747.129890915294,
                    599.6675305184601
                ],
                [
                    815.1359365632503,
                    823.9862216074034
                ],
                [
                    772.8770394270964,
                    882.4040559918749
                ],
                [
                    693.5464390005867,
                    866.1487835534365
                ],
                [
                    741.3058754329253,
                    809.6437815881382
                ],
                [
                    759.7376446027137,
                    738.1167859921233
                ]
            ]
        },
        ...

Is it important to have local_cuboid, location, and location_world?

Thanks

Joan

nv-jeff commented 5 months ago

Also the json on the meat can dataset you have available (done with nvsii) have more things than the json blenderroc generates

The older, NVisii data generation code produced a number of fields that were used for experiments and debugging. The newer blenderproc data generation code only emits the required fields with one exception: it also generates the debugging/visualization values: 'location' and 'quaternion_xyzw', which describe the position and orientation of the object in the camera coordinate system.

The data generated for each object is described at lines 278-287 of data_generation/blenderproc_data_gen/generate_training_data.py
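To see exactly which per-object fields differ between an NVISII annotation and a BlenderProc one, a quick key-set comparison is enough. This is a hypothetical helper that works on already-loaded JSON dicts and only looks at the entries in "objects"; it says nothing about which fields the training code actually reads.

```python
def object_keys(annotation):
    """Union of the keys used by the objects in one DOPE-style annotation."""
    keys = set()
    for obj in annotation.get("objects", []):
        keys.update(obj.keys())
    return keys


def compare_fields(annotation_a, annotation_b):
    """Return (only_in_a, only_in_b) as sorted lists of per-object keys."""
    ka, kb = object_keys(annotation_a), object_keys(annotation_b)
    return sorted(ka - kb), sorted(kb - ka)
```

Usage: load both files with json.load and call compare_fields(nvisii_data, blenderproc_data).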

RenanMoreiraPinto commented 5 months ago


Hi! I used the BlenderProc version with a 20,000-image dataset and my own Blender .obj of a door :p. I used the new train2 version. I will try the old version too; I'm making images with NVISII, but since I had to downgrade to NVIDIA driver 450, I used an old computer. I will try the new version with the 60,000-image dataset.

RenanMoreiraPinto commented 5 months ago

[Attachments: Screenshot from 2024-06-26 13-05-04; sample frames 00006 to 00010 with 00007.json to 00010.json]

I made a run file to run everything. To make it work, I just added the textured version of the .obj file to the models path. The good thing is that you can use the interactive mode to see the items and their positions.

OUT_FOLDER="output"
args=""
args+=" --scale 0.5"
args+=" --nb_frames 10000"
args+=" --nb_objects 2"
args+=" --nb_distractors 10"
args+=" --outf $OUT_FOLDER"
args+=" --width 500"
args+=" --height 500"
args+=" --path_single_obj /home/renan/Deep_Object_Pose/data_generation/nvisii_data_gen/models/door/door.obj"

I comment out these two while training:
# args+=" --debug"
# args+=" --interactive"

python3 single_video_pybullet.py $args

Ubuntu 20.04

joansaurina commented 4 months ago

I am finally getting decent results!

I think I had these problems:

  1. I needed more data: I had 20,000 images and now I have 75,000.

  2. The images I was using were not square: I was using 1920x1080, and this did not fit well into the network, which uses 512x512. I now generate my images at 512x512.

  3. I was using the new script which must have something wrong. The one working is: https://github.com/NVlabs/Deep_Object_Pose/tree/128631a23c827d2091cfd103c03c8c3a93fc6134

  4. My target objects were not big enough.
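To put numbers on points 2 and 4: a plain resize from 1920x1080 to 512x512 scales x by 512/1920 ≈ 0.27 but y by 512/1080 ≈ 0.47, so objects are shrunk almost twice as much horizontally as vertically and also end up small in pixels. The sketch below assumes a plain, non-aspect-preserving resize purely for illustration; it is not a description of how the DOPE dataloader actually crops or resizes. The helper names are hypothetical.

```python
def scale_points(points, src_size, dst_size):
    """Map 2D keypoints (e.g. projected_cuboid) through a plain resize.

    src_size and dst_size are (width, height). Each axis is scaled
    independently, so a non-square source is distorted.
    """
    (sw, sh), (dw, dh) = src_size, dst_size
    return [(x * dw / sw, y * dh / sh) for x, y in points]


def extent_pixels(points):
    """Width and height, in pixels, of the points' 2D bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return max(xs) - min(xs), max(ys) - min(ys)
```

For example, passing the projected_cuboid of one instance through scale_points((1920, 1080), (512, 512)) and then extent_pixels shows how few pixels it would cover in the network input, which is a quick way to spot objects that are too small.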

I'll report when training is done with inference results, but my loss does look good now:

Train Epoch: 1 [0/81599 (0%)]   Loss: 0.067368142306805
Train Epoch: 1 [3200/81599 (4%)]        Loss: 0.041182152926922
Train Epoch: 1 [6400/81599 (8%)]        Loss: 0.039299402385950
Train Epoch: 1 [9600/81599 (12%)]       Loss: 0.039307989180088
Train Epoch: 1 [12800/81599 (16%)]      Loss: 0.038886558264494
Train Epoch: 1 [16000/81599 (20%)]      Loss: 0.040250405669212
Train Epoch: 1 [19200/81599 (24%)]      Loss: 0.038537904620171
Train Epoch: 1 [22400/81599 (27%)]      Loss: 0.039164066314697
Train Epoch: 1 [25600/81599 (31%)]      Loss: 0.038494884967804
Train Epoch: 1 [28800/81599 (35%)]      Loss: 0.039913102984428
Train Epoch: 1 [32000/81599 (39%)]      Loss: 0.037832036614418
Train Epoch: 1 [35200/81599 (43%)]      Loss: 0.037465162575245
Train Epoch: 1 [38400/81599 (47%)]      Loss: 0.038491453975439
Train Epoch: 1 [41600/81599 (51%)]      Loss: 0.037618502974510
Train Epoch: 1 [44800/81599 (55%)]      Loss: 0.038155931979418
Train Epoch: 1 [48000/81599 (59%)]      Loss: 0.036317259073257
Train Epoch: 1 [51200/81599 (63%)]      Loss: 0.037838213145733
Train Epoch: 1 [54400/81599 (67%)]      Loss: 0.035968508571386
Train Epoch: 1 [57600/81599 (71%)]      Loss: 0.035015475004911
Train Epoch: 1 [60800/81599 (75%)]      Loss: 0.035495396703482
 ...
Train Epoch: 5 [54400/81599 (67%)]      Loss: 0.022631796076894
Train Epoch: 5 [57600/81599 (71%)]      Loss: 0.021600604057312
Train Epoch: 5 [60800/81599 (75%)]      Loss: 0.021810073405504
Train Epoch: 5 [64000/81599 (78%)]      Loss: 0.021704383194447
Train Epoch: 5 [67200/81599 (82%)]      Loss: 0.023373922333121
Train Epoch: 5 [70400/81599 (86%)]      Loss: 0.021433854475617
Train Epoch: 5 [73600/81599 (90%)]      Loss: 0.022643189877272
Train Epoch: 5 [76800/81599 (94%)]      Loss: 0.022783258929849

Thanks;

Joan

phsilvarepo commented 4 months ago

Hello there,

Just some quick questions, @joansaurina, about this new training you performed: are you training with train.py or train2 in the version you are using? And were you able to perform inference? There is not much documentation for that version, so I'm not sure it is possible.

Thanks for the help

nv-jeff commented 4 months ago

I am investigating the differences in training with the older training code (in train2) and the newer code (in train). So far, it's clear that the newer code trains much faster (more than twice as quickly), which was noted when it was introduced, but the results are poorer. I will update this thread when the problem has been located and fixed.

joansaurina commented 4 months ago

Hey @phsilvarepo I used train2. Yes I am performing inference!

Feel free to ask more questions.

Joan

nv-jeff commented 4 months ago

I've done code comparisons and several experiments and I believe I have found the source of the quality discrepancies between the original training code (train2) and the newer training code (train). Ironically it was a bug in the older code that was causing the issue.

Specifically, the old dataloader -- in train2/utils_dope.py at lines 185-187 (NOTE line numbers refer to the pre-fix version) -- was loading the data in subfolders twice. So, in effect, the original DOPE code was performing two epochs of training per reported epoch. This bug would only affect users who put data in subfolders and not all in one flat directory. This bug explains why, for the same number of reported epochs, the old code generated better results, as well as explaining why the new training code appeared to be twice as fast as the older code.
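The following is not the actual train2/utils_dope.py code; it is a hypothetical illustration of how a loader can end up listing subfolder files twice while a flat directory is unaffected, which matches the symptom described above (each reported epoch silently covering the subfolder data twice).

```python
import os


def collect_buggy(root, ext=".png"):
    """Hypothetical loader with a double-counting bug."""
    files = []
    # os.walk already visits root *and* every subfolder...
    for dirpath, _, filenames in os.walk(root):
        files += [os.path.join(dirpath, f) for f in filenames if f.endswith(ext)]
    # ...so visiting the immediate subfolders again adds their files twice.
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir():
                files += [os.path.join(entry.path, f)
                          for f in os.listdir(entry.path) if f.endswith(ext)]
    return files


def collect_fixed(root, ext=".png"):
    """Single traversal: every file is listed exactly once."""
    files = []
    for dirpath, _, filenames in os.walk(root):
        files += [os.path.join(dirpath, f) for f in filenames if f.endswith(ext)]
    return sorted(files)
```

A cheap guard in any loader is assert len(set(files)) == len(files), which would have caught this kind of duplication immediately.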

I have run comparisons of the fixed old code and the new code on a single machine with a single GPU, having set the random seed to 1 and setting the number of worker threads to one in order to eliminate randomness in data loading. The loss decrease is identical to four decimal places (at the end of one epoch) and the results are equivalent. There are some small differences (less than a pixel) in cuboid point location but I believe this is the result of slight differences in math resulting from refactoring.
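For anyone wanting to repeat this kind of comparison, a small parser for the log format shown in this thread makes it easy to check whether two runs agree to a given number of decimal places. This is a hypothetical helper; it only reads the logs, and the seeding and single-worker setup described above still have to be done in the training script itself.

```python
import re

# Matches lines such as:
# "Train Epoch: 13 [3200/21954 (15%)]      Loss: 0.003596150781959         Local Rank: 0"
LOG_LINE = re.compile(
    r"Train Epoch:\s*(\d+)\s*\[(\d+)/\d+\s*\(\d+%\)\]\s*Loss:\s*([0-9.eE+-]+)"
)


def parse_losses(text):
    """Return {(epoch, samples_seen): loss} from a DOPE-style training log."""
    losses = {}
    for m in LOG_LINE.finditer(text):
        losses[(int(m.group(1)), int(m.group(2)))] = float(m.group(3))
    return losses


def logs_agree(text_a, text_b, places=4):
    """True if all steps logged by both runs have the same loss when rounded
    to `places` decimals (False if the runs share no steps)."""
    a, b = parse_losses(text_a), parse_losses(text_b)
    common = a.keys() & b.keys()
    return bool(common) and all(
        round(a[k], places) == round(b[k], places) for k in common
    )
```

Matching on (epoch, samples_seen) rather than line order means the two logs do not need to contain exactly the same set of lines.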

I will make the code change to eliminate the double-loading issue, but will not close this issue for now. If any of you (awesome! beautiful! thoughtful!) users find any continuing differences, please let us know immediately and I will work with you to determine the problem.

phsilvarepo commented 4 months ago

Hello there, just a quick question regarding the inference: did the results improve, @joansaurina? I am still struggling to achieve good results even after @nv-jeff's bug fix, as I continue to get incomplete cuboid detections in my inference.

joansaurina commented 3 months ago

No. It does not work well. @phsilvarepo