NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

why the loss goes to inf and nan? #74

Closed qingchenkanlu closed 5 years ago

qingchenkanlu commented 5 years ago

```
yusheng@yusheng:~/code/Deep_Object_Pose-master/scripts$ python train.py --data /home/yusheng/fat/single/004_sugar_box_16k --object 004_sugar_box_16k --outf /home/yusheng/code --gpuids 0 1 --loginterval 1 --batchsize 32 --epochs 60
start: 17:12:18.049454
load data
training data: 188 batches
load models
Training network pretrained on imagenet.
Train Epoch: 1 [0/6000 (0%)]    Loss: 0.037806294858456
Train Epoch: 1 [32/6000 (1%)]   Loss: 0.037991054356098
Train Epoch: 1 [64/6000 (1%)]   Loss: 0.040447328239679
Train Epoch: 1 [96/6000 (2%)]   Loss: 0.084592938423157
Train Epoch: 1 [128/6000 (2%)]  Loss: 0.829165458679199
Train Epoch: 1 [160/6000 (3%)]  Loss: 9.005415916442871
Train Epoch: 1 [192/6000 (3%)]  Loss: 90.017181396484375
Train Epoch: 1 [224/6000 (4%)]  Loss: 1015.848876953125000
Train Epoch: 1 [256/6000 (4%)]  Loss: 15751.037109375000000
Train Epoch: 1 [288/6000 (5%)]  Loss: 354724.843750000000000
Train Epoch: 1 [320/6000 (5%)]  Loss: 14932532.000000000000000
Train Epoch: 1 [352/6000 (6%)]  Loss: 998436224.000000000000000
...
Train Epoch: 1 [5952/6000 (99%)] Loss: 33691539159730265094336322469888.000000000000000
Train Epoch: 1 [2992/6000 (99%)] Loss: inf
Train Epoch: 2 [0/6000 (0%)]    Loss: inf
Train Epoch: 2 [32/6000 (1%)]   Loss: nan
Train Epoch: 2 [64/6000 (1%)]   Loss: nan
...
Train Epoch: 2 [1088/6000 (18%)] Loss: nan
^C
(log truncated: the loss grows monotonically through epoch 1, reaches inf at the end of the epoch, and stays nan from epoch 2 onward; the interleaved KeyboardInterrupt tracebacks from the DataLoader worker processes after ^C are omitted)
```

TontonTremblay commented 5 years ago

Please try a lower learning rate; this looks like an exploding-gradient problem. I have never trained with such a small learning rate myself, though.
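For intuition (this is generic gradient descent, not code from this repo): a step size above the stability threshold makes every update overshoot the minimum, so the iterate, and hence the loss, grows geometrically, which is exactly the runaway pattern in the log above. A minimal sketch on f(w) = w², whose gradient is 2w:

```python
# Gradient descent on f(w) = w**2, gradient 2*w.
# The update is w <- w - lr * 2*w = (1 - 2*lr) * w, so it is
# stable only when lr < 1.0; above that, |w| grows geometrically,
# mirroring the exploding loss in the training log.
def descend(lr, steps=50, w=1.0):
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w

print(abs(descend(0.1)))   # shrinks toward 0
print(abs(descend(1.1)))   # blows up
```

Lowering `--lr` moves the training run back under this threshold; gradient clipping is another common mitigation for the same failure mode.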

qingchenkanlu commented 5 years ago

@TontonTremblay Thanks! I gave it a try, but the result is still bad. If I set batchsize=16, no matter what the lr is, I get:

```
Train Epoch: 1 [0/6000 (0%)]    Loss: 0.039806149899960
Train Epoch: 1 [16/6000 (0%)]   Loss: 0.040904726833105
Train Epoch: 1 [32/6000 (1%)]   Loss: nan
Train Epoch: 1 [48/6000 (1%)]   Loss: nan
Train Epoch: 1 [64/6000 (1%)]   Loss: nan
Train Epoch: 1 [80/6000 (1%)]   Loss: nan
Train Epoch: 1 [96/6000 (2%)]   Loss: nan
```
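When a single batch flips the loss straight to nan like this, a common debugging step (not part of `train.py`; the `safe_step` helper below is purely illustrative) is to guard the optimizer update so one bad batch cannot poison the weights, and to log or dump the offending batch instead:

```python
import math

def safe_step(loss_value, apply_update):
    """Apply the optimizer update only when the loss is finite.

    loss_value:   the scalar loss for this batch
    apply_update: a zero-argument callable that performs the
                  backward pass / optimizer step
    Returns True if the update ran, False if it was skipped.
    """
    if not math.isfinite(loss_value):
        return False  # caller can dump the batch here for inspection
    apply_update()
    return True

applied = []
safe_step(0.039, lambda: applied.append("ok"))          # finite: runs
safe_step(float("nan"), lambda: applied.append("bad"))  # nan: skipped
print(applied)
```

If the very same batch triggers nan every time, that points at bad data (e.g. a corrupt annotation) rather than at the learning rate.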

If I set batchsize=32 and lr=0.00001, I get:

```
yusheng@yusheng:~/code/Deep_Object_Pose-master/scripts$ python train.py --data /home/yusheng/fat/single/004_sugar_box_16k --object 004_sugar_box_16k --outf /home/yusheng/code --gpuids 0 1 --loginterval 1 --batchsize 32 --epochs 100 --lr 0.00001
start: 10:04:11.790992
load data
training data: 188 batches
load models
Training network pretrained on imagenet.
Train Epoch: 1 [0/6000 (0%)]    Loss: 0.043270923197269
Train Epoch: 1 [32/6000 (1%)]   Loss: 0.044183619320393
Train Epoch: 1 [64/6000 (1%)]   Loss: 0.044332496821880
...
Train Epoch: 1 [512/6000 (9%)]  Loss: 0.044741559773684
Train Epoch: 1 [544/6000 (9%)]  Loss: 0.046717464923859
Train Epoch: 1 [576/6000 (10%)] Loss: 0.049565110355616
Train Epoch: 1 [608/6000 (10%)] Loss: 0.053250525146723
Train Epoch: 1 [640/6000 (11%)] Loss: 0.059633512049913
Train Epoch: 1 [672/6000 (11%)] Loss: 0.070787280797958
(log truncated: the loss hovers near 0.043 for the first ~500 samples, then starts climbing)
```

The loss keeps growing.

If I set batchsize=32 and lr=0.0000001 (as in the command below), I get:

```
yusheng@yusheng:~/code/Deep_Object_Pose-master/scripts$ python train.py --data /home/yusheng/fat/single/004_sugar_box_16k --object 004_sugar_box_16k --outf /home/yusheng/code --gpuids 0 1 --loginterval 1 --batchsize 32 --epochs 100 --lr 0.0000001
start: 09:48:32.386642
load data
training data: 188 batches
load models
Training network pretrained on imagenet.
Train Epoch: 1 [0/6000 (0%)]    Loss: 0.039792459458113
Train Epoch: 1 [32/6000 (1%)]   Loss: 0.040118962526321
Train Epoch: 1 [64/6000 (1%)]   Loss: 0.039407569915056
Train Epoch: 1 [96/6000 (2%)]   Loss: 0.039502039551735
...
Train Epoch: 2 [480/6000 (8%)]  Loss: 0.039163611829281
Train Epoch: 2 [512/6000 (9%)]  Loss: 0.039052747189999
(log truncated: the loss stays flat around 0.039 through epoch 1 and into epoch 2; stable, but not decreasing)
```
(9%)] Loss: 0.039370529353619 Train Epoch: 2 [576/6000 (10%)] Loss: 0.038817286491394 Train Epoch: 2 [608/6000 (10%)] Loss: 0.039521843194962 Train Epoch: 2 [640/6000 (11%)] Loss: 0.039261516183615 Train Epoch: 2 [672/6000 (11%)] Loss: 0.039800044149160 Train Epoch: 2 [704/6000 (12%)] Loss: 0.039425734430552 Train Epoch: 2 [736/6000 (12%)] Loss: 0.039655059576035 Train Epoch: 2 [768/6000 (13%)] Loss: 0.039268296211958 Train Epoch: 2 [800/6000 (13%)] Loss: 0.039472226053476 Train Epoch: 2 [832/6000 (14%)] Loss: 0.039561267942190 Train Epoch: 2 [864/6000 (14%)] Loss: 0.039211131632328 Train Epoch: 2 [896/6000 (15%)] Loss: 0.039399906992912 Train Epoch: 2 [928/6000 (15%)] Loss: 0.039686974138021 Train Epoch: 2 [960/6000 (16%)] Loss: 0.039483301341534 Train Epoch: 2 [992/6000 (16%)] Loss: 0.038463480770588 Train Epoch: 2 [1024/6000 (17%)] Loss: 0.039319414645433 Train Epoch: 2 [1056/6000 (18%)] Loss: 0.039556305855513 Train Epoch: 2 [1088/6000 (18%)] Loss: 0.039204359054565 Train Epoch: 2 [1120/6000 (19%)] Loss: 0.038710456341505 Train Epoch: 2 [1152/6000 (19%)] Loss: 0.039336349815130

The loss doesn't seem to converge, or convergence is too slow. I have only trained for one epoch; is the training time not enough?

if I set batchsize=64, I will get start: 10:16:33.973653 load data training data: 94 batches load models Training network pretrained on imagenet. THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory Traceback (most recent call last): File "train.py", line 1392, in _runnetwork(epoch,trainingdata) File "train.py", line 1334, in _runnetwork output_belief, output_affinities = net(data) File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 491, in call result = self.forward(*input, **kwargs) File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 114, in forward outputs = self.parallel_apply(replicas, inputs, kwargs) File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)]) File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply raise output RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

Others seem to get the same problem. I have two NVIDIA GTX 1080 Ti GPUs with 11 GB each.
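As an aside, rather than letting the loss overflow all the way to inf/nan over hundreds of batches, training can be aborted as soon as the loss stops being finite or starts growing geometrically. A minimal guard in plain Python (these helpers are hypothetical illustrations, not part of train.py):

```python
import math

def check_loss(loss_value, step):
    """Abort early instead of letting a runaway loss reach inf/nan."""
    if not math.isfinite(loss_value):
        raise RuntimeError("non-finite loss %r at step %d" % (loss_value, step))
    return loss_value

def is_diverging(prev_loss, curr_loss, factor=10.0):
    """The losses in the log above grow roughly 10x per batch, so a
    simple ratio check can flag divergence well before overflow."""
    return curr_loss > prev_loss * factor
```

Calling `check_loss(loss.item(), step)` after each backward pass would stop the run at the first nan instead of hours later.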

TontonTremblay commented 5 years ago

Can you remove the --object argument? I normally pass just "sugar" to train on it, but since you are training on a single-object dataset, removing it will make sure all the data gets loaded.

qingchenkanlu commented 5 years ago

It does run, but the loss eventually grows larger and larger T_T

yusheng@yusheng:~/code/Deep_Object_Pose-master/scripts$ python train.py --data /home/yusheng/fat/single/004_sugar_box_16k --outf /home/yusheng/code --gpuids 0 1 --loginterval 1 --batchsize 32 --epochs 100 --lr 0.000001 start: 10:58:18.842664 load data training data: 188 batches load models Training network pretrained on imagenet. Train Epoch: 1 [0/6000 (0%)] Loss: 0.038804907351732 Train Epoch: 1 [32/6000 (1%)] Loss: 0.039279893040657 Train Epoch: 1 [64/6000 (1%)] Loss: 0.038769133388996 Train Epoch: 1 [96/6000 (2%)] Loss: 0.039265148341656 Train Epoch: 1 [128/6000 (2%)] Loss: 0.038670811802149 Train Epoch: 1 [160/6000 (3%)] Loss: 0.039157230407000 Train Epoch: 1 [192/6000 (3%)] Loss: 0.038940127938986 Train Epoch: 1 [224/6000 (4%)] Loss: 0.038488056510687 Train Epoch: 1 [256/6000 (4%)] Loss: 0.038359414786100 Train Epoch: 1 [288/6000 (5%)] Loss: 0.038298018276691 Train Epoch: 1 [320/6000 (5%)] Loss: 0.038498181849718 Train Epoch: 1 [352/6000 (6%)] Loss: 0.038484472781420 Train Epoch: 1 [384/6000 (6%)] Loss: 0.038467951118946 Train Epoch: 1 [416/6000 (7%)] Loss: 0.039112605154514 Train Epoch: 1 [448/6000 (7%)] Loss: 0.038414534181356 Train Epoch: 1 [480/6000 (8%)] Loss: 0.038314715027809 Train Epoch: 1 [512/6000 (9%)] Loss: 0.038475196808577 Train Epoch: 1 [544/6000 (9%)] Loss: 0.038775559514761 Train Epoch: 1 [576/6000 (10%)] Loss: 0.038869984447956 Train Epoch: 1 [608/6000 (10%)] Loss: 0.038977708667517 Train Epoch: 1 [640/6000 (11%)] Loss: 0.038242880254984 Train Epoch: 1 [672/6000 (11%)] Loss: 0.039328057318926 Train Epoch: 1 [704/6000 (12%)] Loss: 0.038632005453110 Train Epoch: 1 [736/6000 (12%)] Loss: 0.038621481508017 Train Epoch: 1 [768/6000 (13%)] Loss: 0.039162911474705 Train Epoch: 1 [800/6000 (13%)] Loss: 0.038663722574711 Train Epoch: 1 [832/6000 (14%)] Loss: 0.038170833140612 Train Epoch: 1 [864/6000 (14%)] Loss: 0.038872286677361 Train Epoch: 1 [896/6000 (15%)] Loss: 0.038306020200253 Train Epoch: 1 [928/6000 (15%)] Loss: 0.038344457745552 Train 
Epoch: 1 [960/6000 (16%)] Loss: 0.038256756961346 Train Epoch: 1 [992/6000 (16%)] Loss: 0.038965933024883 Train Epoch: 1 [1024/6000 (17%)] Loss: 0.038473363965750 Train Epoch: 1 [1056/6000 (18%)] Loss: 0.039064638316631 Train Epoch: 1 [1088/6000 (18%)] Loss: 0.038860060274601 Train Epoch: 1 [1120/6000 (19%)] Loss: 0.038768216967583 Train Epoch: 1 [1152/6000 (19%)] Loss: 0.039159547537565 Train Epoch: 1 [1184/6000 (20%)] Loss: 0.038528427481651 Train Epoch: 1 [1216/6000 (20%)] Loss: 0.039043493568897 Train Epoch: 1 [1248/6000 (21%)] Loss: 0.038629002869129 Train Epoch: 1 [1280/6000 (21%)] Loss: 0.038930796086788 Train Epoch: 1 [1312/6000 (22%)] Loss: 0.037965558469296 Train Epoch: 1 [1344/6000 (22%)] Loss: 0.038816198706627 Train Epoch: 1 [1376/6000 (23%)] Loss: 0.037802442908287 Train Epoch: 1 [1408/6000 (23%)] Loss: 0.039175182580948 Train Epoch: 1 [1440/6000 (24%)] Loss: 0.038710270076990 Train Epoch: 1 [1472/6000 (24%)] Loss: 0.038385808467865 Train Epoch: 1 [1504/6000 (25%)] Loss: 0.038630131632090 Train Epoch: 1 [1536/6000 (26%)] Loss: 0.038291595876217 Train Epoch: 1 [1568/6000 (26%)] Loss: 0.038497447967529 Train Epoch: 1 [1600/6000 (27%)] Loss: 0.037885788828135 Train Epoch: 1 [1632/6000 (27%)] Loss: 0.038541838526726 Train Epoch: 1 [1664/6000 (28%)] Loss: 0.038069501519203 Train Epoch: 1 [1696/6000 (28%)] Loss: 0.038520775735378 Train Epoch: 1 [1728/6000 (29%)] Loss: 0.038348115980625 Train Epoch: 1 [1760/6000 (29%)] Loss: 0.038655046373606 Train Epoch: 1 [1792/6000 (30%)] Loss: 0.038718141615391 Train Epoch: 1 [1824/6000 (30%)] Loss: 0.038416977971792 Train Epoch: 1 [1856/6000 (31%)] Loss: 0.038400139659643 Train Epoch: 1 [1888/6000 (31%)] Loss: 0.038309749215841 Train Epoch: 1 [1920/6000 (32%)] Loss: 0.038490708917379 Train Epoch: 1 [1952/6000 (32%)] Loss: 0.037472035735846 Train Epoch: 1 [1984/6000 (33%)] Loss: 0.037404563277960 Train Epoch: 1 [2016/6000 (34%)] Loss: 0.038348663598299 Train Epoch: 1 [2048/6000 (34%)] Loss: 0.038518354296684 Train Epoch: 
1 [2080/6000 (35%)] Loss: 0.038658063858747 Train Epoch: 1 [2112/6000 (35%)] Loss: 0.038228459656239 Train Epoch: 1 [2144/6000 (36%)] Loss: 0.037489473819733 Train Epoch: 1 [2176/6000 (36%)] Loss: 0.038621716201305 Train Epoch: 1 [2208/6000 (37%)] Loss: 0.038844883441925 Train Epoch: 1 [2240/6000 (37%)] Loss: 0.037892714142799 Train Epoch: 1 [2272/6000 (38%)] Loss: 0.038469776511192 Train Epoch: 1 [2304/6000 (38%)] Loss: 0.037553507834673 Train Epoch: 1 [2336/6000 (39%)] Loss: 0.036701112985611 Train Epoch: 1 [2368/6000 (39%)] Loss: 0.037838775664568 Train Epoch: 1 [2400/6000 (40%)] Loss: 0.037400305271149 Train Epoch: 1 [2432/6000 (40%)] Loss: 0.037809796631336 Train Epoch: 1 [2464/6000 (41%)] Loss: 0.037069659680128 Train Epoch: 1 [2496/6000 (41%)] Loss: 0.037373892962933 Train Epoch: 1 [2528/6000 (42%)] Loss: 0.037432190030813 Train Epoch: 1 [2560/6000 (43%)] Loss: 0.037757351994514 Train Epoch: 1 [2592/6000 (43%)] Loss: 0.038000416010618 Train Epoch: 1 [2624/6000 (44%)] Loss: 0.037836816161871 Train Epoch: 1 [2656/6000 (44%)] Loss: 0.037931758910418 Train Epoch: 1 [2688/6000 (45%)] Loss: 0.038122601807117 Train Epoch: 1 [2720/6000 (45%)] Loss: 0.037516698241234 Train Epoch: 1 [2752/6000 (46%)] Loss: 0.037454459816217 Train Epoch: 1 [2784/6000 (46%)] Loss: 0.037578023970127 Train Epoch: 1 [2816/6000 (47%)] Loss: 0.037274461239576 Train Epoch: 1 [2848/6000 (47%)] Loss: 0.037473537027836 Train Epoch: 1 [2880/6000 (48%)] Loss: 0.037590529769659 Train Epoch: 1 [2912/6000 (48%)] Loss: 0.037465430796146 Train Epoch: 1 [2944/6000 (49%)] Loss: 0.037472534924746 Train Epoch: 1 [2976/6000 (49%)] Loss: 0.036587923765182 Train Epoch: 1 [3008/6000 (50%)] Loss: 0.037454381585121 Train Epoch: 1 [3040/6000 (51%)] Loss: 0.036157384514809 Train Epoch: 1 [3072/6000 (51%)] Loss: 0.036983564496040 Train Epoch: 1 [3104/6000 (52%)] Loss: 0.037603929638863 Train Epoch: 1 [3136/6000 (52%)] Loss: 0.037403929978609 Train Epoch: 1 [3168/6000 (53%)] Loss: 0.037616200745106 Train Epoch: 1 
[3200/6000 (53%)] Loss: 0.038681957870722 Train Epoch: 1 [3232/6000 (54%)] Loss: 0.037830557674170 Train Epoch: 1 [3264/6000 (54%)] Loss: 0.038075610995293 Train Epoch: 1 [3296/6000 (55%)] Loss: 0.038297295570374 Train Epoch: 1 [3328/6000 (55%)] Loss: 0.038108751177788 Train Epoch: 1 [3360/6000 (56%)] Loss: 0.039164479821920 Train Epoch: 1 [3392/6000 (56%)] Loss: 0.038122862577438 Train Epoch: 1 [3424/6000 (57%)] Loss: 0.039392814040184 Train Epoch: 1 [3456/6000 (57%)] Loss: 0.039377801120281 Train Epoch: 1 [3488/6000 (58%)] Loss: 0.039508718997240 Train Epoch: 1 [3520/6000 (59%)] Loss: 0.039820171892643 Train Epoch: 1 [3552/6000 (59%)] Loss: 0.040079027414322 Train Epoch: 1 [3584/6000 (60%)] Loss: 0.040030352771282 Train Epoch: 1 [3616/6000 (60%)] Loss: 0.040581800043583 Train Epoch: 1 [3648/6000 (61%)] Loss: 0.040247868746519 Train Epoch: 1 [3680/6000 (61%)] Loss: 0.041205402463675 Train Epoch: 1 [3712/6000 (62%)] Loss: 0.041648313403130 Train Epoch: 1 [3744/6000 (62%)] Loss: 0.043104920536280 Train Epoch: 1 [3776/6000 (63%)] Loss: 0.043503127992153 Train Epoch: 1 [3808/6000 (63%)] Loss: 0.044106781482697 Train Epoch: 1 [3840/6000 (64%)] Loss: 0.045341581106186 Train Epoch: 1 [3872/6000 (64%)] Loss: 0.046855259686708 Train Epoch: 1 [3904/6000 (65%)] Loss: 0.047551192343235 Train Epoch: 1 [3936/6000 (65%)] Loss: 0.048702210187912 Train Epoch: 1 [3968/6000 (66%)] Loss: 0.049821175634861 Train Epoch: 1 [4000/6000 (66%)] Loss: 0.051127173006535 Train Epoch: 1 [4032/6000 (67%)] Loss: 0.053120616823435 Train Epoch: 1 [4064/6000 (68%)] Loss: 0.054369080811739 Train Epoch: 1 [4096/6000 (68%)] Loss: 0.055158466100693 Train Epoch: 1 [4128/6000 (69%)] Loss: 0.058482877910137 Train Epoch: 1 [4160/6000 (69%)] Loss: 0.060580302029848 Train Epoch: 1 [4192/6000 (70%)] Loss: 0.062541224062443 Train Epoch: 1 [4224/6000 (70%)] Loss: 0.064235262572765 Train Epoch: 1 [4256/6000 (71%)] Loss: 0.068264707922935 Train Epoch: 1 [4288/6000 (71%)] Loss: 0.071094162762165 Train Epoch: 1 
[4320/6000 (72%)] Loss: 0.076019123196602 Train Epoch: 1 [4352/6000 (72%)] Loss: 0.080142162740231 Train Epoch: 1 [4384/6000 (73%)] Loss: 0.085152924060822 Train Epoch: 1 [4416/6000 (73%)] Loss: 0.090455226600170 Train Epoch: 1 [4448/6000 (74%)] Loss: 0.096430420875549 Train Epoch: 1 [4480/6000 (74%)] Loss: 0.103046029806137 Train Epoch: 1 [4512/6000 (75%)] Loss: 0.111310094594955 Train Epoch: 1 [4544/6000 (76%)] Loss: 0.119044937193394 Train Epoch: 1 [4576/6000 (76%)] Loss: 0.127564713358879 Train Epoch: 1 [4608/6000 (77%)] Loss: 0.137142091989517 Train Epoch: 1 [4640/6000 (77%)] Loss: 0.148854136466980 Train Epoch: 1 [4672/6000 (78%)] Loss: 0.162594139575958 Train Epoch: 1 [4704/6000 (78%)] Loss: 0.175815209746361 Train Epoch: 1 [4736/6000 (79%)] Loss: 0.192956775426865 Train Epoch: 1 [4768/6000 (79%)] Loss: 0.212724372744560 Train Epoch: 1 [4800/6000 (80%)] Loss: 0.231808453798294 Train Epoch: 1 [4832/6000 (80%)] Loss: 0.253963381052017 Train Epoch: 1 [4864/6000 (81%)] Loss: 0.280495226383209 Train Epoch: 1 [4896/6000 (81%)] Loss: 0.310789674520493 Train Epoch: 1 [4928/6000 (82%)] Loss: 0.341103762388229 Train Epoch: 1 [4960/6000 (82%)] Loss: 0.380992770195007 Train Epoch: 1 [4992/6000 (83%)] Loss: 0.425296336412430 Train Epoch: 1 [5024/6000 (84%)] Loss: 0.474648237228394 Train Epoch: 1 [5056/6000 (84%)] Loss: 0.532463312149048 Train Epoch: 1 [5088/6000 (85%)] Loss: 0.587903797626495 Train Epoch: 1 [5120/6000 (85%)] Loss: 0.655122399330139 Train Epoch: 1 [5152/6000 (86%)] Loss: 0.737766683101654 Train Epoch: 1 [5184/6000 (86%)] Loss: 0.834491610527039

qingchenkanlu commented 5 years ago

@TontonTremblay Thanks, I used another computer and the training works, but I don't know what caused this.

qingchenkanlu commented 5 years ago

I have found what the problem is: if I use only one GPU, it converges.

yusheng@yusheng:~/Deep_Object_Pose-master/scripts$ python train.py --data /home/yusheng/fat/single/004_sugar_box_16k --outf /home/yusheng/Deep_Object_Pose-master --loginterval 1 --batchsize 16 --lr 0.0005 start: 19:51:59.962850 load data training data: 375 batches load models Training network pretrained on imagenet. Train Epoch: 1 [0/6000 (0%)] Loss: 0.040953978896141 Train Epoch: 1 [16/6000 (0%)] Loss: 0.034083966165781 Train Epoch: 1 [32/6000 (1%)] Loss: 0.027863219380379 Train Epoch: 1 [48/6000 (1%)] Loss: 0.052056580781937 Train Epoch: 1 [64/6000 (1%)] Loss: 0.025588771328330 Train Epoch: 1 [80/6000 (1%)] Loss: 0.023547889664769 Train Epoch: 1 [96/6000 (2%)] Loss: 0.027019804343581 Train Epoch: 1 [112/6000 (2%)] Loss: 0.020110279321671 Train Epoch: 1 [128/6000 (2%)] Loss: 0.018488207831979 Train Epoch: 1 [144/6000 (2%)] Loss: 0.017858909443021 Train Epoch: 1 [160/6000 (3%)] Loss: 0.016017368063331 Train Epoch: 1 [176/6000 (3%)] Loss: 0.014744239859283 Train Epoch: 1 [192/6000 (3%)] Loss: 0.013830945827067 Train Epoch: 1 [208/6000 (3%)] Loss: 0.018945587798953 Train Epoch: 1 [224/6000 (4%)] Loss: 0.013751089572906 Train Epoch: 1 [240/6000 (4%)] Loss: 0.013395770452917 Train Epoch: 1 [256/6000 (4%)] Loss: 0.013925779610872 Train Epoch: 1 [272/6000 (5%)] Loss: 0.012055069208145 Train Epoch: 1 [288/6000 (5%)] Loss: 0.012488808482885 Train Epoch: 1 [304/6000 (5%)] Loss: 0.013699783012271 Train Epoch: 1 [320/6000 (5%)] Loss: 0.012231723405421 Train Epoch: 1 [336/6000 (6%)] Loss: 0.011017498560250 Train Epoch: 1 [352/6000 (6%)] Loss: 0.011313261464238 Train Epoch: 1 [368/6000 (6%)] Loss: 0.011298383586109 Train Epoch: 1 [384/6000 (6%)] Loss: 0.012755277566612 Train Epoch: 1 [400/6000 (7%)] Loss: 0.012266611680388 Train Epoch: 1 [416/6000 (7%)] Loss: 0.011365349404514 Train Epoch: 1 [432/6000 (7%)] Loss: 0.012122785672545 Train Epoch: 1 [448/6000 (7%)] Loss: 0.031905010342598 Train Epoch: 1 [464/6000 (8%)] Loss: 0.012502704747021 Train Epoch: 1 [480/6000 (8%)] Loss: 
0.012813069857657 Train Epoch: 1 [496/6000 (8%)] Loss: 0.012356782332063 Train Epoch: 1 [512/6000 (9%)] Loss: 0.014413649216294 Train Epoch: 1 [528/6000 (9%)] Loss: 0.014801333658397 Train Epoch: 1 [544/6000 (9%)] Loss: 0.013789370656013 Train Epoch: 1 [560/6000 (9%)] Loss: 0.014443821273744 Train Epoch: 1 [576/6000 (10%)] Loss: 0.014905676245689 Train Epoch: 1 [592/6000 (10%)] Loss: 0.014864686876535 Train Epoch: 1 [608/6000 (10%)] Loss: 0.014638021588326 Train Epoch: 1 [624/6000 (10%)] Loss: 0.014538430608809 Train Epoch: 1 [640/6000 (11%)] Loss: 0.012766372412443 Train Epoch: 1 [656/6000 (11%)] Loss: 0.013938860967755 Train Epoch: 1 [672/6000 (11%)] Loss: 0.014381136745214 Train Epoch: 1 [688/6000 (11%)] Loss: 0.013379598967731 Train Epoch: 1 [704/6000 (12%)] Loss: 0.012310686521232 Train Epoch: 1 [720/6000 (12%)] Loss: 0.014388881623745 Train Epoch: 1 [736/6000 (12%)] Loss: 0.012809935025871 Train Epoch: 1 [752/6000 (13%)] Loss: 0.013151279650629 Train Epoch: 1 [768/6000 (13%)] Loss: 0.013079827651381 Train Epoch: 1 [784/6000 (13%)] Loss: 0.012990090996027 Train Epoch: 1 [800/6000 (13%)] Loss: 0.012783104553819 Train Epoch: 1 [816/6000 (14%)] Loss: 0.013732401654124 Train Epoch: 1 [832/6000 (14%)] Loss: 0.012417307123542 Train Epoch: 1 [848/6000 (14%)] Loss: 0.012330837547779 Train Epoch: 1 [864/6000 (14%)] Loss: 0.011554071679711 Train Epoch: 1 [880/6000 (15%)] Loss: 0.012453998439014 Train Epoch: 1 [896/6000 (15%)] Loss: 0.012853015214205 Train Epoch: 1 [912/6000 (15%)] Loss: 0.012630909681320 Train Epoch: 1 [928/6000 (15%)] Loss: 0.012449970468879 Train Epoch: 1 [944/6000 (16%)] Loss: 0.011639437638223 Train Epoch: 1 [960/6000 (16%)] Loss: 0.012345942668617 Train Epoch: 1 [976/6000 (16%)] Loss: 0.012388405390084 Train Epoch: 1 [992/6000 (17%)] Loss: 0.012444682419300 Train Epoch: 1 [1008/6000 (17%)] Loss: 0.011106600984931 Train Epoch: 1 [1024/6000 (17%)] Loss: 0.010832753032446 Train Epoch: 1 [1040/6000 (17%)] Loss: 0.010026946663857 Train Epoch: 1 
[1056/6000 (18%)] Loss: 0.009827477857471 Train Epoch: 1 [1072/6000 (18%)] Loss: 0.010986939072609 Train Epoch: 1 [1088/6000 (18%)] Loss: 0.009983441792428 Train Epoch: 1 [1104/6000 (18%)] Loss: 0.009713599458337 Train Epoch: 1 [1120/6000 (19%)] Loss: 0.009522896260023 Train Epoch: 1 [1136/6000 (19%)] Loss: 0.010080872103572 Train Epoch: 1 [1152/6000 (19%)] Loss: 0.010227524675429 Train Epoch: 1 [1168/6000 (19%)] Loss: 0.010864577256143 Train Epoch: 1 [1184/6000 (20%)] Loss: 0.010463309474289 Train Epoch: 1 [1200/6000 (20%)] Loss: 0.009631238877773 Train Epoch: 1 [1216/6000 (20%)] Loss: 0.009731948375702 Train Epoch: 1 [1232/6000 (21%)] Loss: 0.009563428349793 Train Epoch: 1 [1248/6000 (21%)] Loss: 0.010449653491378 Train Epoch: 1 [1264/6000 (21%)] Loss: 0.009839917533100 Train Epoch: 1 [1280/6000 (21%)] Loss: 0.009463611058891 Train Epoch: 1 [1296/6000 (22%)] Loss: 0.010974670760334 Train Epoch: 1 [1312/6000 (22%)] Loss: 0.011161041446030 Train Epoch: 1 [1328/6000 (22%)] Loss: 0.010570807382464 Train Epoch: 1 [1344/6000 (22%)] Loss: 0.010968651622534 Train Epoch: 1 [1360/6000 (23%)] Loss: 0.009682252071798

If I use two, it does not converge:

yusheng@yusheng:~/Deep_Object_Pose-master/scripts$ python train.py --data /home/yusheng/fat/single/004_sugar_box_16k --outf /home/yusheng/Deep_Object_Pose-master --loginterval 1 --batchsize 16 --lr 0.0005 --gpuids 0 1 start: 19:53:56.836200 load data training data: 375 batches load models Training network pretrained on imagenet. Train Epoch: 1 [0/6000 (0%)] Loss: 0.040255680680275 Train Epoch: 1 [16/6000 (0%)] Loss: 1.918457865715027 Train Epoch: 1 [32/6000 (1%)] Loss: 0.155298992991447 Train Epoch: 1 [48/6000 (1%)] Loss: 26.744110107421875 Train Epoch: 1 [64/6000 (1%)] Loss: nan Train Epoch: 1 [80/6000 (1%)] Loss: nan

Do you know why?

TontonTremblay commented 5 years ago

This is weird... The loss should be going down, not going up.
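For what it's worth, the roughly geometric blow-up in the first log (each reported loss is many times the previous one) is the classic signature of a gradient-descent step size above the stability limit. A toy model in plain Python (my own illustration, not DOPE's actual loss) reproduces the same pattern:

```python
def quadratic_descent(lr, curvature=1.0, w0=1.0, steps=8):
    """Gradient descent on f(w) = 0.5 * curvature * w**2.
    Each step multiplies w by (1 - lr*curvature), so the loss
    shrinks when lr < 2/curvature and explodes geometrically
    once lr exceeds that limit."""
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * w * w)
        w -= lr * curvature * w
    return losses

stable = quadratic_descent(lr=0.5)    # loss shrinks every step
unstable = quadratic_descent(lr=3.0)  # loss multiplies by 4 every step
```

This is only an analogy; it doesn't explain why the *same* learning rate is stable on one GPU but unstable on two, which points at something in the multi-GPU path rather than at tuning.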

TontonTremblay commented 5 years ago

Which version of pytorch are you using?

qingchenkanlu commented 5 years ago

As required, torch==0.4.0

TontonTremblay commented 5 years ago

Can you try with the latest? You might have to change some code.

qingchenkanlu commented 5 years ago

I use the latest code; it converges with only one GPU, but not with two GPUs.
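Until the multi-GPU issue is tracked down, a practical workaround is to pin the process to a single device before any CUDA library is imported. `CUDA_VISIBLE_DEVICES` is the standard CUDA mechanism for this; the helper below is my own sketch, not part of the repo:

```python
import os

# Restrict the process to a single visible GPU *before* importing any
# CUDA-backed framework, so PyTorch only ever enumerates one device.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

def visible_gpus():
    """Return the GPU ids this process is allowed to use."""
    ids = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [int(i) for i in ids.split(",") if i != ""]
```

With only one device visible, running train.py with `--gpuids 0` reduces to the single-GPU configuration that this thread shows does converge.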

qingchenkanlu commented 5 years ago

Sorry to disturb you, but I want to ask one more question: if I want to use the model to detect a real object, do I need to print the object image and paste it onto something?

TontonTremblay commented 5 years ago

The easiest way would be to buy the product in a store, or order it on Amazon if you are in North America. Otherwise, printing it on a box of the exact same size as the original box is the other way to go.

qingchenkanlu commented 5 years ago

OK, thanks. Now I'm trying to make my own dataset.

qingchenkanlu commented 5 years ago

Excuse me, I want to know whether backgrounds like the kitchen scene are included in NDDS?

TontonTremblay commented 5 years ago

No, NDDS only contains the code and components needed to export annotated images. We did not include any 3D content with it.

qingchenkanlu commented 5 years ago

Is it easy to make the backgrounds?

TontonTremblay commented 5 years ago

Yeah, the backgrounds are easy to add if you follow the tutorials.

qingchenkanlu commented 5 years ago

OK, thank you!