data_tmp[fg_coord] = np.reshape(prediction['affinities'], (np.prod(prediction['affinities'].shape), 1, 1))
I tried a few things first, but this was the only thing that got the code working. Is this change correct?
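For context, a minimal self-contained sketch of what that reshape does (made-up shapes and variable contents, not the actual pipeline code): the predicted affinities are flattened to (N, 1, 1) so the assignment to the foreground coordinates is shape-compatible.

import numpy as np

# hypothetical stand-ins for the real arrays in the decode step
prediction = {'affinities': np.random.rand(5, 3)}  # e.g. 5 pixels, 3 values each
data_tmp = np.zeros((15, 1, 1))                    # flat target buffer
fg_coord = np.arange(15)                           # hypothetical index array

# reshape (5, 3) -> (15, 1, 1) so the fancy-indexed assignment works
data_tmp[fg_coord] = np.reshape(
    prediction['affinities'],
    (np.prod(prediction['affinities'].shape), 1, 1))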
Looks good I think, if it runs and the pred_affs folder is getting created.
After this, the code successfully computed the decode step but got stuck, showing a similar error with exit code -9. I think it occurs while computing the vote instances:
As it is the same error code it is probably outofmemory again, did you check dmesg?
How much RAM do you have? And how big are your images again?
You tried the right thing, however there is unfortunately some inconsistency between the config file and run_ppp: in vote_instances, num_workers is not used anymore but is replaced by num_parallel_samples, which is not set in the config and thus falls back to its default of 1.
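In config terms that would presumably look like the following addition (the key name is taken from the comment above; the value shown is just the default mentioned there):

[vote_instances]
# replaces the old num_workers setting; defaults to 1 if not set
num_parallel_samples = 1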
Yes @abred, the reason is the same: the OS is killing the processes. The dmesg output is also similar:
(Parag_GreenAI) student2@BQ-DX1100-CT2:~/Desktop/Parag_masterthesis/PatchPerPix$ dmesg | egrep -i 'killed process'
[3944683.201497] Out of memory: Killed process 12089 (python) total-vm:26580804kB, anon-rss:25243156kB, file-rss:0kB, shmem-rss:4kB, UID:1003 pgtables:50116kB oom_score_adj:0
[3951431.125635] Out of memory: Killed process 14309 (python) total-vm:37894136kB, anon-rss:24871092kB, file-rss:73564kB, shmem-rss:10240kB, UID:1003 pgtables:50160kB oom_score_adj:0
[3952417.147848] Out of memory: Killed process 27522 (python) total-vm:37894132kB, anon-rss:24853440kB, file-rss:72924kB, shmem-rss:10240kB, UID:1003 pgtables:50128kB oom_score_adj:0
[3952670.222360] Out of memory: Killed process 30716 (python) total-vm:37894136kB, anon-rss:24847532kB, file-rss:74372kB, shmem-rss:10240kB, UID:1003 pgtables:50152kB oom_score_adj:0
[3952793.402220] Out of memory: Killed process 32425 (python) total-vm:37894136kB, anon-rss:24837228kB, file-rss:74248kB, shmem-rss:10240kB, UID:1003 pgtables:50128kB oom_score_adj:0
[3956437.478432] Out of memory: Killed process 19136 (python) total-vm:37894132kB, anon-rss:25219872kB, file-rss:73880kB, shmem-rss:10240kB, UID:1003 pgtables:50980kB oom_score_adj:0
The available RAM is 7981 MB, of which 150-200 MB is required by system processes. The resolution of my images is (1536, 1536); the space required by the zarr files on disk is 15 MB per sample, compared to 2.6 MB per sample for the worm data. Also, when I was looking around for possible solutions, one way to tackle the exit code -9 problem could be allocating only 6000 MB of the total 7981 MB. This would prevent the overload and keep enough memory for system processes.
You don't by any chance have access to a system with more RAM? 8GB is not really a lot anymore :) That would be the easiest solution. Hm, it would be interesting to find out where exactly the out-of-memory error is happening, but the oom-killer is preventing that. Are you the only one using this system and do you have root access? Then you could try temporarily disabling the oom-killer; then you should get a "proper" Python out-of-memory exception, which is easier to debug.
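A possible workaround that does not need root (an addition for illustration, not something suggested in the thread): cap the process's own address space from inside Python, so that oversized allocations raise a regular MemoryError with a traceback instead of the oom-killer silently terminating the process.

import resource

# limit this process to ~6 GB of virtual address space (Linux only);
# allocations beyond this raise MemoryError instead of triggering the oom-killer
limit_bytes = 6 * 1024 ** 3
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))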
Alternatively, for our 3d datasets (e.g. nuclei3d) we process each image in blocks, partly for the same reason. However the code is written with 3d data in mind, as we never had memory issues with 2d data. So there you would have to make a large number of changes (main file is this one https://github.com/Kainmueller-Lab/PatchPerPix/blob/master/PatchPerPix/vote_instances/stitch_patch_graph.py)
@abred
How much more RAM would be appropriate? What are the specifications of the GPU that you used for the 2d data training?
I am not sure about getting another system, I will have to talk to my supervisor about this.
There is a possibility of getting a system with 12GB RAM, could that work?
No, I do not have root access to the system; there are other people using it as well. First I'll talk to my supervisor about this, and if nothing comes out of that as a solution, I will think about how to re-work the code.
Hello @abred,
Is it possible that reducing the training and testing input image shapes could be helpful, e.g.:
[model]
train_input_shape = [ 128, 128,]
test_input_shape = [ 128, 128,]
Is it possible to train the network with this size of input image, or are there changes that will be required in the network? Will the network accept this input size?
Also, I would like to know the specifications of the GPU you used for the training.
Sorry, I am not sure how much RAM is necessary, that depends on the data. (Note that the issue here is the system RAM, not the GPU RAM; we used an RTX 2080 Ti.)
During training the model is trained on random crops of size train_input_shape, the full image can be arbitrarily large. test_input_shape is only for prediction. The instance assembly, however, operates on the whole image (unless you use the block-wise processing)
What you can try is to downscale your images in general and then work on those; this might result in a loss of accuracy though. (You have to train on the downscaled images then, too.)
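As a rough illustration of the downscaling (a sketch only, assuming scikit-image is available and the raw image is a plain 2d numpy array; the actual consolidation scripts may handle the data differently):

import numpy as np
from skimage.transform import rescale

raw = np.random.rand(1536, 1536).astype(np.float32)  # placeholder for a real image

# downscale by a factor of 2 per dimension; preserve_range keeps the original
# intensity range; for label/ground-truth images use order=0 and
# anti_aliasing=False instead, so integer instance ids are not interpolated
raw_small = rescale(raw, 0.5, anti_aliasing=True, preserve_range=True)
print(raw_small.shape)  # (768, 768)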
Hello @abred, Thank you for your constant support.
As you suggested I downscaled the images and also implemented the method on a different system.
On the previous system I tried training the network with image size (512, 512), but it threw exit code -9 at some point during vote_instances. So I tried with image size (256, 256); in this case, after training, this error was raised during prediction:
AssertionError: reference RAW with ROI [0:512, 0:512] (512, 512) does not fit into provided upstream [-94:350, -94:350] (444, 444)
I changed model.test_input_shape = [ 512, 512,] to model.test_input_shape = [ 256, 256,]. Now the code is running and computing predictions. Is this change correct?
On the new system I started training for image size (512, 512). The training finished, but during prediction, after the first sample the code freezes at this:
INFO:__main__:forking <function predict_sample at 0x7fd4786492f0>
INFO:__main__:predicting 05_56!
INFO:gunpowder.tensorflow.local_server:Server already running at b'grpc://localhost:36537'
INFO:gunpowder.tensorflow.nodes.predict:Initializing tf session, connecting to b'grpc://localhost:36537'...
WARNING:tensorflow:From /home/babrm/Desktop/Parag_GreenAi/patchperpix/gunpowder/tensorflow/nodes/predict.py:182: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /home/babrm/Desktop/Parag_GreenAi/patchperpix/gunpowder/tensorflow/nodes/predict.py:182: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2021-01-29 08:41:05.789154: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-01-29 08:41:05.846785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:3b:00.0
2021-01-29 08:41:05.848374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:5e:00.0
2021-01-29 08:41:05.849948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 2 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:af:00.0
2021-01-29 08:41:05.850320: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-01-29 08:41:05.853370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-01-29 08:41:05.855013: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-01-29 08:41:05.855382: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-01-29 08:41:05.857249: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-01-29 08:41:05.858726: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-01-29 08:41:05.863367: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-01-29 08:41:05.868082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2
WARNING:tensorflow:From /home/babrm/Desktop/Parag_GreenAi/patchperpix/gunpowder/tensorflow/nodes/predict.py:236: The name tf.train.import_meta_graph is deprecated. Please use tf.compat.v1.train.import_meta_graph instead.
INFO:gunpowder.tensorflow.nodes.predict:Reading graph from /home/babrm/Desktop/Parag_GreenAi/patchperpix/results/wormbodies_setup08_210128_181802/test/test_net.meta and weights from /home/babrm/Desktop/Parag_GreenAi/patchperpix/results/wormbodies_setup08_210128_181802/train/train_net_checkpoint_400000...
WARNING:tensorflow:From /home/babrm/Desktop/Parag_GreenAi/patchperpix/gunpowder/tensorflow/nodes/predict.py:236: The name tf.train.import_meta_graph is deprecated. Please use tf.compat.v1.train.import_meta_graph instead.
INFO:tensorflow:Restoring parameters from /home/babrm/Desktop/Parag_GreenAi/patchperpix/results/wormbodies_setup08_210128_181802/train/train_net_checkpoint_400000
INFO:tensorflow:Restoring parameters from /home/babrm/Desktop/Parag_GreenAi/patchperpix/results/wormbodies_setup08_210128_181802/train/train_net_checkpoint_400000
There is no progress after that. I tried running it multiple times.
It gets to forking, so that seems ok. Could you please check that you are on the correct gunpowder commit (a53bb55edc2d28a1acb310971571984cf0771cab)? The line ...Initializing tf session... shouldn't be there I think. (sorry, clicked the wrong button :) )
@abred Thank you for the reply.
In the first experiment, where I am training for image size (256, 256), I trained the network for 700k iterations. While cross-validating at checkpoint 700k, there seems to be this problem:
['01_23', '02_56', '10_1134', '05_74', '10_1124', '07_45', '01_11', '03_461', '08_469', '05_60', '02_3', '05_39', '02_17', '03_492', '10_1138', '10_1090', '04_1013', '10_1100', '03_437', '02_6', '06_51', '04_946', '09_747', '05_85', '07_92', '01_84', '10_1060', '09_753', '08_412', '08_421', '03_458', '07_58', '06_24', '04_979', '03_507', '02_14', '10_1107', '03_452', '03_531', '06_40', '01_25', '10_1135', '02_86', '01_73', '09_748', '03_475', '05_62', '08_491', '04_1019', '03_455', '06_3', '02_94', '09_726', '02_36', '03_477', '02_22', '06_76', '05_33', '03_528', '03_466', '02_90', '06_17', '03_502', '01_42', '10_1069', '03_471', '08_497', '09_768', '05_11', '08_407', '07_81', '01_74', '08_484', '01_29', '06_19', '03_467', '04_967', '07_51', '04_1031', '09_777', '08_423', '05_79', '06_68', '10_1067', '01_62', '07_42', '02_85', '07_29', '02_100', '07_85', '04_1018', '02_82', '06_4', '04_955', '02_24', '03_499', '07_13', '02_97', '01_14', '09_728', '04_1001', '03_509', '06_21', '07_63', '05_50', '04_1007', '04_1012', '04_1004', '01_82', '06_46', '10_1147', '02_50', '07_64', '04_940', '07_23', '08_404', '08_418', '04_958', '02_98', '04_1037', '02_48', '04_1033', '03_470', '04_999', '01_43', '09_735', '01_46', '05_87', '06_36', '10_1140', '05_56', '07_77', '03_515', '01_49', '01_59', '06_33', '03_446', '07_36', '06_29', '03_485', '04_1030', '06_64', '01_86', '08_415', '06_90', '01_68', '01_39', '09_756', '04_948', '01_28', '02_75', '09_779', '10_1114', '03_496', '03_505', '03_474', '09_775', '02_20', '07_33', '06_58', '10_1142', '01_63', '01_81', '05_25', '10_1076', '02_29', '04_938', '10_1080', '08_451', '05_7', '04_1005', '04_951', '04_1026', '03_519', '09_793', '06_12', '02_47', '10_1102', '09_785', '08_461', '01_6', '01_88', '08_496', '04_1028', '10_1104', '10_1133', '08_459', '07_41', '04_1032', '07_12', '10_1071', '07_54', '01_15', '02_15', '09_732', '02_2', '04_1006', '07_68', '07_18', '10_1129']
['01_23', '02_56', '10_1134', '05_74', '10_1124', '07_45', '01_11', '03_461', '08_469', '05_60', '02_3', '05_39', '02_17', '03_492', '10_1138', '10_1090', '04_1013', '10_1100', '03_437', '02_6', '06_51', '04_946', '09_747', '05_85', '07_92', '01_84', '10_1060', '09_753', '08_412', '08_421', '03_458', '07_58', '06_24', '04_979', '03_507', '02_14', '10_1107', '03_452', '03_531', '06_40', '01_25', '10_1135', '02_86', '01_73', '09_748', '03_475', '05_62', '08_491', '04_1019', '03_455', '06_3', '02_94', '09_726', '02_36', '03_477', '02_22', '06_76', '05_33', '03_528', '03_466', '02_90', '06_17', '03_502', '01_42', '10_1069', '03_471', '08_497', '09_768', '05_11', '08_407', '07_81', '01_74', '08_484', '01_29', '06_19', '03_467', '04_967', '07_51', '04_1031', '09_777', '08_423', '05_79', '06_68', '10_1067', '01_62', '07_42', '02_85', '07_29', '02_100', '07_85', '04_1018', '02_82', '06_4', '04_955', '02_24', '03_499', '07_13', '02_97', '01_14', '09_728', '04_1001', '03_509', '06_21', '07_63', '05_50', '04_1007', '04_1012', '04_1004', '01_82', '06_46', '10_1147', '02_50', '07_64', '04_940', '07_23', '08_404', '08_418', '04_958', '02_98', '04_1037', '02_48', '04_1033', '03_470', '04_999', '01_43', '09_735', '01_46', '05_87', '06_36', '10_1140', '05_56', '07_77', '03_515', '01_49', '01_59', '06_33', '03_446', '07_36', '06_29', '03_485', '04_1030', '06_64', '01_86', '08_415', '06_90', '01_68', '01_39', '09_756', '04_948', '01_28', '02_75', '09_779', '10_1114', '03_496', '03_505', '03_474', '09_775', '02_20', '07_33', '06_58', '10_1142', '01_63', '01_81', '05_25', '10_1076', '02_29', '04_938', '10_1080', '08_451', '05_7', '04_1005', '04_951', '04_1026', '03_519', '09_793', '06_12', '02_47', '10_1102', '09_785', '08_461', '01_6', '01_88', '08_496', '04_1028', '10_1104', '10_1133', '08_459', '07_41', '04_1032', '07_12', '10_1071', '07_54', '01_15', '02_15', '09_732', '02_2', '04_1006', '07_68', '07_18', '10_1129']
{'01_6': 0.1411111111111111, '01_11': 0.14722222222222223, '01_14': 0.4333333333333333, '01_15': 0.19505494505494506, '01_25': 0.32166666666666666, '01_28': 0.225, '01_29': 0.19916666666666666, '01_39': 0.18, '01_42': 0.4916666666666666, '01_43': 0.175, '01_46': 0.045454545454545456, '01_49': 0.09395604395604396, '01_59': 0.03428571428571429, '01_62': 0.07846577227382182, '01_63': 0.035526315789473684, '01_68': 0.13015873015873014, '01_73': 0.0, '01_74': 0.2, '01_81': 0.23500000000000001, '01_82': 0.21444444444444444, '01_84': 0.30333333333333334, '01_86': 0.08333333333333333, '01_88': 0.25, '02_2': 0.6678571428571429, '02_3': 0.25555555555555554, '02_6': 0.292063492063492, '02_14': 0.4666666666666667, '02_15': 0.8, '02_17': 0.07291666666666666, '02_20': 0.2342929292929293, '02_22': 0.10500000000000001, '02_24': 0.04285714285714286, '02_29': 0.0, '02_36': 0.15535714285714283, '02_47': 0.06666666666666667, '02_48': 0.0, '02_50': 0.0, '02_56': 0.42777777777777776, '02_75': 0.2816666666666666, '02_82': 0.13666666666666666, '02_85': 0.053968253968253964, '02_86': 0.01, '02_90': 0.1, '02_94': 0.05934065934065934, '02_97': 0.04017857142857143, '02_98': 0.10287581699346404, '02_100': 0.05934065934065934, '03_437': 0.2760239760239761, '03_446': 0.30436507936507934, '03_452': 0.225, '03_455': 0.18727272727272726, '03_458': 0.10275197798417612, '03_461': 0.22432288299935355, '03_466': 0.1718031968031968, '03_467': 0.07857142857142856, '03_470': 0.14833333333333332, '03_471': 0.13727272727272727, '03_474': 0.23166666666666663, '03_475': 0.12857142857142856, '03_477': 0.05, '03_485': 0.48323232323232324, '03_492': 0.2875, '03_496': 0.30095238095238097, '03_499': 0.5900000000000001, '03_502': 0.24441558441558442, '03_505': 0.2643589743589744, '03_507': 0.36317460317460315, '03_509': 0.36821289821289827, '03_515': 0.4515151515151515, '03_519': 0.37523809523809526, '03_528': 0.3254700854700855, '03_531': 0.29103785103785107, '04_938': 0.17572074983839692, '04_940': 0.24043839330604033, '04_946': 0.17589807852965747, '04_948': 0.1353973168214654, '04_951': 0.21678904428904427, '04_955': 0.17207417582417583, '04_958': 0.15402391725921138, '04_967': 0.3315584415584416, '04_979': 0.15256077256077255, '04_999': 0.12333333333333334, '04_1001': 0.06153846153846154, '04_1004': 0.10060606060606062, '04_1005': 0.13293650793650794, '04_1006': 0.01818181818181818, '04_1007': 0.007142857142857143, '04_1012': 0.14821428571428572, '04_1013': 0.14373065015479874, '04_1018': 0.1808467246547742, '04_1019': 0.14805860805860807, '04_1026': 0.2618648018648019, '04_1028': 0.10108359133126936, '04_1030': 0.14163170163170163, '04_1031': 0.0771978021978022, '04_1032': 0.19213286713286715, '04_1033': 0.07922077922077922, '04_1037': 0.00909090909090909, '05_7': 0.0890909090909091, '05_11': 0.12896825396825398, '05_25': 0.19947552447552447, '05_33': 0.059583333333333335, '05_39': 0.038699690402476776, '05_50': 0.06666666666666667, '05_56': 0.11787878787878787, '05_60': 0.08205128205128205, '05_62': 0.19047619047619047, '05_74': 0.14421703296703298, '05_79': 0.016666666666666666, '05_85': 0.048571428571428564, '05_87': 0.04285714285714286, '06_3': 0.1026487788097695, '06_4': 0.027485380116959064, '06_12': 0.3196969696969697, '06_17': 0.04162581699346406, '06_19': 0.025313283208020048, '06_21': 0.09579248366013073, '06_24': 0.005555555555555555, '06_29': 0.005263157894736842, '06_33': 0.12879120879120878, '06_36': 0.06125541125541125, '06_40': 0.2224242424242424, '06_46': 0.1083916083916084, '06_51': 0.14636363636363633, '06_58': 
0.12777777777777777, '06_64': 0.17266806722689076, '06_68': 0.12186274509803922, '06_76': 0.16291531997414352, '06_90': 0.19101190476190474, '07_12': 0.34979864820422096, '07_13': 0.22563193226808784, '07_18': 0.17745098039215687, '07_23': 0.16953296703296705, '07_29': 0.29525474525474527, '07_33': 0.20757575757575758, '07_36': 0.225, '07_41': 0.13833333333333334, '07_42': 0.20171717171717174, '07_45': 0.08007395940832474, '07_51': 0.13761446886446888, '07_54': 0.1171671826625387, '07_58': 0.05263157894736842, '07_63': 0.12818181818181817, '07_64': 0.16638888888888886, '07_68': 0.1527777777777778, '07_77': 0.1509090909090909, '07_81': 0.16940359477124184, '07_85': 0.21112637362637363, '07_92': 0.013333333333333332, '08_404': 0.02222222222222222, '08_407': 0.1223661921602425, '08_412': 0.16533496732026146, '08_415': 0.07263157894736842, '08_418': 0.12030075187969924, '08_421': 0.22626696832579185, '08_423': 0.2731203007518797, '08_451': 0.0058823529411764705, '08_459': 0.00625, '08_461': 0.0963690476190476, '08_469': 0.30953907203907205, '08_484': 0.09772727272727273, '08_491': 0.0, '08_496': 0.29393939393939394, '08_497': 0.31437229437229436, '09_726': 0.1880250257997936, '09_728': 0.03333333333333333, '09_732': 0.08719298245614035, '09_735': 0.10487637362637363, '09_747': 0.21323412698412697, '09_748': 0.17013243894048846, '09_753': 0.05, '09_756': 0.014285714285714285, '09_768': 0.2925141525141525, '09_775': 0.06153846153846154, '09_777': 0.05, '09_779': 0.36750000000000005, '09_785': 0.3990909090909091, '09_793': 0.22764877880976955, '10_1060': 0.114025974025974, '10_1067': 0.2155050505050505, '10_1069': 0.14615384615384613, '10_1071': 0.19777777777777777, '10_1076': 0.17555555555555555, '10_1080': 0.4066666666666666, '10_1090': 0.12758857929136563, '10_1100': 0.1764867485455721, '10_1102': 0.2325124875124875, '10_1104': 0.18916157372039727, '10_1107': 0.38298701298701304, '10_1114': 0.258409825468649, '10_1124': 0.26328197945845006, '10_1129': 0.34016806722689075, '10_1133': 0.2568650793650794, '10_1134': 0.3880519480519481, '10_1135': 0.2892307692307693, '10_1138': 0.37096153846153845, '10_1140': 0.18700387331966278, '10_1142': 0.1847100725361595, '10_1147': 0.19282894736842104} ['01_6', '01_11', '01_14', '01_15', '01_23', '01_25', '01_28', '01_29', '01_39', '01_42', '01_43', '01_46', '01_49', '01_59', '01_62', '01_63', '01_68', '01_73', '01_74', '01_81', '01_82', '01_84', '01_86', '01_88', '02_2', '02_3', '02_6', '02_14', '02_15', '02_17', '02_20', '02_22', '02_24', '02_29', '02_36', '02_47', '02_48', '02_50', '02_56', '02_75', '02_82', '02_85', '02_86', '02_90', '02_94', '02_97', '02_98', '02_100', '03_437', '03_446', '03_452', '03_455', '03_458', '03_461', '03_466', '03_467', '03_470', '03_471', '03_474', '03_475', '03_477', '03_485', '03_492', '03_496', '03_499', '03_502', '03_505', '03_507', '03_509', '03_515', '03_519', '03_528', '03_531', '04_938', '04_940', '04_946', '04_948', '04_951', '04_955', '04_958', '04_967', '04_979', '04_999', '04_1001', '04_1004', '04_1005', '04_1006', '04_1007', '04_1012', '04_1013', '04_1018', '04_1019', '04_1026', '04_1028', '04_1030', '04_1031', '04_1032', '04_1033', '04_1037', '05_7', '05_11', '05_25', '05_33', '05_39', '05_50', '05_56', '05_60', '05_62', '05_74', '05_79', '05_85', '05_87', '06_3', '06_4', '06_12', '06_17', '06_19', '06_21', '06_24', '06_29', '06_33', '06_36', '06_40', '06_46', '06_51', '06_58', '06_64', '06_68', '06_76', '06_90', '07_12', '07_13', '07_18', '07_23', '07_29', '07_33', '07_36', '07_41', '07_42', '07_45', '07_51', 
'07_54', '07_58', '07_63', '07_64', '07_68', '07_77', '07_81', '07_85', '07_92', '08_404', '08_407', '08_412', '08_415', '08_418', '08_421', '08_423', '08_451', '08_459', '08_461', '08_469', '08_484', '08_491', '08_496', '08_497', '09_726', '09_728', '09_732', '09_735', '09_747', '09_748', '09_753', '09_756', '09_768', '09_775', '09_777', '09_779', '09_785', '09_793', '10_1060', '10_1067', '10_1069', '10_1071', '10_1076', '10_1080', '10_1090', '10_1100', '10_1102', '10_1104', '10_1107', '10_1114', '10_1124', '10_1129', '10_1133', '10_1134', '10_1135', '10_1138', '10_1140', '10_1142', '10_1147']
199 200
Traceback (most recent call last):
File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1623, in <module>
main()
File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1619, in main
cross_validate(args, config, config['data']['val_data'], train_folder, val_folder)
File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
ret = func(*args, **kwargs)
File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1277, in cross_validate
assert len(v[0]) == len(samples)
AssertionError
The length of v[0] is 199 and the length of samples is 200. In v[0] we have a tuple of the results.items. I tried to figure out what the problem is but failed to do so. Do you know what could be the reason?
These assertions are sanity checks that everything went ok beforehand. Assuming you do have 200 samples, this indicates that some sample failed along the way.
You could try inspecting the difference between v[0].keys() and samples. When you find the sample that is not in v[0], the easiest option is to delete all files related to that sample in evaluated/instanced/processed and recompute them. That should fix it, hopefully.
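For example, a quick way to find the offending sample right before the assertion (a minimal sketch, using the names from the traceback above):

# samples: list of all 200 sample names
# v[0]:    per-sample results, with only 199 entries
missing = set(samples) - set(v[0].keys())
print("missing from results:", missing)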
@abred I have trained the model with image size (256, 256) for 700k iterations.
Config file:
base = "/home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543"
[general]
logging = 20
debug = false
overwrite = false
[data]
train_data = "/home/student2/Desktop/Parag_masterthesis/traintest_data/train"
val_data = "/home/student2/Desktop/Parag_masterthesis/traintest_data/test"
test_data = "/home/student2/Desktop/Parag_masterthesis/traintest_data/test"
voxel_size = [ 1, 1,]
input_format = "zarr"
raw_key = "volumes/raw_bf"
gt_key = "volumes/gt_instances"
one_instance_per_channel_gt = "volumes/gt_labels"
num_channels = 1
validate_on_train = false
[model]
train_net_name = "train_net"
test_net_name = "test_net"
train_input_shape = [ 256, 256,]
test_input_shape = [ 256, 256,]
patchshape = [ 1, 41, 41,]
patchstride = [ 1, 1, 1,]
num_fmaps = 20
max_num_inst = 2
fmap_inc_factors = [ 2, 2, 2, 2,]
fmap_dec_factors = [ 1, 1, 1, 1,]
downsample_factors = [ [ 2, 2,], [ 2, 2,], [ 2, 2,], [ 2, 2,],]
activation = "relu"
padding = "valid"
kernel_size = 3
num_repetitions = 2
upsampling = "resize_conv"
overlapping_inst = false
code_units = 252
autoencoder_chkpt = "this"
[optimizer]
optimizer = "Adam"
lr = 5e-5
[preprocessing]
clipmax = 1500
[training]
batch_size = 1
num_gpus = 1
num_workers = 10
cache_size = 40
max_iterations = 700100
checkpoints = 50000
snapshots = 2000
profiling = 500
train_code = true
[prediction]
output_format = "zarr"
aff_key = "volumes/pred_affs"
code_key = "volumes/pred_code"
fg_key = "volumes/pred_numinst"
fg_thresh = 0.5
decode_batch_size = 1024
[validation]
params = [ "patch_threshold", "fc_threshold",]
patch_threshold = [ 0.5, 0.6, 0.7,]
fc_threshold = [ 0.5, 0.6, 0.7,]
[cross_validate]
checkpoints = [ 500000, 550000, 600000, 650000, 700000,]
patch_threshold = [ 0.5, 0.6, 0.7,]
fc_threshold = [ 0.5, 0.6, 0.7,]
[testing]
num_workers = 5
[vote_instances]
patch_threshold = 0.9
fc_threshold = 0.5
cuda = true
blockwise = false
num_workers = 8
chunksize = [ 92, 92, 92,]
select_patches_for_sparse_data = true
save_no_intermediates = true
output_format = "hdf"
parallel = false
includeSinglePatchCCS = false
sample = 1.0
removeIntersection = true
mws = true
isbiHack = false
mask_fg_border = false
graphToInst = false
skipLookup = false
skipConsensus = false
skipRanking = false
skipThinCover = false
affinity_graph_voting = false
affinity_graph_voting_selected = false
termAfterThinCover = false
fg_thresh_vi = -0.1
consensus_interleaved_cnt = false
consensus_norm_prob_product = true
consensus_prob_product = true
consensus_norm_aff = true
vi_bg_use_inv_th = false
vi_bg_use_half_th = true
vi_bg_use_less_than_th = false
rank_norm_patch_score = true
rank_int_counter = false
patch_graph_norm_aff = true
blockwise_old_stitch_fn = false
only_bb = false
flip_cons_arr_axes = false
return_intermediates = false
[evaluation]
num_workers = 1
res_key = "vote_instances"
metric = "confusion_matrix.avAP"
print_f_factor_perc_gt_0_8 = false
use_linear_sum_assignment = false
foreground_only = false
[postprocessing]
remove_small_comps = 600
[visualize]
samples_to_visualize = [ "01_23", "02_56",]
show_patches = true
[autoencoder]
overlapping_inst = true
code_method = "conv1x1_b"
train_net_name = "train_net"
test_net_name = "test_net"
train_input_shape = [ 1, 41, 41,]
test_input_shape = [ 1, 41, 41,]
patchshape = [ 1, 41, 41,]
patchstride = [ 1, 1, 1,]
network_type = "conv"
activation = "relu"
code_activation = "sigmoid"
encoder_units = [ 500, 1000,]
decoder_units = [ 1000, 500,]
num_fmaps = [ 32, 48, 64,]
downsample_factors = [ [ 2, 2,], [ 2, 2,], [ 2, 2,],]
upsampling = "resize_conv"
kernel_size = 3
num_repetitions = 2
padding = "same"
code_units = 252
regularizer = "l2"
regularizer_weight = 0.0001
loss_fn = "mse"
[training.sampling]
min_masked = 0.002
min_masked_overlap = 0.002
overlap_min_dist = 0
overlap_max_dist = 15
probability_overlap = 0.5
probability_fg = 0.5
[postprocessing.watershed]
output_format = "hdf"
[training.augmentation.elastic]
control_point_spacing = [ 40, 40,]
jitter_sigma = [ 2, 2,]
rotation_min = 0
rotation_max = 90
subsample = 2
[training.augmentation.intensity]
scale = [ 0.9, 1.1,]
shift = [ -0.1, 0.1,]
[training.augmentation.simple]
The results are :
INFO:__main__:confusion_matrix.avAP CROSS: 0.2227 [0.2308 ((650000, 0.6, 0.6)), 0.2146 ((650000, 0.6, 0.6))]
confusion_matrix.avAP CROSS: 0.2227 [0.2308 ((650000, 0.6, 0.6)), 0.2146 ((650000, 0.6, 0.6))]
confusion_matrix.avAP: 0.2227
confusion_matrix.th_0_5.AP: 0.4341
confusion_matrix.th_0_6.AP: 0.3346
confusion_matrix.th_0_7.AP: 0.2460
confusion_matrix.th_0_8.AP: 0.1571
confusion_matrix.th_0_9.AP: 0.0576
INFO:__main__:time cross_validate: 4:17:56.810282
What do you think about the results? Do you think optimizing would get the results comparatively higher, or is the image size playing a big role in the results being this low?
The numbers are indeed quite low, but it is hard to analyze from afar. You can look at the results qualitatively. Does the fg-bg segmentation look ok? In visualize there are scripts to look at the patches, how do they look? Have you had a look at the downsized images, could you still segment them properly by hand?
An unrelated comment, I would recommend splitting the data into train/val/test and then use validate/validate_checkpoints instead of cross_validate. We just did that for the wormbodies data because of prior work.
I ran --do visualize, where in the config.toml I give these samples:
[visualize]
samples_to_visualize = [ "01_23", "02_56",]
show_patches = true
But I get no visualization in the output:
/home/student2/anaconda3/envs/Parag_GreenAI/bin/python /home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py --setup setup08 --config /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543/config.toml --do visualize --app wormbodies -id /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543
INFO:__main__:attention: using config file ['/home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543/config.toml']
INFO:__main__:CUDA_VISIBILE_DEVICES already set, device 0
INFO:__main__:used config: {'base': '/home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543', 'general': {'logging': 20, 'debug': False, 'overwrite': False}, 'data': {'train_data': '/home/student2/Desktop/Parag_masterthesis/traintest_data/train', 'val_data': '/home/student2/Desktop/Parag_masterthesis/traintest_data/test', 'test_data': '/home/student2/Desktop/Parag_masterthesis/traintest_data/test', 'voxel_size': [1, 1], 'input_format': 'zarr', 'raw_key': 'volumes/raw_bf', 'gt_key': 'volumes/gt_instances', 'one_instance_per_channel_gt': 'volumes/gt_labels', 'num_channels': 1, 'validate_on_train': False}, 'model': {'train_net_name': 'train_net', 'test_net_name': 'test_net', 'train_input_shape': [256, 256], 'test_input_shape': [256, 256], 'patchshape': [1, 41, 41], 'patchstride': [1, 1, 1], 'num_fmaps': 20, 'max_num_inst': 2, 'fmap_inc_factors': [2, 2, 2, 2], 'fmap_dec_factors': [1, 1, 1, 1], 'downsample_factors': [[2, 2], [2, 2], [2, 2], [2, 2]], 'activation': 'relu', 'padding': 'valid', 'kernel_size': 3, 'num_repetitions': 2, 'upsampling': 'resize_conv', 'overlapping_inst': False, 'code_units': 252, 'autoencoder_chkpt': 'this'}, 'optimizer': {'optimizer': 'Adam', 'lr': 5e-05}, 'preprocessing': {'clipmax': 1500}, 'training': {'batch_size': 1, 'num_gpus': 1, 'num_workers': 10, 'cache_size': 40, 'max_iterations': 700100, 'checkpoints': 50000, 'snapshots': 2000, 'profiling': 500, 'train_code': True, 'sampling': {'min_masked': 0.002, 'min_masked_overlap': 0.002, 'overlap_min_dist': 0, 'overlap_max_dist': 15, 'probability_overlap': 0.5, 'probability_fg': 0.5}, 'augmentation': {'elastic': {'control_point_spacing': [40, 40], 'jitter_sigma': [2, 2], 'rotation_min': 0, 'rotation_max': 90, 'subsample': 2}, 'intensity': {'scale': [0.9, 1.1], 'shift': [-0.1, 0.1]}, 'simple': {}}}, 'prediction': {'output_format': 'zarr', 'aff_key': 'volumes/pred_affs', 'code_key': 'volumes/pred_code', 'fg_key': 'volumes/pred_numinst', 'fg_thresh': 0.5, 'decode_batch_size': 1024}, 'validation': {'params': ['patch_threshold', 'fc_threshold'], 'patch_threshold': [0.5, 0.6, 0.7], 'fc_threshold': [0.5, 0.6, 0.7]}, 'cross_validate': {'checkpoints': [500000, 550000, 600000, 650000, 700000], 'patch_threshold': [0.5, 0.6, 0.7], 'fc_threshold': [0.5, 0.6, 0.7]}, 'testing': {'num_workers': 5}, 'vote_instances': {'patch_threshold': 0.9, 'fc_threshold': 0.5, 'cuda': True, 'blockwise': False, 'num_workers': 8, 'chunksize': [92, 92, 92], 'select_patches_for_sparse_data': True, 'save_no_intermediates': True, 'output_format': 'hdf', 'parallel': False, 'includeSinglePatchCCS': False, 'sample': 1.0, 'removeIntersection': True, 'mws': True, 'isbiHack': False, 'mask_fg_border': False, 'graphToInst': False, 'skipLookup': False, 'skipConsensus': False, 'skipRanking': False, 'skipThinCover': False, 'affinity_graph_voting': False, 'affinity_graph_voting_selected': False, 'termAfterThinCover': False, 'fg_thresh_vi': -0.1, 'consensus_interleaved_cnt': False, 'consensus_norm_prob_product': True, 'consensus_prob_product': True, 'consensus_norm_aff': True, 'vi_bg_use_inv_th': False, 'vi_bg_use_half_th': True, 'vi_bg_use_less_than_th': False, 'rank_norm_patch_score': True, 'rank_int_counter': False, 'patch_graph_norm_aff': True, 'blockwise_old_stitch_fn': False, 'only_bb': False, 'flip_cons_arr_axes': False, 'return_intermediates': False}, 'evaluation': {'num_workers': 1, 'res_key': 'vote_instances', 'metric': 'confusion_matrix.avAP', 'print_f_factor_perc_gt_0_8': False, 'use_linear_sum_assignment': False, 
'foreground_only': False}, 'postprocessing': {'remove_small_comps': 600, 'watershed': {'output_format': 'hdf'}}, 'visualize': {'samples_to_visualize': ['01_23', '02_56'], 'show_patches': True}, 'autoencoder': {'overlapping_inst': True, 'code_method': 'conv1x1_b', 'train_net_name': 'train_net', 'test_net_name': 'test_net', 'train_input_shape': [1, 41, 41], 'test_input_shape': [1, 41, 41], 'patchshape': [1, 41, 41], 'patchstride': [1, 1, 1], 'network_type': 'conv', 'activation': 'relu', 'code_activation': 'sigmoid', 'encoder_units': [500, 1000], 'decoder_units': [1000, 500], 'num_fmaps': [32, 48, 64], 'downsample_factors': [[2, 2], [2, 2], [2, 2]], 'upsampling': 'resize_conv', 'kernel_size': 3, 'num_repetitions': 2, 'padding': 'same', 'code_units': 252, 'regularizer': 'l2', 'regularizer_weight': 0.0001, 'loss_fn': 'mse'}}
INFO:__main__:reading data from /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543/val/processed/700000
['01_23', '02_56', '10_1134', '05_74', '10_1124', '07_45', '01_11', '03_461', '08_469', '05_60', '02_3', '05_39', '02_17', '03_492', '10_1138', '10_1090', '04_1013', '10_1100', '03_437', '02_6', '06_51', '04_946', '09_747', '05_85', '07_92', '01_84', '10_1060', '09_753', '08_412', '08_421', '03_458', '07_58', '06_24', '04_979', '03_507', '02_14', '10_1107', '03_452', '03_531', '06_40', '01_25', '10_1135', '02_86', '01_73', '09_748', '03_475', '05_62', '08_491', '04_1019', '03_455', '06_3', '02_94', '09_726', '02_36', '03_477', '02_22', '06_76', '05_33', '03_528', '03_466', '02_90', '06_17', '03_502', '01_42', '10_1069', '03_471', '08_497', '09_768', '05_11', '08_407', '07_81', '01_74', '08_484', '01_29', '06_19', '03_467', '04_967', '07_51', '04_1031', '09_777', '08_423', '05_79', '06_68', '10_1067', '01_62', '07_42', '02_85', '07_29', '02_100', '07_85', '04_1018', '02_82', '06_4', '04_955', '02_24', '03_499', '07_13', '02_97', '01_14', '09_728', '04_1001', '03_509', '06_21', '07_63', '05_50', '04_1007', '04_1012', '04_1004', '01_82', '06_46', '10_1147', '02_50', '07_64', '04_940', '07_23', '08_404', '08_418', '04_958', '02_98', '04_1037', '02_48', '04_1033', '03_470', '04_999', '01_43', '09_735', '01_46', '05_87', '06_36', '10_1140', '05_56', '07_77', '03_515', '01_49', '01_59', '06_33', '03_446', '07_36', '06_29', '03_485', '04_1030', '06_64', '01_86', '08_415', '06_90', '01_68', '01_39', '09_756', '04_948', '01_28', '02_75', '09_779', '10_1114', '03_496', '03_505', '03_474', '09_775', '02_20', '07_33', '06_58', '10_1142', '01_63', '01_81', '05_25', '10_1076', '02_29', '04_938', '10_1080', '08_451', '05_7', '04_1005', '04_951', '04_1026', '03_519', '09_793', '06_12', '02_47', '10_1102', '09_785', '08_461', '01_6', '01_88', '08_496', '04_1028', '10_1104', '10_1133', '08_459', '07_41', '04_1032', '07_12', '10_1071', '07_54', '01_15', '02_15', '09_732', '02_2', '04_1006', '07_68', '07_18', '10_1129']
['01_23', '02_56', '10_1134', '05_74', '10_1124', '07_45', '01_11', '03_461', '08_469', '05_60', '02_3', '05_39', '02_17', '03_492', '10_1138', '10_1090', '04_1013', '10_1100', '03_437', '02_6', '06_51', '04_946', '09_747', '05_85', '07_92', '01_84', '10_1060', '09_753', '08_412', '08_421', '03_458', '07_58', '06_24', '04_979', '03_507', '02_14', '10_1107', '03_452', '03_531', '06_40', '01_25', '10_1135', '02_86', '01_73', '09_748', '03_475', '05_62', '08_491', '04_1019', '03_455', '06_3', '02_94', '09_726', '02_36', '03_477', '02_22', '06_76', '05_33', '03_528', '03_466', '02_90', '06_17', '03_502', '01_42', '10_1069', '03_471', '08_497', '09_768', '05_11', '08_407', '07_81', '01_74', '08_484', '01_29', '06_19', '03_467', '04_967', '07_51', '04_1031', '09_777', '08_423', '05_79', '06_68', '10_1067', '01_62', '07_42', '02_85', '07_29', '02_100', '07_85', '04_1018', '02_82', '06_4', '04_955', '02_24', '03_499', '07_13', '02_97', '01_14', '09_728', '04_1001', '03_509', '06_21', '07_63', '05_50', '04_1007', '04_1012', '04_1004', '01_82', '06_46', '10_1147', '02_50', '07_64', '04_940', '07_23', '08_404', '08_418', '04_958', '02_98', '04_1037', '02_48', '04_1033', '03_470', '04_999', '01_43', '09_735', '01_46', '05_87', '06_36', '10_1140', '05_56', '07_77', '03_515', '01_49', '01_59', '06_33', '03_446', '07_36', '06_29', '03_485', '04_1030', '06_64', '01_86', '08_415', '06_90', '01_68', '01_39', '09_756', '04_948', '01_28', '02_75', '09_779', '10_1114', '03_496', '03_505', '03_474', '09_775', '02_20', '07_33', '06_58', '10_1142', '01_63', '01_81', '05_25', '10_1076', '02_29', '04_938', '10_1080', '08_451', '05_7', '04_1005', '04_951', '04_1026', '03_519', '09_793', '06_12', '02_47', '10_1102', '09_785', '08_461', '01_6', '01_88', '08_496', '04_1028', '10_1104', '10_1133', '08_459', '07_41', '04_1032', '07_12', '10_1071', '07_54', '01_15', '02_15', '09_732', '02_2', '04_1006', '07_68', '07_18', '10_1129']
Process finished with exit code 0
I am downscaling the images in consolidate.py itself, so I haven't seen the visualization explicitly for PatchPerPix, but I have done it in the other experiments that I performed for another approach. I will do it in this one as well.
About splitting the data in a train/val/test manner, my next step was going to be this as well. So yes, I will be running the experiments again with this combination next.
@abred
I just checked, an HDF file is getting created for the samples I am trying to visualize, at this location:
/home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543/val/processed/70000
Is it correct? Or am I missing out on something?
Yes, that sounds good, you can look at it with an HDF viewer or Fiji/ImageJ.
In addition I would also look at the foreground prediction (pred_fgbg in the zarr file). Fiji can also visualize zarr files (File->Import->N5).
@abred The visualization of the hdf file of the sample 01_23 looks like this: Also, do I have to view the HDF files with certain specific parameters in Fiji, or does this look fine?
Unfortunately, I am not able to locate the folder in which I can find the pred_fgbg zarr file. The three folders I could locate were pred_affs, pred_numinst, pred_code. Could you tell me where I can find the pred_fgbg files?
Sorry, pred_numinst should be the correct one, as in your config file this is set: fg_key = "volumes/pred_numinst".
The visualization of the hdf file has a very high resolution; if you haven't done it already, you have to zoom into some regions to have a closer look. In the end it is just a grayscale image with values between 0 and 1, so the default parameters should be fine (maybe adjust brightness/contrast a bit (Image->Adjust)).
For the pred_numinst zarr file: I cannot directly see the N5 option using (File->Import->N5). I had to use (Plugins->BigDataViewer->N5 Viewer) instead. This error prompt displayed:
I think that's a different plugin. You need this one: https://github.com/saalfeldlab/n5-ij . It should be available through the Fiji plugin system (Help->Update).
@abred I ran the experiment with image size 512 for 100k iterations. Also, I am using different sets for train, test and Val, performing validation/validate_checkpoints instead of cross_validate.
I have got good results just by training for 100k iterations as well:
INFO:__main__:confusion_matrix.avAP TEST checkpoint 100000: 0.2211 ({'patch_threshold': 0.9, 'fc_threshold': 0.5})
Is it possible to optimize the network for 100k iterations and then finally train the optimized network for the maximum number of iterations? Optimizing the network for 700k iterations, or even more for bigger datasets, is very time-consuming. Do you think this way of optimizing the network is possible with PatchPerPix?
Also, the parameters that I was interested in are code size and patch size. In the paper, the best results for the various datasets are shown for various code sizes and patch sizes. In my case, the patch size is 41*41 and the code size is 252. Can you tell me which parameters I should take into consideration when optimizing the network for my dataset?
@abred
Also, I want the AP values in COCO format. So, I have changed the formula for AP from
ap = 1.*(apTP) / max(1, apTP + apFN + apFP)
to
ap = 1. * (apTP) / max(1, apTP + apFP)
in evaluate-instance-segmentation/evaluate.py. Do I need to make any more changes in the code?
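For illustration only, the arithmetic difference between the two variants on hypothetical counts (not taken from the evaluation code); note that the modified formula corresponds to the precision of the detections:

apTP, apFP, apFN = 80, 10, 10  # hypothetical counts

ap_orig = 1. * apTP / max(1, apTP + apFN + apFP)  # 80 / 100 = 0.80
ap_new = 1. * apTP / max(1, apTP + apFP)          # 80 / 90 ~ 0.89 (precision)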
Hello @abred,
I am running an experiment for a downscaled image size of 512 as I mentioned above, but I am training it for more iterations (700k). I have a doubt regarding the time it is taking for training and evaluation: it has been two days and it is on the evaluation step, where it is evaluating every checkpoint (every 50k). So the overall computation time is really high right now; it is still computing for checkpoint 400k. Can I evaluate only specific checkpoints, or is there some other way to reduce the evaluation time?
@abred Also, I want the AP values in COCO format. So, I have changed the formula for AP from
ap = 1.*(apTP) / max(1, apTP + apFN + apFP)
to
ap = 1. * (apTP) / max(1, apTP + apFP)
in evaluate-instance-segmentation/evaluate.py. Do I need to make any more changes in the code?
That should be fine.
Hello @abred,
I am running an experiment for a downscaled image size of 512 as I mentioned above, but I am training it for more iterations (700k). I have a doubt regarding the time it is taking for training and evaluation: it has been two days and it is on the evaluation step, where it is evaluating every checkpoint (every 50k). So the overall computation time is really high right now; it is still computing for checkpoint 400k. Can I evaluate only specific checkpoints, or is there some other way to reduce the evaluation time?
You can just add a line like the one in [cross_validate] to [validation]:
checkpoints = [ 500000, 550000, 600000, 650000, 700000,]
otherwise it will check all available checkpoints.
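So, presumably, the [validation] section from the config above would then look like this (thresholds copied from the existing config; the checkpoints line is the addition):

[validation]
params = [ "patch_threshold", "fc_threshold",]
patch_threshold = [ 0.5, 0.6, 0.7,]
fc_threshold = [ 0.5, 0.6, 0.7,]
# only evaluate these checkpoints instead of all available ones
checkpoints = [ 500000, 550000, 600000, 650000, 700000,]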
@abred I ran the experiment with image size 512 for 100k iterations. Also, I am using different sets for train, test and Val, performing validation/validate_checkpoints instead of cross_validate.
I have got good results just by training for 100k iterations as well:
INFO:__main__:confusion_matrix.avAP TEST checkpoint 100000: 0.2211 ({'patch_threshold': 0.9, 'fc_threshold': 0.5})
Is it possible to optimize the network for 100k iterations and then finally train the optimized network for the maximum number of iterations? Optimizing the network for 700k iterations, or even more for bigger datasets, is very time-consuming. Do you think this way of optimizing the network is possible with PatchPerPix?
Also, the parameters that I was interested in are code size and patch size. In the paper, the best results for the various datasets are shown for various code sizes and patch sizes. In my case, the patch size is 41*41 and the code size is 252. Can you tell me which parameters I should take into consideration when optimizing the network for my dataset?
Might work, to get an idea for good parameters it might be enough. I would recommend to e.g. look at the loss in tensorboard and check that it is decreasing nicely (That's important generally). And maybe evaluate on a few training and a few validation images to check that it doesn't overfit too much. The network also writes snapshots during training from time to time, you can look at those to check that it's working sensibly (you probably have to decode them first, similarly to what you do with the validation/test data) The necessary patch size depends strongly on your data, larger sizes might be required for larger overlaps. If that doesn't occur in your data you can keep it smaller. The code size is somewhat relative to the patch size. Maybe you need a different learning rate for your data. Or a deeper or shallower U-Net.
Thank you for all the answers @abred
The training loss looks fine to me. To check whether the model overfits, is a val loss being computed? If we could log the val loss then we could easily figure out if the model overfits. Also, I am not sure what evaluating on a few training and validation images means.
This is how the pred_fgbg of a snapshot at 690k looks:
Here is a sample of the dataset, where there is occlusion/overlap in the top left and bottom right of the image. Would you call this a large overlap? The overlaps are similar throughout the data. The patch size currently is 41*41 and the code_size is 252. Is this appropriate for this kind of data?
Thank you for all the answers @abred
1. The training loss looks fine to me. To check whether the model overfits, is a val loss being computed? If we could log the val loss then we could easily figure out if the model overfits. Also, I am not sure what evaluating on a few training and validation images means.
A val loss is currently not supported by the training framework that is used (gunpowder). That's why I recommended, just to get a rough idea of the performance, to evaluate a few training images and a few validation images and compare them quantitatively and qualitatively.
3. Here is a sample of the dataset, where there is occlusion/overlap in the top left and bottom right of the image. Would you call this a large overlap? The overlaps are similar throughout the data. The patch size currently is 41*41 and the code_size is 252. Is this appropriate for this kind of data?
Looking at your data, I am not sure you can compare your numbers to ours, you only have very few larger instances. What does the final instance segmentation look like?
Regarding the sizes, if you have an instance that is completely split into two parts by an overlapping second instance, it is only possible to merge the two parts back into one instance in the final instance segmentation if the overlap at the narrowest location is smaller than patch_size/2 (so with the current patchshape of [1, 41, 41] the overlap would have to be narrower than about 20 pixels).
Okay, I understood the first part.
In the second part, by final instance segmentation do you mean the ground truth of the same image or the segmentation prediction by the network?
About the patch_size and code_size, what are the maximum values that these two could be set to?
And would analyzing my dataset, to find out the maximum size of the narrowest overlap where one instance is completely split into two, be the way to figure out the values of patch_size and code_size?
In the second part, by final instance segmentation do you mean the ground truth of the same image or the segmentation prediction by the network?
I mean the segmentation output. Why is the number this low, what kind of mistakes does it make, does it get the shape right but split instances, does it merge everything etc? General tip, look at the data a lot, the raw data, the ground truth data, the segmentation results etc, this usually gives you good indicators what works and what is still going wrong.
About the patch_size and code_size, what are the maximum values that these two could be set to? And would analyzing my dataset, to find out the maximum size of the narrowest overlap where one instance is completely split into two, be the way to figure out the values of patch_size and code_size?
Yes, that would be one way to figure out appropriate values (remember though, you may not look at the test data to determine this, otherwise your results may be biased)
I mean the segmentation output.
@abred
The segmentation output is in test/instanced/<checkpoint>/patch_threshold/fc_threshold/<sample>, right?
Also, I looked into my dataset; the overlaps are pretty inconsistent throughout. There are some samples with big overlaps. I think using a bigger patch_size would be better suited for my data. I wanted to know what the biggest patch_size is that can be used for the ppp+dec architecture.
For ed-ppp I guess a patch_size of 81*81 is used in one of the experiments, while for ppp+dec the highest patch_size I came across was 49*49. Is it possible to have a higher patch_size than this?
Also, to change the patch_size, the below parameters need to be changed, right? Are there any other parameters that need to be changed?
[model]
patchshape = [ 1, 41, 41,]
[autoencoder]
train_input_shape = [ 1, 41, 41,]
test_input_shape = [ 1, 41, 41,]
patchshape = [ 1, 41, 41,]
I got a good result with image size 512 for 700k iterations:
INFO:__main__:confusion_matrix.avAP TEST checkpoint 600000: 0.5871 ({'patch_threshold': 0.9, 'fc_threshold': 0.5})
Now, when I try patch_size = 49*49, this error is raised. I have changed the following parameters to do that:
[model]
patchshape = [ 1, 49, 49,]
[autoencoder]
train_input_shape = [ 1, 49, 49,]
test_input_shape = [ 1, 49, 49,]
patchshape = [ 1, 49, 49,]
Traceback (most recent call last):
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input to reshape is a tensor with 258048 values, but the requested shape requires a multiple of 343
[[{{node deflatten_out}}]]
(1) Invalid argument: Input to reshape is a tensor with 258048 values, but the requested shape requires a multiple of 343
[[{{node deflatten_out}}]]
[[add/_15]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
ret = func(*args, **kwargs)
File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 386, in train
**config.get('preprocessing', {}))
File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/train.py", line 281, in train_until
pipeline.request_batch(request)
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
batch = self.provide(copy.deepcopy(request))
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 45, in provide
return self.output.request_batch(request)
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
batch = self.provide(copy.deepcopy(request))
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_filter.py", line 128, in provide
batch = self.get_upstream_provider().request_batch(upstream_request)
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
batch = self.provide(copy.deepcopy(request))
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_filter.py", line 128, in provide
batch = self.get_upstream_provider().request_batch(upstream_request)
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
batch = self.provide(copy.deepcopy(request))
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_filter.py", line 134, in provide
self.process(batch, request)
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/generic_train.py", line 151, in process
self.train_step(batch, request)
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/tensorflow/nodes/train.py", line 278, in train_step
feed_dict=inputs, options=run_options)
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input to reshape is a tensor with 258048 values, but the requested shape requires a multiple of 343
[[node deflatten_out (defined at /anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
(1) Invalid argument: Input to reshape is a tensor with 258048 values, but the requested shape requires a multiple of 343
[[node deflatten_out (defined at /anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[add/_15]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'deflatten_out':
File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1638, in <module>
main()
File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1458, in main
train(args, config, train_folder)
File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 123, in wrapper
p.start()
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/popen_fork.py", line 74, in _launch
code = process_obj._bootstrap()
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
ret = func(*args, **kwargs)
File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 386, in train
**config.get('preprocessing', {}))
File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/train.py", line 276, in train_until
with gp.build(pipeline):
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/build.py", line 12, in __enter__
self.batch_provider.setup()
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 17, in setup
self.__rec_setup(self.output)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
self.__rec_setup(upstream_provider)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
self.__rec_setup(upstream_provider)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 71, in __rec_setup
provider.setup()
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/generic_train.py", line 114, in setup
self.start()
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/tensorflow/nodes/train.py", line 217, in start
checkpoint = self.__read_meta_graph()
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/tensorflow/nodes/train.py", line 333, in __read_meta_graph
clear_devices=True)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1453, in import_meta_graph
**kwargs)[0]
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1477, in _import_meta_graph_with_return_elements
**kwargs))
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/meta_graph.py", line 809, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 517, in _import_graph_def_internal
_ProcessNewOps(graph)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 243, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3561, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3561, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3451, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
Traceback (most recent call last):
File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1638, in <module>
main()
File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1458, in main
train(args, config, train_folder)
File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 127, in wrapper
raise RuntimeError("child process died")
RuntimeError: child process died
I got a good result with image size 512 for 700k iterations:
INFO:__main__:confusion_matrix.avAP TEST checkpoint 600000: 0.5871 ({'patch_threshold': 0.9, 'fc_threshold': 0.5})
Nice! Have you looked at one of the segmentation results? Want to post one here?
Now, when I try for patch_size = 49*49, this error is being raised. I have changed the following parameters to do that:
[model]
patchshape = [1, 49, 49]
[autoencoder]
train_input_shape = [1, 49, 49]
test_input_shape = [1, 49, 49]
patchshape = [1, 49, 49]
What are the things that I should take into consideration?
I think you might also have to change [autoencoder].code_units (I just noticed that there are two code_units in the config, will fix that). For a patch size of 41 the spatial bottleneck of the autoencoder is 6*6 (2d) with 7 feature maps -> 6*6*7 = 252. You have to check in your log what the resulting bottleneck size is for 49 and then adapt the value of code_units accordingly.
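As a rough reference, the following sketch reproduces that calculation. The three stride-2 downsamplings and the 7 bottleneck feature maps are assumptions that happen to match the numbers in this thread (41 -> 252, 49 -> 343); the bottleneck shape printed in the training log is the authoritative source.

```python
# Sketch only: estimate code_units from the patch size, assuming three
# stride-2 downsamplings ('same' padding, i.e. ceil division) and 7 feature
# maps at the bottleneck. Check the training log for the real values.
import math

def estimate_code_units(patch_size, num_downsamplings=3, bottleneck_fmaps=7):
    spatial = patch_size
    for _ in range(num_downsamplings):
        spatial = math.ceil(spatial / 2)
    return spatial * spatial * bottleneck_fmaps

print(estimate_code_units(41))  # 252 = 6*6*7
print(estimate_code_units(49))  # 343 = 7*7*7
```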
2. Also, I looked into my dataset; the overlaps are pretty inconsistent throughout, and some samples have big overlaps. I think using a bigger `patch_size` would be better suited for my data. I wanted to know what would be the biggest `patch_size` that can be used for the ppp+dec architecture? For ed-ppp I guess a `patch_size` of `81*81` is used in one of the experiments, while for ppp+dec the highest `patch_size` I came across was `49*49`. Is it possible to use a higher `patch_size` than this?
You have to experiment with that; larger patches might require a larger code (code_units), which needs more memory (but that also depends on the kind of data).
If the patches are too large they cover a large part of the image and the network might have difficulties learning them properly.
On the other hand, larger patches mean that you need fewer of them to cover the image.
- The segmentation output is in test/instanced/<checkpoint>/patch_threshold/fc_threshold/<sample>, right? Is this the correct place to see the segmentation? If yes, then here is a sample of the segmentation result.
I think you might also have to change [autoencoder].code_units (I just noticed that there are two code_units in the config, will fix that). For a patch size of 41 the spatial bottleneck of the autoencoder is 6*6 (2d) with 7 feature maps -> 6*6*7 = 252. You have to check in your log what the resulting bottleneck size is for 49 and then adapt the value of code_units.
I figured out that for patch_size=49*49 the code_units=343, which is 7*7*7.
The experiment is running, I think this helped. Thanks for the help @abred.
About there being two code_units in the config file: what needs to be changed?
Don't worry about it, just set both to the same value, when I have time I will check if they are really both necessary.
- The segmentation output is in test/instanced/<checkpoint>/patch_threshold/fc_threshold/<sample>, right? Is this the correct place to see the segmentation? If yes, then here is a sample of the segmentation result.
yes, exactly
Hello @abred,
I tried an experiment with patch_size=49*49; it ran correctly and I have achieved significant improvements as well.
I trained the model for 100k iterations only and achieved the following results:
INFO:__main__:confusion_matrix.avAP TEST checkpoint 100000: 0.4229 ({'patch_threshold': 0.9, 'fc_threshold': 0.5})
The segmentation results look like this:
Test\Instanced:
[segmentation result image]
Test\processed\pred_affs:
[segmentation result image]
1. Are the segmentation results good enough for the value of AP?
2. What is the difference between the two test\process\pred_affs Zarr file and test\instanced\ HDF file results?
I am running another experiment with patch_size=55*55; the training was successful but the code seems to have been stuck at this point:
INFO:__main__:Skipping decoding for 06_46. Already exists!
INFO:__main__:Skipping decoding for 06_36. Already exists!
INFO:__main__:Skipping decoding for 06_33. Already exists!
INFO:__main__:Skipping decoding for 06_29. Already exists!
INFO:__main__:Skipping decoding for 06_64. Already exists!
INFO:__main__:Skipping decoding for 06_90. Already exists!
INFO:__main__:Skipping decoding for 06_58. Already exists!
INFO:__main__:Skipping decoding for 06_12. Already exists!
INFO:__main__:vote_instances checkpoint 100000 {'patch_threshold': 0.5, 'fc_threshold': 0.5}
INFO:__main__:reading data from /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210227_090926/val/processed/100000
['06_92', '06_80', '06_73', '06_8', '06_82', '06_86', '06_55', '06_60', '06_65', '06_77', '06_48', '06_23', '06_42', '06_31', '06_51', '06_24', '06_40', '06_3', '06_76', '06_17', '06_19', '06_68', '06_4', '06_21', '06_46', '06_36', '06_33', '06_29', '06_64', '06_90', '06_58', '06_12']
['06_92', '06_80', '06_73', '06_8', '06_82', '06_86', '06_55', '06_60', '06_65', '06_77', '06_48', '06_23', '06_42', '06_31', '06_51', '06_24', '06_40', '06_3', '06_76', '06_17', '06_19', '06_68', '06_4', '06_21', '06_46', '06_36', '06_33', '06_29', '06_64', '06_90', '06_58', '06_12']
['06_92', '06_80', '06_73', '06_8', '06_82', '06_86', '06_55', '06_60', '06_65', '06_77', '06_48', '06_23', '06_42', '06_31', '06_51', '06_24', '06_40', '06_3', '06_76', '06_17', '06_19', '06_68', '06_4', '06_21', '06_46', '06_36', '06_33', '06_29', '06_64', '06_90', '06_58', '06_12']
INFO:__main__:forking <function vote_instances_sample_seq at 0x7f73b5926200>
INFO:__main__:Skipping vote instances for 06_92. Already exists!
INFO:__main__:forking <function vote_instances_sample_seq at 0x7f73b5926200>
INFO:__main__:Skipping vote instances for 06_80. Already exists!
INFO:__main__:forking <function vote_instances_sample_seq at 0x7f73b5926200>
INFO:PatchPerPix.vote_instances.vote_instances:processing /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210227_090926/val/processed/100000/06_73.zarr
INFO:PatchPerPix.vote_instances.utilVoteInstances:keys: ['volumes']
INFO:PatchPerPix.vote_instances.vote_instances:affinities shape: (3025, 1, 512, 512)
INFO:PatchPerPix.vote_instances.vote_instances:numinst_ shape: (1, 1, 512, 512)
INFO:PatchPerPix.vote_instances.vote_instances:input image shape: (1, 512, 512)
INFO:PatchPerPix.vote_instances.vote_instances:Number fg pixel: 46069
INFO:PatchPerPix.vote_instances.vote_instances:overlap mask: (1, 512, 512) 0
INFO:PatchPerPix.vote_instances.vote_instances:copy fg to fg2cover
INFO:PatchPerPix.vote_instances.vote_instances:Number overlapping pixel: 0
INFO:PatchPerPix.vote_instances.vote_instances:Number pixel fg_to_cover: 46069
INFO:PatchPerPix.vote_instances.vote_instances:num foreground pixels: 46069
INFO:PatchPerPix.vote_instances.vote_instances:num foreground pixels 42362
INFO:PatchPerPix.vote_instances.vote_instances:num foreground pixels excluding boundary region: 42362
INFO:PatchPerPix.vote_instances.utilVoteInstances:bg in rank: th/2
INFO:PatchPerPix.vote_instances.utilVoteInstances:accumulate normalized prob product [0,1] as aff
INFO:PatchPerPix.vote_instances.consensus_array:compute affs and cntr separately
INFO:PatchPerPix.vote_instances.consensus_array:consensus array shape (1, 110, 110, 1, 512, 512)
INFO:PatchPerPix.vote_instances.consensus_array:creating consensus array /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210227_090926/val/processed/100000/06_73.zarr
1. Are the segmentation results good enough for the value of AP?
That's hard to judge, especially without ground truth segmentation and the raw image. Also, you should use a different colormap for the instance segmentation, if you have a continuous one it is sometimes hard to see if it is the exact same shade or slightly different.
2. What is the difference between the two `test\process\pred_affs` Zarr file and `test\instanced\` HDF file results?
instanced contains the instance segmentation result after the whole pipeline; processed contains just the prediction of the neural network. And pred_affs should have patch_size*patch_size channels, not just one.
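If you want to double-check this, a quick inspection along these lines should work (the concrete paths and dataset keys are assumptions based on the log output in this thread; adjust them to your experiment folder):

```python
# Sketch: inspect both outputs. Paths and dataset keys are assumptions.
import zarr
import h5py

# network prediction: pred_affs should have patchshape[1]*patchshape[2]
# channels, e.g. 49*49 = 2401 (the log above shows 3025 = 55*55)
pred = zarr.open('val/processed/100000/06_73.zarr', 'r')
print(pred['volumes/pred_affs'].shape)

# final instance segmentation written by vote_instances
with h5py.File('val/instanced/100000/0.9/0.5/06_73.hdf', 'r') as f:
    f.visit(print)  # list the datasets stored in the file
```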
3. I tried re-running the code as well but it still seems stuck, what could be the problem?
the larger the patch size and the more foreground pixels the longer it takes, but that step shouldn't take that long, how long did you wait?
That's hard to judge, especially without ground truth segmentation and the raw image. Also, you should use a different colormap for the instance segmentation, if you have a continuous one it is sometimes hard to see if it is the exact same shade or slightly different.
Okay, can you explain how I can use a different colormap for my dataset?
the larger the patch size and the more foreground pixels the longer it takes, but that step shouldn't take that long, how long did you wait?
It was stuck at the same point for more than a day
That's hard to judge, especially without ground truth segmentation and the raw image. Also, you should use a different colormap for the instance segmentation, if you have a continuous one it is sometimes hard to see if it is the exact same shade or slightly different.
Okay, can you explain how I can use a different colormap for my dataset?
that's just visualization. Depends on the program you are using to look at it.
the larger the patch size and the more foreground pixels the longer it takes, but that step shouldn't take that long, how long did you wait?
It was stuck at the same point for more than a day
INFO:PatchPerPix.vote_instances.consensus_array:consensus array shape (1, 110, 110, 1, 512, 512)
that's about 12GB, I guess that is too large for your GPU. The code uses unified memory so you don't get an error, but it has to move a lot of data to and from the GPU all the time. I guess that makes it really slow.
You could try using blockwise processing (e.g. have a look at the differences in [vote_instances] in here: https://github.com/Kainmueller-Lab/PatchPerPix_experiments/blob/master/nuclei3d/02_setups/setup01/config.toml)
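For reference, the ~12GB estimate follows directly from the consensus array shape, assuming float32 (4 bytes per value):

```python
import numpy as np

shape = (1, 110, 110, 1, 512, 512)      # consensus array shape from the log
size_gb = np.prod(shape) * 4 / 1024**3  # 4 bytes per float32 value
print(f"{size_gb:.1f} GB")              # ~11.8 GB
```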
Hello @abred,
I am done training my model and now I want to fine-tune it. How can I use the weights of the trained model for fine-tuning? I tried training the same experiment on the dataset I want to fine-tune on, for some more iterations and with smaller checkpoint intervals, as the fine-tuning is only done for about 10% of the total training iterations. But it starts the training from the beginning, as the checkpoint number is now smaller than what it was for the full training.
Can you help me with this?
Hi, I am not sure I understand exactly what you mean, but two code blocks that you might have to modify are here: https://github.com/Kainmueller-Lab/PatchPerPix_experiments/blob/master/wormbodies/02_setups/setup08/train.py#L21-L27 and here: https://github.com/Kainmueller-Lab/gunpowder/blob/master/gunpowder/tensorflow/nodes/train.py#L273-L297 It looks for the latest/highest checkpoint it can find in the respective folder. I guess you could either change your filenames accordingly or change the code to load a specific checkpoint. And depending on what exactly you want to do maybe change a few parameters (e.g. checkpoints/(max_)iterations)
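To illustrate what that lookup does (a hypothetical sketch, not the actual gunpowder/PatchPerPix code; the file-name pattern and helper name are made up), the "latest checkpoint" selection is essentially a glob over the experiment folder plus a max over the iteration numbers:

```python
# Hypothetical sketch of a "latest checkpoint" lookup; not the actual code.
import glob
import re

def latest_checkpoint(folder, prefix="train_net_checkpoint_"):
    metas = glob.glob(f"{folder}/{prefix}*.meta")
    if not metas:
        return None, 0  # nothing found -> training starts from scratch
    def iteration(fn):
        match = re.search(r"_(\d+)\.meta$", fn)
        return int(match.group(1)) if match else -1
    best = max(metas, key=iteration)
    return best[:-len(".meta")], iteration(best)
```

So to fine-tune from a specific checkpoint you could either copy/rename that checkpoint so it is the highest one the lookup finds, or hard-code its path where the lookup result is used.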
I am working with my own dataset. I am trying to use consolidate_data.py for the preprocessing of the data, to get it into the correct format for the network. But I am facing a few problems. I am passing these parameters to run the file:
-i /home/student2/Desktop/Parag_masterthesis -o /home/student2/Desktop/Parag_masterthesis/newdata --raw-gfp-min 0 --raw-gfp-max 4095 --raw-bf-min 0 --raw-bf-max 3072 --out-format zarr --parallel 50
I am getting this error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/01_data/consolidate_data.py", line 174, in work
    raw_bf = load_array(raw_fns[1]).astype(np.float32)
IndexError: list index out of range
"""