Kainmueller-Lab / PatchPerPix_experiments

Experiment scripts for the PatchPerPix instance segmentation method

Working with my own plant dataset #4

Closed Paragjain10 closed 2 years ago

Paragjain10 commented 3 years ago

I am working with my own dataset. I am trying to use consolidate_data.py to preprocess the data into the correct format for the network, but I am facing a few problems. I am passing these parameters to run the file:

-i /home/student2/Desktop/Parag_masterthesis -o /home/student2/Desktop/Parag_masterthesis/newdata --raw-gfp-min 0 --raw-gfp-max 4095 --raw-bf-min 0 --raw-bf-max 3072 --out-format zarr --parallel 50

I am getting this error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/01_data/consolidate_data.py", line 174, in work
    raw_bf = load_array(raw_fns[1]).astype(np.float32)
IndexError: list index out of range
"""
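For context, the traceback shows the script failing when it tries to load a second raw channel. A minimal defensive sketch of that loading step (the function name `load_raw_channels` and the `load_array` callback are illustrative, not the actual consolidate_data.py API):

```python
import numpy as np

# Illustrative sketch only: the wormbodies consolidate_data.py expects two
# raw files per sample (GFP and bright-field), so raw_fns[1] raises
# IndexError on a dataset that has only one raw channel per sample.
def load_raw_channels(raw_fns, load_array):
    raw_gfp = load_array(raw_fns[0]).astype(np.float32)
    # Only attempt the bright-field channel if a second file exists.
    raw_bf = load_array(raw_fns[1]).astype(np.float32) if len(raw_fns) > 1 else None
    return raw_gfp, raw_bf
```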

abred commented 3 years ago
data_tmp[fg_coord] = \
        np.reshape(
            prediction['affinities'],
            (np.prod(prediction['affinities'].shape), 1, 1)
        )

I tried a few things first, but this was the only thing that got the code working. Is this change correct?

Looks good I think, if it runs and the pred_affs folders are getting created.
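As a toy check of the reshape above (the shapes are made up, not taken from the pipeline): flattening the affinities to (N, 1, 1) makes them assignable to the N foreground coordinates.

```python
import numpy as np

# Toy stand-in for prediction['affinities']; any shape flattens the same way.
affs = np.arange(12, dtype=np.float32).reshape(3, 4)
reshaped = np.reshape(affs, (np.prod(affs.shape), 1, 1))
assert reshaped.shape == (12, 1, 1)   # one (1, 1) entry per flattened value
```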

After this, the code successfully computed the decode step but got stuck, showing a similar error with exit code -9. I think it occurs while computing the vote instances:

As it is the same error code, it is probably out-of-memory again. Did you check dmesg? How much RAM do you have? And how big are your images again? You tried the right thing; however, there is unfortunately some inconsistency between the config file and run_ppp: in vote_instances, num_workers is not used anymore but is replaced by num_parallel_samples, which is not set in the config and thus already has its default of 1.
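A sketch of the corresponding config entry (the value is an assumption; per the comment above, it already defaults to 1 when unset):

```toml
[vote_instances]
# num_workers is no longer read here; the number of samples processed in
# parallel is controlled by num_parallel_samples instead.
num_parallel_samples = 1
```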

Paragjain10 commented 3 years ago

Yes @abred, the reason is the same: the OS is killing the processes. The dmesg output is also similar:

(Parag_GreenAI) student2@BQ-DX1100-CT2:~/Desktop/Parag_masterthesis/PatchPerPix$ dmesg | egrep -i 'killed process'
[3944683.201497] Out of memory: Killed process 12089 (python) total-vm:26580804kB, anon-rss:25243156kB, file-rss:0kB, shmem-rss:4kB, UID:1003 pgtables:50116kB oom_score_adj:0
[3951431.125635] Out of memory: Killed process 14309 (python) total-vm:37894136kB, anon-rss:24871092kB, file-rss:73564kB, shmem-rss:10240kB, UID:1003 pgtables:50160kB oom_score_adj:0
[3952417.147848] Out of memory: Killed process 27522 (python) total-vm:37894132kB, anon-rss:24853440kB, file-rss:72924kB, shmem-rss:10240kB, UID:1003 pgtables:50128kB oom_score_adj:0
[3952670.222360] Out of memory: Killed process 30716 (python) total-vm:37894136kB, anon-rss:24847532kB, file-rss:74372kB, shmem-rss:10240kB, UID:1003 pgtables:50152kB oom_score_adj:0
[3952793.402220] Out of memory: Killed process 32425 (python) total-vm:37894136kB, anon-rss:24837228kB, file-rss:74248kB, shmem-rss:10240kB, UID:1003 pgtables:50128kB oom_score_adj:0
[3956437.478432] Out of memory: Killed process 19136 (python) total-vm:37894132kB, anon-rss:25219872kB, file-rss:73880kB, shmem-rss:10240kB, UID:1003 pgtables:50980kB oom_score_adj:0

The available RAM is 7981 MB, of which 150-200 MB is required by system processes. The resolution of my images is (1536, 1536); the space required by the zarr files on disk is 15 MB per sample, compared to 2.6 MB per sample for the worm data. Also, while looking around for solutions, I found that one way to tackle the exit code -9 problem could be allocating only 6000 MB of the total 7981 MB. This would prevent the overload and keep enough memory for system processes.
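A rough back-of-envelope on why vote_instances runs out of memory here, assuming float32 affinities, the full (1536, 1536) image, and the 41x41 patchshape from the wormbodies setup (the actual pipeline may hold more or less than this at once):

```python
# Dense per-pixel affinity patches for one full-resolution image:
h = w = 1536                 # image resolution
patch_vals = 41 * 41         # affinity values predicted per pixel
bytes_per_val = 4            # float32
dense_affs_gib = h * w * patch_vals * bytes_per_val / 1024**3
print(f"{dense_affs_gib:.1f} GiB")   # ~14.8 GiB, well above 8 GB of system RAM
```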

abred commented 3 years ago

You don't by any chance have access to a system with more RAM? 8 GB is not really a lot anymore :) That would be the easiest solution. Hm, it would be interesting to find out where exactly the out-of-memory error is happening, but the oom-killer is preventing that. Are you the only one using this system, and do you have root access? Then you could try temporarily disabling the oom-killer; you should then get a "proper" Python out-of-memory exception, which is easier to debug.

Alternatively, for our 3d datasets (e.g. nuclei3d) we process each image in blocks, partly for the same reason. However, the code is written with 3d data in mind, as we never had memory issues with 2d data, so you would have to make a large number of changes (the main file is this one: https://github.com/Kainmueller-Lab/PatchPerPix/blob/master/PatchPerPix/vote_instances/stitch_patch_graph.py)

Paragjain10 commented 3 years ago

@abred
How much more RAM would be appropriate? What are the specifications of the GPU that you used for the 2d data training? I am not sure about getting another system; I will have to talk to my supervisor about this. There is a possibility of getting a system with 12 GB RAM, could that work?

No, I do not have root access to the system; there are other people using it as well. First I'll try having a conversation about this with my supervisor. If nothing comes out of that, I will think about how to rework the code.

Paragjain10 commented 3 years ago

Hello @abred,

  1. Is it possible that reducing the training and testing input image shape could help, e.g.:

    [model]
    train_input_shape = [ 128, 128,]
    test_input_shape = [ 128, 128,]

    Is it possible to train the network with this input image size, or will changes be required in the network? Will the network accept this input size?

  2. Also, I would like to know the specifications of the GPU you used for training.

abred commented 3 years ago

Sorry, I am not sure how much RAM is necessary; that depends on the data. (Note that the issue here is the system RAM, not the GPU RAM (we used an RTX 2080 Ti).)

During training, the model is trained on random crops of size train_input_shape; the full image can be arbitrarily large. test_input_shape is only used for prediction. The instance assembly, however, operates on the whole image (unless you use the block-wise processing).

What you can try is to downscale your images in general and then work on those; this might result in a loss of accuracy, though. (You then have to train on the downscaled images, too.)
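A minimal numpy-only sketch of 2x downscaling by block averaging (a real pipeline would more likely use e.g. skimage.transform.rescale with anti-aliasing for the raw image, and nearest-neighbour for the label image so instance ids stay integers):

```python
import numpy as np

def downscale2x(img):
    # Average non-overlapping 2x2 blocks; crops odd trailing rows/columns.
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

raw = np.arange(16, dtype=np.float32).reshape(4, 4)
small = downscale2x(raw)
print(small.shape)   # (2, 2)
```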

Paragjain10 commented 3 years ago

Hello @abred, Thank you for your constant support.

As you suggested I downscaled the images and also implemented the method on a different system.

  1. On the previous system I tried training the network with image size (512, 512), but it threw exit code -9 at some point during vote_instances. So I tried with image size (256, 256); in this case, after training, this error was raised during prediction:

    AssertionError: reference RAW with ROI [0:512, 0:512] (512, 512) does not fit into provided upstream [-94:350, -94:350] (444, 444)

    I changed model.test_input_shape = [ 512, 512,] to model.test_input_shape = [ 256, 256,]. Now the code is running and computing predictions. Is this change correct?

  2. On the new system I started training with image size (512, 512). The training finished, but during prediction, after predicting one sample, the code freezes at this:

    
    INFO:__main__:forking <function predict_sample at 0x7fd4786492f0>
    INFO:__main__:predicting 05_56!
    INFO:gunpowder.tensorflow.local_server:Server already running at b'grpc://localhost:36537'
    INFO:gunpowder.tensorflow.nodes.predict:Initializing tf session, connecting to b'grpc://localhost:36537'...
    WARNING:tensorflow:From /home/babrm/Desktop/Parag_GreenAi/patchperpix/gunpowder/tensorflow/nodes/predict.py:182: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /home/babrm/Desktop/Parag_GreenAi/patchperpix/gunpowder/tensorflow/nodes/predict.py:182: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-01-29 08:41:05.789154: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-01-29 08:41:05.846785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:3b:00.0
2021-01-29 08:41:05.848374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:5e:00.0
2021-01-29 08:41:05.849948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 2 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:af:00.0
2021-01-29 08:41:05.850320: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-01-29 08:41:05.853370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-01-29 08:41:05.855013: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-01-29 08:41:05.855382: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-01-29 08:41:05.857249: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-01-29 08:41:05.858726: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-01-29 08:41:05.863367: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-01-29 08:41:05.868082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2
WARNING:tensorflow:From /home/babrm/Desktop/Parag_GreenAi/patchperpix/gunpowder/tensorflow/nodes/predict.py:236: The name tf.train.import_meta_graph is deprecated. Please use tf.compat.v1.train.import_meta_graph instead.

INFO:gunpowder.tensorflow.nodes.predict:Reading graph from /home/babrm/Desktop/Parag_GreenAi/patchperpix/results/wormbodies_setup08_210128_181802/test/test_net.meta and weights from /home/babrm/Desktop/Parag_GreenAi/patchperpix/results/wormbodies_setup08_210128_181802/train/train_net_checkpoint_400000...
WARNING:tensorflow:From /home/babrm/Desktop/Parag_GreenAi/patchperpix/gunpowder/tensorflow/nodes/predict.py:236: The name tf.train.import_meta_graph is deprecated. Please use tf.compat.v1.train.import_meta_graph instead.

INFO:tensorflow:Restoring parameters from /home/babrm/Desktop/Parag_GreenAi/patchperpix/results/wormbodies_setup08_210128_181802/train/train_net_checkpoint_400000
INFO:tensorflow:Restoring parameters from /home/babrm/Desktop/Parag_GreenAi/patchperpix/results/wormbodies_setup08_210128_181802/train/train_net_checkpoint_400000


There is no progress after that. I tried running it multiple times.

abred commented 3 years ago
  1. Sounds ok.
  2. Hmm, are all the @fork's still in place? But the first line of your log says forking, so that seems ok. Could you please check that you are on the correct gunpowder commit? (a53bb55edc2d28a1acb310971571984cf0771cab) The line ...Initializing tf session... shouldn't be there, I think.

(sorry clicked the wrong button :) )
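The commit check above could be scripted roughly like this (a sketch: the subprocess call into a local gunpowder checkout is an assumption, and the path would need to be adjusted):

```python
import subprocess

# Commit abred mentions in the comment above.
EXPECTED = "a53bb55edc2d28a1acb310971571984cf0771cab"

def current_commit(repo_path):
    # Ask git for the HEAD commit of the given checkout.
    out = subprocess.run(["git", "rev-parse", "HEAD"], cwd=repo_path,
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

# Usage (hypothetical path): current_commit("/path/to/gunpowder") == EXPECTED
```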

Paragjain10 commented 3 years ago

@abred Thank you for the reply.

In the first experiment, where I am training with image size (256, 256), I trained the network for 700k iterations. While cross-validating at checkpoint 700k, I came across this problem:

['01_23', '02_56', '10_1134', '05_74', '10_1124', '07_45', '01_11', '03_461', '08_469', '05_60', '02_3', '05_39', '02_17', '03_492', '10_1138', '10_1090', '04_1013', '10_1100', '03_437', '02_6', '06_51', '04_946', '09_747', '05_85', '07_92', '01_84', '10_1060', '09_753', '08_412', '08_421', '03_458', '07_58', '06_24', '04_979', '03_507', '02_14', '10_1107', '03_452', '03_531', '06_40', '01_25', '10_1135', '02_86', '01_73', '09_748', '03_475', '05_62', '08_491', '04_1019', '03_455', '06_3', '02_94', '09_726', '02_36', '03_477', '02_22', '06_76', '05_33', '03_528', '03_466', '02_90', '06_17', '03_502', '01_42', '10_1069', '03_471', '08_497', '09_768', '05_11', '08_407', '07_81', '01_74', '08_484', '01_29', '06_19', '03_467', '04_967', '07_51', '04_1031', '09_777', '08_423', '05_79', '06_68', '10_1067', '01_62', '07_42', '02_85', '07_29', '02_100', '07_85', '04_1018', '02_82', '06_4', '04_955', '02_24', '03_499', '07_13', '02_97', '01_14', '09_728', '04_1001', '03_509', '06_21', '07_63', '05_50', '04_1007', '04_1012', '04_1004', '01_82', '06_46', '10_1147', '02_50', '07_64', '04_940', '07_23', '08_404', '08_418', '04_958', '02_98', '04_1037', '02_48', '04_1033', '03_470', '04_999', '01_43', '09_735', '01_46', '05_87', '06_36', '10_1140', '05_56', '07_77', '03_515', '01_49', '01_59', '06_33', '03_446', '07_36', '06_29', '03_485', '04_1030', '06_64', '01_86', '08_415', '06_90', '01_68', '01_39', '09_756', '04_948', '01_28', '02_75', '09_779', '10_1114', '03_496', '03_505', '03_474', '09_775', '02_20', '07_33', '06_58', '10_1142', '01_63', '01_81', '05_25', '10_1076', '02_29', '04_938', '10_1080', '08_451', '05_7', '04_1005', '04_951', '04_1026', '03_519', '09_793', '06_12', '02_47', '10_1102', '09_785', '08_461', '01_6', '01_88', '08_496', '04_1028', '10_1104', '10_1133', '08_459', '07_41', '04_1032', '07_12', '10_1071', '07_54', '01_15', '02_15', '09_732', '02_2', '04_1006', '07_68', '07_18', '10_1129']
['01_23', '02_56', '10_1134', '05_74', '10_1124', '07_45', '01_11', '03_461', '08_469', '05_60', '02_3', '05_39', '02_17', '03_492', '10_1138', '10_1090', '04_1013', '10_1100', '03_437', '02_6', '06_51', '04_946', '09_747', '05_85', '07_92', '01_84', '10_1060', '09_753', '08_412', '08_421', '03_458', '07_58', '06_24', '04_979', '03_507', '02_14', '10_1107', '03_452', '03_531', '06_40', '01_25', '10_1135', '02_86', '01_73', '09_748', '03_475', '05_62', '08_491', '04_1019', '03_455', '06_3', '02_94', '09_726', '02_36', '03_477', '02_22', '06_76', '05_33', '03_528', '03_466', '02_90', '06_17', '03_502', '01_42', '10_1069', '03_471', '08_497', '09_768', '05_11', '08_407', '07_81', '01_74', '08_484', '01_29', '06_19', '03_467', '04_967', '07_51', '04_1031', '09_777', '08_423', '05_79', '06_68', '10_1067', '01_62', '07_42', '02_85', '07_29', '02_100', '07_85', '04_1018', '02_82', '06_4', '04_955', '02_24', '03_499', '07_13', '02_97', '01_14', '09_728', '04_1001', '03_509', '06_21', '07_63', '05_50', '04_1007', '04_1012', '04_1004', '01_82', '06_46', '10_1147', '02_50', '07_64', '04_940', '07_23', '08_404', '08_418', '04_958', '02_98', '04_1037', '02_48', '04_1033', '03_470', '04_999', '01_43', '09_735', '01_46', '05_87', '06_36', '10_1140', '05_56', '07_77', '03_515', '01_49', '01_59', '06_33', '03_446', '07_36', '06_29', '03_485', '04_1030', '06_64', '01_86', '08_415', '06_90', '01_68', '01_39', '09_756', '04_948', '01_28', '02_75', '09_779', '10_1114', '03_496', '03_505', '03_474', '09_775', '02_20', '07_33', '06_58', '10_1142', '01_63', '01_81', '05_25', '10_1076', '02_29', '04_938', '10_1080', '08_451', '05_7', '04_1005', '04_951', '04_1026', '03_519', '09_793', '06_12', '02_47', '10_1102', '09_785', '08_461', '01_6', '01_88', '08_496', '04_1028', '10_1104', '10_1133', '08_459', '07_41', '04_1032', '07_12', '10_1071', '07_54', '01_15', '02_15', '09_732', '02_2', '04_1006', '07_68', '07_18', '10_1129']
{'01_6': 0.1411111111111111, '01_11': 0.14722222222222223, '01_14': 0.4333333333333333, '01_15': 0.19505494505494506, '01_25': 0.32166666666666666, '01_28': 0.225, '01_29': 0.19916666666666666, '01_39': 0.18, '01_42': 0.4916666666666666, '01_43': 0.175, '01_46': 0.045454545454545456, '01_49': 0.09395604395604396, '01_59': 0.03428571428571429, '01_62': 0.07846577227382182, '01_63': 0.035526315789473684, '01_68': 0.13015873015873014, '01_73': 0.0, '01_74': 0.2, '01_81': 0.23500000000000001, '01_82': 0.21444444444444444, '01_84': 0.30333333333333334, '01_86': 0.08333333333333333, '01_88': 0.25, '02_2': 0.6678571428571429, '02_3': 0.25555555555555554, '02_6': 0.292063492063492, '02_14': 0.4666666666666667, '02_15': 0.8, '02_17': 0.07291666666666666, '02_20': 0.2342929292929293, '02_22': 0.10500000000000001, '02_24': 0.04285714285714286, '02_29': 0.0, '02_36': 0.15535714285714283, '02_47': 0.06666666666666667, '02_48': 0.0, '02_50': 0.0, '02_56': 0.42777777777777776, '02_75': 0.2816666666666666, '02_82': 0.13666666666666666, '02_85': 0.053968253968253964, '02_86': 0.01, '02_90': 0.1, '02_94': 0.05934065934065934, '02_97': 0.04017857142857143, '02_98': 0.10287581699346404, '02_100': 0.05934065934065934, '03_437': 0.2760239760239761, '03_446': 0.30436507936507934, '03_452': 0.225, '03_455': 0.18727272727272726, '03_458': 0.10275197798417612, '03_461': 0.22432288299935355, '03_466': 0.1718031968031968, '03_467': 0.07857142857142856, '03_470': 0.14833333333333332, '03_471': 0.13727272727272727, '03_474': 0.23166666666666663, '03_475': 0.12857142857142856, '03_477': 0.05, '03_485': 0.48323232323232324, '03_492': 0.2875, '03_496': 0.30095238095238097, '03_499': 0.5900000000000001, '03_502': 0.24441558441558442, '03_505': 0.2643589743589744, '03_507': 0.36317460317460315, '03_509': 0.36821289821289827, '03_515': 0.4515151515151515, '03_519': 0.37523809523809526, '03_528': 0.3254700854700855, '03_531': 0.29103785103785107, '04_938': 0.17572074983839692, '04_940': 
0.24043839330604033, '04_946': 0.17589807852965747, '04_948': 0.1353973168214654, '04_951': 0.21678904428904427, '04_955': 0.17207417582417583, '04_958': 0.15402391725921138, '04_967': 0.3315584415584416, '04_979': 0.15256077256077255, '04_999': 0.12333333333333334, '04_1001': 0.06153846153846154, '04_1004': 0.10060606060606062, '04_1005': 0.13293650793650794, '04_1006': 0.01818181818181818, '04_1007': 0.007142857142857143, '04_1012': 0.14821428571428572, '04_1013': 0.14373065015479874, '04_1018': 0.1808467246547742, '04_1019': 0.14805860805860807, '04_1026': 0.2618648018648019, '04_1028': 0.10108359133126936, '04_1030': 0.14163170163170163, '04_1031': 0.0771978021978022, '04_1032': 0.19213286713286715, '04_1033': 0.07922077922077922, '04_1037': 0.00909090909090909, '05_7': 0.0890909090909091, '05_11': 0.12896825396825398, '05_25': 0.19947552447552447, '05_33': 0.059583333333333335, '05_39': 0.038699690402476776, '05_50': 0.06666666666666667, '05_56': 0.11787878787878787, '05_60': 0.08205128205128205, '05_62': 0.19047619047619047, '05_74': 0.14421703296703298, '05_79': 0.016666666666666666, '05_85': 0.048571428571428564, '05_87': 0.04285714285714286, '06_3': 0.1026487788097695, '06_4': 0.027485380116959064, '06_12': 0.3196969696969697, '06_17': 0.04162581699346406, '06_19': 0.025313283208020048, '06_21': 0.09579248366013073, '06_24': 0.005555555555555555, '06_29': 0.005263157894736842, '06_33': 0.12879120879120878, '06_36': 0.06125541125541125, '06_40': 0.2224242424242424, '06_46': 0.1083916083916084, '06_51': 0.14636363636363633, '06_58': 0.12777777777777777, '06_64': 0.17266806722689076, '06_68': 0.12186274509803922, '06_76': 0.16291531997414352, '06_90': 0.19101190476190474, '07_12': 0.34979864820422096, '07_13': 0.22563193226808784, '07_18': 0.17745098039215687, '07_23': 0.16953296703296705, '07_29': 0.29525474525474527, '07_33': 0.20757575757575758, '07_36': 0.225, '07_41': 0.13833333333333334, '07_42': 0.20171717171717174, '07_45': 0.08007395940832474, 
'07_51': 0.13761446886446888, '07_54': 0.1171671826625387, '07_58': 0.05263157894736842, '07_63': 0.12818181818181817, '07_64': 0.16638888888888886, '07_68': 0.1527777777777778, '07_77': 0.1509090909090909, '07_81': 0.16940359477124184, '07_85': 0.21112637362637363, '07_92': 0.013333333333333332, '08_404': 0.02222222222222222, '08_407': 0.1223661921602425, '08_412': 0.16533496732026146, '08_415': 0.07263157894736842, '08_418': 0.12030075187969924, '08_421': 0.22626696832579185, '08_423': 0.2731203007518797, '08_451': 0.0058823529411764705, '08_459': 0.00625, '08_461': 0.0963690476190476, '08_469': 0.30953907203907205, '08_484': 0.09772727272727273, '08_491': 0.0, '08_496': 0.29393939393939394, '08_497': 0.31437229437229436, '09_726': 0.1880250257997936, '09_728': 0.03333333333333333, '09_732': 0.08719298245614035, '09_735': 0.10487637362637363, '09_747': 0.21323412698412697, '09_748': 0.17013243894048846, '09_753': 0.05, '09_756': 0.014285714285714285, '09_768': 0.2925141525141525, '09_775': 0.06153846153846154, '09_777': 0.05, '09_779': 0.36750000000000005, '09_785': 0.3990909090909091, '09_793': 0.22764877880976955, '10_1060': 0.114025974025974, '10_1067': 0.2155050505050505, '10_1069': 0.14615384615384613, '10_1071': 0.19777777777777777, '10_1076': 0.17555555555555555, '10_1080': 0.4066666666666666, '10_1090': 0.12758857929136563, '10_1100': 0.1764867485455721, '10_1102': 0.2325124875124875, '10_1104': 0.18916157372039727, '10_1107': 0.38298701298701304, '10_1114': 0.258409825468649, '10_1124': 0.26328197945845006, '10_1129': 0.34016806722689075, '10_1133': 0.2568650793650794, '10_1134': 0.3880519480519481, '10_1135': 0.2892307692307693, '10_1138': 0.37096153846153845, '10_1140': 0.18700387331966278, '10_1142': 0.1847100725361595, '10_1147': 0.19282894736842104} ['01_6', '01_11', '01_14', '01_15', '01_23', '01_25', '01_28', '01_29', '01_39', '01_42', '01_43', '01_46', '01_49', '01_59', '01_62', '01_63', '01_68', '01_73', '01_74', '01_81', '01_82', '01_84', 
'01_86', '01_88', '02_2', '02_3', '02_6', '02_14', '02_15', '02_17', '02_20', '02_22', '02_24', '02_29', '02_36', '02_47', '02_48', '02_50', '02_56', '02_75', '02_82', '02_85', '02_86', '02_90', '02_94', '02_97', '02_98', '02_100', '03_437', '03_446', '03_452', '03_455', '03_458', '03_461', '03_466', '03_467', '03_470', '03_471', '03_474', '03_475', '03_477', '03_485', '03_492', '03_496', '03_499', '03_502', '03_505', '03_507', '03_509', '03_515', '03_519', '03_528', '03_531', '04_938', '04_940', '04_946', '04_948', '04_951', '04_955', '04_958', '04_967', '04_979', '04_999', '04_1001', '04_1004', '04_1005', '04_1006', '04_1007', '04_1012', '04_1013', '04_1018', '04_1019', '04_1026', '04_1028', '04_1030', '04_1031', '04_1032', '04_1033', '04_1037', '05_7', '05_11', '05_25', '05_33', '05_39', '05_50', '05_56', '05_60', '05_62', '05_74', '05_79', '05_85', '05_87', '06_3', '06_4', '06_12', '06_17', '06_19', '06_21', '06_24', '06_29', '06_33', '06_36', '06_40', '06_46', '06_51', '06_58', '06_64', '06_68', '06_76', '06_90', '07_12', '07_13', '07_18', '07_23', '07_29', '07_33', '07_36', '07_41', '07_42', '07_45', '07_51', '07_54', '07_58', '07_63', '07_64', '07_68', '07_77', '07_81', '07_85', '07_92', '08_404', '08_407', '08_412', '08_415', '08_418', '08_421', '08_423', '08_451', '08_459', '08_461', '08_469', '08_484', '08_491', '08_496', '08_497', '09_726', '09_728', '09_732', '09_735', '09_747', '09_748', '09_753', '09_756', '09_768', '09_775', '09_777', '09_779', '09_785', '09_793', '10_1060', '10_1067', '10_1069', '10_1071', '10_1076', '10_1080', '10_1090', '10_1100', '10_1102', '10_1104', '10_1107', '10_1114', '10_1124', '10_1129', '10_1133', '10_1134', '10_1135', '10_1138', '10_1140', '10_1142', '10_1147']
199 200
Traceback (most recent call last):
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1623, in <module>
    main()
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1619, in main
    cross_validate(args, config, config['data']['val_data'], train_folder, val_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1277, in cross_validate
    assert len(v[0]) == len(samples)
AssertionError

The length of v[0] is 199 and the length of samples is 200. In v[0] we have a tuple of the results.items. I tried to figure out what the problem is but failed to do so. Do you know what could be the reason?

abred commented 3 years ago

These assertions are sanity checks that everything went ok beforehand. Assuming you do have 200 samples, this indicates that some sample failed along the way. You could try inspecting the difference between v[0].keys() and samples. When you find the sample that is not in v[0], the easiest option is to delete all files related to that sample in evaluated/instanced/processed and recompute them. That should fix it hopefully.
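The inspection step can be sketched like this (toy data; in the real run, `v` is the results structure inside cross_validate in run_ppp.py and `samples` has 200 entries):

```python
# Find which sample(s) never made it into the collected results.
samples = ["01_23", "02_56", "05_74"]      # stand-in for the full sample list
v0 = {"01_23": 0.42, "05_74": 0.17}        # stand-in for v[0]; one sample failed
missing = sorted(set(samples) - set(v0))
print(missing)   # the sample(s) whose intermediate files should be deleted and recomputed
```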

Paragjain10 commented 3 years ago

@abred I have trained the model with image size (256, 256) for 700k iterations.

Config file:

base = "/home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543"

[general]
logging = 20
debug = false
overwrite = false

[data]
train_data = "/home/student2/Desktop/Parag_masterthesis/traintest_data/train"
val_data = "/home/student2/Desktop/Parag_masterthesis/traintest_data/test"
test_data = "/home/student2/Desktop/Parag_masterthesis/traintest_data/test"
voxel_size = [ 1, 1,]
input_format = "zarr"
raw_key = "volumes/raw_bf"
gt_key = "volumes/gt_instances"
one_instance_per_channel_gt = "volumes/gt_labels"
num_channels = 1
validate_on_train = false

[model]
train_net_name = "train_net"
test_net_name = "test_net"
train_input_shape = [ 256, 256,]
test_input_shape = [ 256, 256,]
patchshape = [ 1, 41, 41,]
patchstride = [ 1, 1, 1,]
num_fmaps = 20
max_num_inst = 2
fmap_inc_factors = [ 2, 2, 2, 2,]
fmap_dec_factors = [ 1, 1, 1, 1,]
downsample_factors = [ [ 2, 2,], [ 2, 2,], [ 2, 2,], [ 2, 2,],]
activation = "relu"
padding = "valid"
kernel_size = 3
num_repetitions = 2
upsampling = "resize_conv"
overlapping_inst = false
code_units = 252
autoencoder_chkpt = "this"

[optimizer]
optimizer = "Adam"
lr = 5e-5

[preprocessing]
clipmax = 1500

[training]
batch_size = 1
num_gpus = 1
num_workers = 10
cache_size = 40
max_iterations = 700100
checkpoints = 50000
snapshots = 2000
profiling = 500
train_code = true

[prediction]
output_format = "zarr"
aff_key = "volumes/pred_affs"
code_key = "volumes/pred_code"
fg_key = "volumes/pred_numinst"
fg_thresh = 0.5
decode_batch_size = 1024

[validation]
params = [ "patch_threshold", "fc_threshold",]
patch_threshold = [ 0.5, 0.6, 0.7,]
fc_threshold = [ 0.5, 0.6, 0.7,]

[cross_validate]
checkpoints = [ 500000, 550000, 600000, 650000, 700000,]
patch_threshold = [ 0.5, 0.6, 0.7,]
fc_threshold = [ 0.5, 0.6, 0.7,]

[testing]
num_workers = 5

[vote_instances]
patch_threshold = 0.9
fc_threshold = 0.5
cuda = true
blockwise = false
num_workers = 8
chunksize = [ 92, 92, 92,]
select_patches_for_sparse_data = true
save_no_intermediates = true
output_format = "hdf"
parallel = false
includeSinglePatchCCS = false
sample = 1.0
removeIntersection = true
mws = true
isbiHack = false
mask_fg_border = false
graphToInst = false
skipLookup = false
skipConsensus = false
skipRanking = false
skipThinCover = false
affinity_graph_voting = false
affinity_graph_voting_selected = false
termAfterThinCover = false
fg_thresh_vi = -0.1
consensus_interleaved_cnt = false
consensus_norm_prob_product = true
consensus_prob_product = true
consensus_norm_aff = true
vi_bg_use_inv_th = false
vi_bg_use_half_th = true
vi_bg_use_less_than_th = false
rank_norm_patch_score = true
rank_int_counter = false
patch_graph_norm_aff = true
blockwise_old_stitch_fn = false
only_bb = false
flip_cons_arr_axes = false
return_intermediates = false

[evaluation]
num_workers = 1
res_key = "vote_instances"
metric = "confusion_matrix.avAP"
print_f_factor_perc_gt_0_8 = false
use_linear_sum_assignment = false
foreground_only = false

[postprocessing]
remove_small_comps = 600

[visualize]
samples_to_visualize = [ "01_23", "02_56",]
show_patches = true

[autoencoder]
overlapping_inst = true
code_method = "conv1x1_b"
train_net_name = "train_net"
test_net_name = "test_net"
train_input_shape = [ 1, 41, 41,]
test_input_shape = [ 1, 41, 41,]
patchshape = [ 1, 41, 41,]
patchstride = [ 1, 1, 1,]
network_type = "conv"
activation = "relu"
code_activation = "sigmoid"
encoder_units = [ 500, 1000,]
decoder_units = [ 1000, 500,]
num_fmaps = [ 32, 48, 64,]
downsample_factors = [ [ 2, 2,], [ 2, 2,], [ 2, 2,],]
upsampling = "resize_conv"
kernel_size = 3
num_repetitions = 2
padding = "same"
code_units = 252
regularizer = "l2"
regularizer_weight = 0.0001
loss_fn = "mse"

[training.sampling]
min_masked = 0.002
min_masked_overlap = 0.002
overlap_min_dist = 0
overlap_max_dist = 15
probability_overlap = 0.5
probability_fg = 0.5

[postprocessing.watershed]
output_format = "hdf"

[training.augmentation.elastic]
control_point_spacing = [ 40, 40,]
jitter_sigma = [ 2, 2,]
rotation_min = 0
rotation_max = 90
subsample = 2

[training.augmentation.intensity]
scale = [ 0.9, 1.1,]
shift = [ -0.1, 0.1,]

[training.augmentation.simple]

The results are :

INFO:__main__:confusion_matrix.avAP CROSS: 0.2227 [0.2308 ((650000, 0.6, 0.6)), 0.2146 ((650000, 0.6, 0.6))]
confusion_matrix.avAP CROSS: 0.2227 [0.2308 ((650000, 0.6, 0.6)), 0.2146 ((650000, 0.6, 0.6))]
confusion_matrix.avAP: 0.2227
confusion_matrix.th_0_5.AP: 0.4341
confusion_matrix.th_0_6.AP: 0.3346
confusion_matrix.th_0_7.AP: 0.2460
confusion_matrix.th_0_8.AP: 0.1571
confusion_matrix.th_0_9.AP: 0.0576

INFO:__main__:time cross_validate: 4:17:56.810282

What do you think about the results? Do you think optimizing would raise them considerably, or is the image size playing a big role in the results being this low?

abred commented 3 years ago

The numbers are indeed quite low, but it is hard to analyze from afar. You can look at the results qualitatively. Does the fg-bg segmentation look ok? In visualize there are scripts to look at the patches, how do they look? Have you had a look at the downsized images, could you still segment them properly by hand?

An unrelated comment, I would recommend splitting the data into train/val/test and then use validate/validate_checkpoints instead of cross_validate. We just did that for the wormbodies data because of prior work.

Paragjain10 commented 3 years ago

> The numbers are indeed quite low, but it is hard to analyze from afar. You can look at the results qualitatively. Does the fg-bg segmentation look ok? In visualize there are scripts to look at the patches, how do they look? Have you had a look at the downsized images, could you still segment them properly by hand?
>
> An unrelated comment, I would recommend splitting the data into train/val/test and then use validate/validate_checkpoints instead of cross_validate. We just did that for the wormbodies data because of prior work.

  1. Yes, I tried visualizing using the do-task argument --do visualize

where in the config.toml I give these samples:

[visualize]
samples_to_visualize = [ "01_23", "02_56",]
show_patches = true

But I get no visualization in the output:

/home/student2/anaconda3/envs/Parag_GreenAI/bin/python /home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py --setup setup08 --config /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543/config.toml --do visualize --app wormbodies -id /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543
INFO:__main__:attention: using config file ['/home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543/config.toml']
INFO:__main__:CUDA_VISIBILE_DEVICES already set, device 0
INFO:__main__:used config: {'base': '/home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543', 'general': {'logging': 20, 'debug': False, 'overwrite': False}, 'data': {'train_data': '/home/student2/Desktop/Parag_masterthesis/traintest_data/train', 'val_data': '/home/student2/Desktop/Parag_masterthesis/traintest_data/test', 'test_data': '/home/student2/Desktop/Parag_masterthesis/traintest_data/test', 'voxel_size': [1, 1], 'input_format': 'zarr', 'raw_key': 'volumes/raw_bf', 'gt_key': 'volumes/gt_instances', 'one_instance_per_channel_gt': 'volumes/gt_labels', 'num_channels': 1, 'validate_on_train': False}, 'model': {'train_net_name': 'train_net', 'test_net_name': 'test_net', 'train_input_shape': [256, 256], 'test_input_shape': [256, 256], 'patchshape': [1, 41, 41], 'patchstride': [1, 1, 1], 'num_fmaps': 20, 'max_num_inst': 2, 'fmap_inc_factors': [2, 2, 2, 2], 'fmap_dec_factors': [1, 1, 1, 1], 'downsample_factors': [[2, 2], [2, 2], [2, 2], [2, 2]], 'activation': 'relu', 'padding': 'valid', 'kernel_size': 3, 'num_repetitions': 2, 'upsampling': 'resize_conv', 'overlapping_inst': False, 'code_units': 252, 'autoencoder_chkpt': 'this'}, 'optimizer': {'optimizer': 'Adam', 'lr': 5e-05}, 'preprocessing': {'clipmax': 1500}, 'training': {'batch_size': 1, 'num_gpus': 1, 'num_workers': 10, 'cache_size': 40, 'max_iterations': 700100, 'checkpoints': 50000, 'snapshots': 2000, 'profiling': 500, 'train_code': True, 'sampling': {'min_masked': 0.002, 'min_masked_overlap': 0.002, 'overlap_min_dist': 0, 'overlap_max_dist': 15, 'probability_overlap': 0.5, 'probability_fg': 0.5}, 'augmentation': {'elastic': {'control_point_spacing': [40, 40], 'jitter_sigma': [2, 2], 'rotation_min': 0, 'rotation_max': 90, 'subsample': 2}, 'intensity': {'scale': [0.9, 1.1], 'shift': [-0.1, 0.1]}, 'simple': {}}}, 'prediction': {'output_format': 'zarr', 'aff_key': 'volumes/pred_affs', 'code_key': 'volumes/pred_code', 'fg_key': 'volumes/pred_numinst', 'fg_thresh': 0.5, 
'decode_batch_size': 1024}, 'validation': {'params': ['patch_threshold', 'fc_threshold'], 'patch_threshold': [0.5, 0.6, 0.7], 'fc_threshold': [0.5, 0.6, 0.7]}, 'cross_validate': {'checkpoints': [500000, 550000, 600000, 650000, 700000], 'patch_threshold': [0.5, 0.6, 0.7], 'fc_threshold': [0.5, 0.6, 0.7]}, 'testing': {'num_workers': 5}, 'vote_instances': {'patch_threshold': 0.9, 'fc_threshold': 0.5, 'cuda': True, 'blockwise': False, 'num_workers': 8, 'chunksize': [92, 92, 92], 'select_patches_for_sparse_data': True, 'save_no_intermediates': True, 'output_format': 'hdf', 'parallel': False, 'includeSinglePatchCCS': False, 'sample': 1.0, 'removeIntersection': True, 'mws': True, 'isbiHack': False, 'mask_fg_border': False, 'graphToInst': False, 'skipLookup': False, 'skipConsensus': False, 'skipRanking': False, 'skipThinCover': False, 'affinity_graph_voting': False, 'affinity_graph_voting_selected': False, 'termAfterThinCover': False, 'fg_thresh_vi': -0.1, 'consensus_interleaved_cnt': False, 'consensus_norm_prob_product': True, 'consensus_prob_product': True, 'consensus_norm_aff': True, 'vi_bg_use_inv_th': False, 'vi_bg_use_half_th': True, 'vi_bg_use_less_than_th': False, 'rank_norm_patch_score': True, 'rank_int_counter': False, 'patch_graph_norm_aff': True, 'blockwise_old_stitch_fn': False, 'only_bb': False, 'flip_cons_arr_axes': False, 'return_intermediates': False}, 'evaluation': {'num_workers': 1, 'res_key': 'vote_instances', 'metric': 'confusion_matrix.avAP', 'print_f_factor_perc_gt_0_8': False, 'use_linear_sum_assignment': False, 'foreground_only': False}, 'postprocessing': {'remove_small_comps': 600, 'watershed': {'output_format': 'hdf'}}, 'visualize': {'samples_to_visualize': ['01_23', '02_56'], 'show_patches': True}, 'autoencoder': {'overlapping_inst': True, 'code_method': 'conv1x1_b', 'train_net_name': 'train_net', 'test_net_name': 'test_net', 'train_input_shape': [1, 41, 41], 'test_input_shape': [1, 41, 41], 'patchshape': [1, 41, 41], 'patchstride': [1, 1, 1], 
'network_type': 'conv', 'activation': 'relu', 'code_activation': 'sigmoid', 'encoder_units': [500, 1000], 'decoder_units': [1000, 500], 'num_fmaps': [32, 48, 64], 'downsample_factors': [[2, 2], [2, 2], [2, 2]], 'upsampling': 'resize_conv', 'kernel_size': 3, 'num_repetitions': 2, 'padding': 'same', 'code_units': 252, 'regularizer': 'l2', 'regularizer_weight': 0.0001, 'loss_fn': 'mse'}}
INFO:__main__:reading data from /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543/val/processed/700000
['01_23', '02_56', '10_1134', '05_74', '10_1124', '07_45', '01_11', '03_461', '08_469', '05_60', '02_3', '05_39', '02_17', '03_492', '10_1138', '10_1090', '04_1013', '10_1100', '03_437', '02_6', '06_51', '04_946', '09_747', '05_85', '07_92', '01_84', '10_1060', '09_753', '08_412', '08_421', '03_458', '07_58', '06_24', '04_979', '03_507', '02_14', '10_1107', '03_452', '03_531', '06_40', '01_25', '10_1135', '02_86', '01_73', '09_748', '03_475', '05_62', '08_491', '04_1019', '03_455', '06_3', '02_94', '09_726', '02_36', '03_477', '02_22', '06_76', '05_33', '03_528', '03_466', '02_90', '06_17', '03_502', '01_42', '10_1069', '03_471', '08_497', '09_768', '05_11', '08_407', '07_81', '01_74', '08_484', '01_29', '06_19', '03_467', '04_967', '07_51', '04_1031', '09_777', '08_423', '05_79', '06_68', '10_1067', '01_62', '07_42', '02_85', '07_29', '02_100', '07_85', '04_1018', '02_82', '06_4', '04_955', '02_24', '03_499', '07_13', '02_97', '01_14', '09_728', '04_1001', '03_509', '06_21', '07_63', '05_50', '04_1007', '04_1012', '04_1004', '01_82', '06_46', '10_1147', '02_50', '07_64', '04_940', '07_23', '08_404', '08_418', '04_958', '02_98', '04_1037', '02_48', '04_1033', '03_470', '04_999', '01_43', '09_735', '01_46', '05_87', '06_36', '10_1140', '05_56', '07_77', '03_515', '01_49', '01_59', '06_33', '03_446', '07_36', '06_29', '03_485', '04_1030', '06_64', '01_86', '08_415', '06_90', '01_68', '01_39', '09_756', '04_948', '01_28', '02_75', '09_779', '10_1114', '03_496', '03_505', '03_474', '09_775', '02_20', '07_33', '06_58', '10_1142', '01_63', '01_81', '05_25', '10_1076', '02_29', '04_938', '10_1080', '08_451', '05_7', '04_1005', '04_951', '04_1026', '03_519', '09_793', '06_12', '02_47', '10_1102', '09_785', '08_461', '01_6', '01_88', '08_496', '04_1028', '10_1104', '10_1133', '08_459', '07_41', '04_1032', '07_12', '10_1071', '07_54', '01_15', '02_15', '09_732', '02_2', '04_1006', '07_68', '07_18', '10_1129']

Process finished with exit code 0
  1. I am downscaling the images in consolidate_data.py itself, so I haven't explicitly visualized the downsized images for PatchPerPix, but I have done it in the other experiments I performed for another approach. I will do it in this one as well.

  2. About splitting the data into train/val/test: my next step was going to be this as well, so yes, I will run the experiments with this split next.

Paragjain10 commented 3 years ago

@abred

I just checked: an HDF file is being created for the samples I am trying to visualize, at this location: /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543/val/processed/70000

Is it correct? Or am I missing out on something?

abred commented 3 years ago

@abred

I just checked: an HDF file is being created for the samples I am trying to visualize, at this location: /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210128_152543/val/processed/70000

Is it correct? Or am I missing out on something?

Yes, that sounds good; you can look at it with an HDF viewer or Fiji/ImageJ.

In addition, I would also look at the foreground prediction (pred_fgbg in the zarr file). Fiji can also visualize zarr files (File->Import->N5).

Paragjain10 commented 3 years ago

@abred The visualization of the HDF file of sample 01_23 looks like this. Also, do I have to view the HDF files with certain specific parameters in Fiji, or does this look fine?

Screenshot 2021-02-16 at 12 15 28

Unfortunately, I am not able to locate the folder containing the pred_fgbg zarr file. The three folders I could locate were pred_affs, pred_numinst and pred_code. Could you tell me where to find the pred_fgbg files?

abred commented 3 years ago

Sorry, pred_numinst should be the correct one, as in your config file it is set as fg_key = "volumes/pred_numinst".

The visualization of the HDF file has a very high resolution; if you haven't done it already, you have to zoom into some regions to have a closer look. In the end it is just a grayscale image with values between 0 and 1, so the default parameters should be fine (maybe adjust brightness/contrast a bit, Image->Adjust).

Paragjain10 commented 3 years ago
  1. Yes, I did zoom in and looked at the segmentation mask; it leaves a lot of room for improvement. But I was unable to open the pred_numinst zarr file. I cannot see the N5 option directly under (File->Import->N5); I had to use (Plugins->BigDataViewer->N5 Viewer) instead. This error prompt was displayed:
Screenshot 2021-02-16 at 14 19 42
abred commented 3 years ago

I think that's a different plugin. You need this one https://github.com/saalfeldlab/n5-ij . Should be available through the fiji plugin system (Help->Update)

Paragjain10 commented 3 years ago

@abred I ran the experiment with image size 512 for 100k iterations. Also, I am using different sets for train, test and val, using validate/validate_checkpoints instead of cross_validate.

I have got good results just by training for 100k iterations as well: INFO:__main__:confusion_matrix.avAP TEST checkpoint 100000: 0.2211 ({'patch_threshold': 0.9, 'fc_threshold': 0.5})

Is it possible to optimize the network at 100k iterations and then finally train the optimized network for the maximum number of iterations? Optimizing the network for 700k iterations, or even more for bigger datasets, is very time-consuming. Do you think this way of optimizing the network is possible with PatchPerPix?

Also, the parameters I was interested in were the code size and patch size. In the paper, the best results are achieved with different code sizes and patch sizes per dataset. In my case, the patch size is 41*41 and the code size is 252. Can you tell me which parameters I should take into consideration when optimizing the network for my dataset?

Paragjain10 commented 3 years ago

@abred Also, I want the AP values in COCO format. So I have changed the formula for AP in evaluate-instance-segmentation/evaluate.py from ap = 1. * (apTP) / max(1, apTP + apFN + apFP) to ap = 1. * (apTP) / max(1, apTP + apFP).

Do I need to make any more changes in the code?

Paragjain10 commented 3 years ago

Hello @abred,

I am running an experiment with a downscaled image size of 512 as mentioned above, but I am training it for more iterations, 700k. I have a concern regarding the time it takes for training and evaluation: it has been two days and it is on the evaluation step, where it evaluates every checkpoint (checkpoints are saved every 50k iterations). The overall computation time is really high right now; it is still computing checkpoint 400k. Can I evaluate only specific checkpoints, or reduce the evaluation time in some other way?

abred commented 3 years ago

@abred Also, I want the AP values in COCO format. So I have changed the formula for AP in evaluate-instance-segmentation/evaluate.py from ap = 1. * (apTP) / max(1, apTP + apFN + apFP) to ap = 1. * (apTP) / max(1, apTP + apFP).

Do I need to make any more changes in the code?

That should be fine.
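For reference, a toy sketch of the two variants being discussed (hypothetical helper names, not the actual evaluate.py functions):

```python
def ap_original(ap_tp, ap_fp, ap_fn):
    # original formula in evaluate.py: TP / (TP + FN + FP)
    return 1. * ap_tp / max(1, ap_tp + ap_fn + ap_fp)

def ap_modified(ap_tp, ap_fp, ap_fn):
    # modified, precision-style formula: TP / (TP + FP), ignoring FN
    return 1. * ap_tp / max(1, ap_tp + ap_fp)

# with 8 TP, 1 FP, 1 FN the two variants differ:
print(ap_original(8, 1, 1))  # 0.8
print(ap_modified(8, 1, 1))  # ~0.889
```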

abred commented 3 years ago

Hello @abred,

I am running an experiment with a downscaled image size of 512 as mentioned above, but I am training it for more iterations, 700k. I have a concern regarding the time it takes for training and evaluation: it has been two days and it is on the evaluation step, where it evaluates every checkpoint (checkpoints are saved every 50k iterations). The overall computation time is really high right now; it is still computing checkpoint 400k. Can I evaluate only specific checkpoints, or reduce the evaluation time in some other way?

You can just add a checkpoints line to [validation], like the one in [cross_validate], e.g. checkpoints = [ 500000, 550000, 600000, 650000, 700000,]. Otherwise it will check all available checkpoints.
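For example, the relevant part of your config.toml could then look like this (checkpoint values chosen as an example, adjust to the ones you want to evaluate):

```toml
[validation]
params = [ "patch_threshold", "fc_threshold",]
patch_threshold = [ 0.5, 0.6, 0.7,]
fc_threshold = [ 0.5, 0.6, 0.7,]
# only these checkpoints will be evaluated:
checkpoints = [ 500000, 550000, 600000, 650000, 700000,]
```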

abred commented 3 years ago

@abred I ran the experiment with image size 512 for 100k iterations. Also, I am using different sets for train, test and val, using validate/validate_checkpoints instead of cross_validate.

I have got good results just by training for 100k iterations as well: INFO:__main__:confusion_matrix.avAP TEST checkpoint 100000: 0.2211 ({'patch_threshold': 0.9, 'fc_threshold': 0.5})

Is it possible to optimize the network at 100k iterations and then finally train the optimized network for the maximum number of iterations? Optimizing the network for 700k iterations, or even more for bigger datasets, is very time-consuming. Do you think this way of optimizing the network is possible with PatchPerPix?

Also, the parameters I was interested in were the code size and patch size. In the paper, the best results are achieved with different code sizes and patch sizes per dataset. In my case, the patch size is 41*41 and the code size is 252. Can you tell me which parameters I should take into consideration when optimizing the network for my dataset?

Might work; to get an idea for good parameters it might be enough. I would recommend to e.g. look at the loss in tensorboard and check that it is decreasing nicely (that's important generally). And maybe evaluate on a few training and a few validation images to check that it doesn't overfit too much. The network also writes snapshots during training from time to time; you can look at those to check that it's working sensibly (you probably have to decode them first, similarly to what you do with the validation/test data). The necessary patch size depends strongly on your data; larger sizes might be required for larger overlaps. If that doesn't occur in your data you can keep it smaller. The code size is somewhat relative to the patch size. Maybe you need a different learning rate for your data, or a deeper or shallower U-Net.

Paragjain10 commented 3 years ago

Thank you for all the answers @abred

  1. The training loss looks fine to me: image. To check if the model overfits, is a val loss being computed? If we could log the val loss then we could easily figure out if the model overfits. Also, I am not sure what evaluating on a few training and validation images means.

  2. This is how the pred_fgbg of a snapshot at 690k looks:

    Screenshot 2021-02-23 at 10 44 54
  3. Here is a sample of the dataset, where there is occlusion/overlap in the top left and bottom right of the image. Would you call this a large overlap? The overlaps are similar throughout the data. The patch size currently is 41*41 and the code_size is 252. Is that appropriate for this kind of data?

    Screenshot 2021-02-23 at 10 54 18
abred commented 3 years ago

Thank you for all the answers @abred

1. The training loss looks fine to me:
   To check if the model overfits, is a val loss being computed? If we could log the val loss then we could easily figure out if the model overfits.
   Also, I am not sure what evaluating on a few training and validation images means.

A val loss is currently not supported by the training framework that is used (gunpowder). That's why I recommended, just to get a rough idea of the performance, evaluating a few training images and a few validation images and comparing the results quantitatively and qualitatively.

3. Here is a sample of the dataset, where there is occlusion/overlap in the top left and bottom right of the image. Would you call this a large overlap? The overlaps are similar throughout the data. The patch size currently is 41*41 and the code_size is 252. Is that appropriate for this kind of data?

Looking at your data, I am not sure you can compare your numbers to ours; you only have very few, larger instances. What does the final instance segmentation look like? Regarding the sizes: if you have an instance that is completely split into two parts by an overlapping second instance, it is only possible to merge the two parts back into one instance in the final instance segmentation if the overlap at its narrowest location is smaller than patch_size/2.
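This rule of thumb can be written down as a small check (hypothetical helper, widths in pixels):

```python
def can_remerge(overlap_width, patch_size):
    """Two parts of an instance that are completely separated by an
    occluding second instance can only be merged back into one
    instance if the overlap, at its narrowest location, is smaller
    than half the patch size."""
    return overlap_width < patch_size / 2

print(can_remerge(15, 41))  # True, since 15 < 20.5
print(can_remerge(25, 41))  # False
```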

Paragjain10 commented 3 years ago

Okay, I understood the first part.

In the second part, by final instance segmentation do you mean the ground truth of the same image or the segmentation predicted by the network? About patch_size and code_size, what are the maximum values these two can be set to? And would analyzing my dataset to find the maximum width of the narrowest overlap, where one instance is completely split into two, be the way to figure out the values of patch_size and code_size?

abred commented 3 years ago

In the second part, by final instance segmentation do you mean the ground truth of the same image or the segmentation predicted by the network?

I mean the segmentation output. Why is the number this low? What kind of mistakes does it make: does it get the shape right but split instances, does it merge everything, etc.? General tip: look at the data a lot (the raw data, the ground truth data, the segmentation results, etc.); this usually gives you good indicators of what works and what is still going wrong.

About patch_size and code_size, what are the maximum values these two can be set to? And would analyzing my dataset to find the maximum width of the narrowest overlap, where one instance is completely split into two, be the way to figure out the values of patch_size and code_size?

Yes, that would be one way to figure out appropriate values (remember though, you may not look at the test data to determine this, otherwise your results may be biased).

Paragjain10 commented 3 years ago

I mean the segmentation output.

@abred

  1. The segmentation output is in test/instanced/<checkpoint>/patch_threshold/fc_threshold/<sample>, right?

  2. Also, I looked into my dataset; the overlaps are pretty inconsistent throughout. There are some samples with big overlaps, so I think using a bigger patch_size would be better suited to my data. What would be the biggest patch_size that can be used for the ppp+dec architecture? For ed-ppp I guess a patch_size of 81*81 is used in one of the experiments, while for ppp+dec the highest patch_size I came across was 49*49. Is it possible to have a higher patch_size than this?

  3. Also, to change the patch_size the parameters below need to be changed, right? Are there any other parameters that need to be changed?

    
    [model]
    patchshape = [1, 41, 41]

    [autoencoder]
    train_input_shape = [1, 41, 41]
    test_input_shape = [1, 41, 41]
    patchshape = [1, 41, 41]

Paragjain10 commented 3 years ago

I got a good result with image size 512 for 700k iterations: INFO:__main__:confusion_matrix.avAP TEST checkpoint 600000: 0.5871 ({'patch_threshold': 0.9, 'fc_threshold': 0.5})

Now, when I try patch_size = 49*49, this error is raised. I have changed the following parameters to do that:

[model]
patchshape=[1,49,49]

[autoencoder]
train_input_shape = [1, 49, 49]
test_input_shape = [1, 49, 49]
patchshape = [1, 49, 49]
Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 258048 values, but the requested shape requires a multiple of 343
     [[{{node deflatten_out}}]]
  (1) Invalid argument: Input to reshape is a tensor with 258048 values, but the requested shape requires a multiple of 343
     [[{{node deflatten_out}}]]
     [[add/_15]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 386, in train
    **config.get('preprocessing', {}))
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/train.py", line 281, in train_until
    pipeline.request_batch(request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
    batch = self.provide(copy.deepcopy(request))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 45, in provide
    return self.output.request_batch(request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
    batch = self.provide(copy.deepcopy(request))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_filter.py", line 128, in provide
    batch = self.get_upstream_provider().request_batch(upstream_request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
    batch = self.provide(copy.deepcopy(request))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_filter.py", line 128, in provide
    batch = self.get_upstream_provider().request_batch(upstream_request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
    batch = self.provide(copy.deepcopy(request))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_filter.py", line 134, in provide
    self.process(batch, request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/generic_train.py", line 151, in process
    self.train_step(batch, request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/tensorflow/nodes/train.py", line 278, in train_step
    feed_dict=inputs, options=run_options)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 258048 values, but the requested shape requires a multiple of 343
     [[node deflatten_out (defined at /anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Invalid argument: Input to reshape is a tensor with 258048 values, but the requested shape requires a multiple of 343
     [[node deflatten_out (defined at /anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[add/_15]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'deflatten_out':
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1638, in <module>
    main()
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1458, in main
    train(args, config, train_folder)
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 123, in wrapper
    p.start()
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/popen_fork.py", line 74, in _launch
    code = process_obj._bootstrap()
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 386, in train
    **config.get('preprocessing', {}))
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/train.py", line 276, in train_until
    with gp.build(pipeline):
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/build.py", line 12, in __enter__
    self.batch_provider.setup()
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 17, in setup
    self.__rec_setup(self.output)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
    self.__rec_setup(upstream_provider)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
    self.__rec_setup(upstream_provider)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 71, in __rec_setup
    provider.setup()
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/generic_train.py", line 114, in setup
    self.start()
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/tensorflow/nodes/train.py", line 217, in start
    checkpoint = self.__read_meta_graph()
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/tensorflow/nodes/train.py", line 333, in __read_meta_graph
    clear_devices=True)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1453, in import_meta_graph
    **kwargs)[0]
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1477, in _import_meta_graph_with_return_elements
    **kwargs))
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/meta_graph.py", line 809, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
    producer_op_list=producer_op_list)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 517, in _import_graph_def_internal
    _ProcessNewOps(graph)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 243, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3561, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3561, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3451, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Traceback (most recent call last):
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1638, in <module>
    main()
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1458, in main
    train(args, config, train_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 127, in wrapper
    raise RuntimeError("child process died")
RuntimeError: child process died

What are the things that I should take into consideration?

abred commented 3 years ago

I got a good result with image size 512 for 700k iterations: INFO:__main__:confusion_matrix.avAP TEST checkpoint 600000: 0.5871 ({'patch_threshold': 0.9, 'fc_threshold': 0.5})

Nice! Have you looked at one of the segmentation results? Want to post one here?

Now, when I try patch_size=49*49, this error is raised. I have changed the following parameters to do that:

[model]
patchshape=[1,49,49]

[autoencoder]
train_input_shape = [1, 49, 49]
test_input_shape = [1, 49, 49]
patchshape = [1, 49, 49]

What are the things that I should take into consideration?

I think you might also have to change [autoencoder].code_units (I just noticed that there are two code_units in the config, will fix that). For a patch size of 41 the spatial bottleneck of the autoencoder is 6*6 (2d) with 7 feature maps -> 6*6*7 = 252. You have to check in your log what the resulting bottleneck size is for 49 and then adapt the value of code_units accordingly.
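The arithmetic described above can be sketched as follows (the function name is illustrative, not part of the PatchPerPix code; only the numbers come from this thread):

```python
# Hedged sketch: code_units is the flattened autoencoder bottleneck,
# i.e. the spatial bottleneck dimensions times the number of feature maps.
# (Function name is hypothetical, not part of PatchPerPix.)
def code_units_for(bottleneck_h, bottleneck_w, num_fmaps):
    return bottleneck_h * bottleneck_w * num_fmaps

# values from this thread:
# patch size 41 -> 6x6 spatial bottleneck, 7 feature maps
print(code_units_for(6, 6, 7))  # 252
# patch size 49 -> 7x7 spatial bottleneck, 7 feature maps
print(code_units_for(7, 7, 7))  # 343
```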

abred commented 3 years ago
2. Also, I looked into my dataset; the overlaps are pretty inconsistent throughout. Some samples have big overlaps. I think using a bigger `patch_size` would be better suited for my data. What would be the biggest `patch_size` that can be used for the ppp+dec architecture?
   For ed-ppp I guess a `patch_size` of `81*81` is used in one of the experiments, while for ppp+dec the highest `patch_size` I came across was `49*49`. Is it possible to use a higher `patch_size` than this?

You have to experiment with that. Larger patches might require a larger code_units (but that also depends on the kind of data), which needs more memory. If the patches are too large they cover a large part of the image and the network might have difficulties learning them properly. But larger patches also mean that you need fewer of them to cover the image.

Paragjain10 commented 3 years ago
  1. The segmentation output is in the test/instanced/<checkpoint>/patch_threshold/fc_threshold/<sample> right?

Is this the correct place to see the segmentation? If yes, then here is a sample of the segmentation result.

Screenshot 2021-02-25 at 19 23 58
Paragjain10 commented 3 years ago

I think you might also have to change [autoencoder].code_units (I just noticed that there are two code_units in the config, will fix that). For a patch size of 41 the spatial bottleneck of the autoencoder is 6*6 (2d) with 7 feature maps -> 6*6*7 = 252. You have to check in your log what the resulting bottleneck size is for 49 and then adapt the value of code_units accordingly.

I figured out that for patch_size=49*49 the code_units=343, which is 7*7*7. The experiment is running, I think this helped. Thanks for the help @abred.

About there being two code_units in the config file: what needs to be changed?

abred commented 3 years ago

About there being two code_units in the config file: what needs to be changed?

Don't worry about it, just set both to the same value; when I have time I will check if they are really both necessary.

  1. The segmentation output is in the test/instanced/<checkpoint>/patch_threshold/fc_threshold/<sample> right?

Is this the correct place to see the segmentation? If yes, then here is a sample of the segmentation result.

yes, exactly

Paragjain10 commented 3 years ago

Hello @abred,

I tried an experiment with patch_size=49*49; it ran correctly and I achieved significant improvements as well. I trained the model for only 100k iterations and achieved the following results:

INFO:__main__:confusion_matrix.avAP TEST checkpoint 100000: 0.4229 ({'patch_threshold': 0.9, 'fc_threshold': 0.5})

The segmentation results look like this: Test\Instanced: Screenshot 2021-03-01 145323

Test\processed\pred_affs:

Screenshot 2021-03-01 150132

  1. Are the segmentation results good enough for the value of AP?
  2. What is the difference between the two results, the test\processed\pred_affs Zarr file and the test\instanced\ HDF file?

I am running another experiment with patch_size=55*55; the training was successful, but the code seems to be stuck at this point:

INFO:__main__:Skipping decoding for 06_46. Already exists!
INFO:__main__:Skipping decoding for 06_36. Already exists!
INFO:__main__:Skipping decoding for 06_33. Already exists!
INFO:__main__:Skipping decoding for 06_29. Already exists!
INFO:__main__:Skipping decoding for 06_64. Already exists!
INFO:__main__:Skipping decoding for 06_90. Already exists!
INFO:__main__:Skipping decoding for 06_58. Already exists!
INFO:__main__:Skipping decoding for 06_12. Already exists!
INFO:__main__:vote_instances checkpoint 100000 {'patch_threshold': 0.5, 'fc_threshold': 0.5}
INFO:__main__:reading data from /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210227_090926/val/processed/100000
['06_92', '06_80', '06_73', '06_8', '06_82', '06_86', '06_55', '06_60', '06_65', '06_77', '06_48', '06_23', '06_42', '06_31', '06_51', '06_24', '06_40', '06_3', '06_76', '06_17', '06_19', '06_68', '06_4', '06_21', '06_46', '06_36', '06_33', '06_29', '06_64', '06_90', '06_58', '06_12']
['06_92', '06_80', '06_73', '06_8', '06_82', '06_86', '06_55', '06_60', '06_65', '06_77', '06_48', '06_23', '06_42', '06_31', '06_51', '06_24', '06_40', '06_3', '06_76', '06_17', '06_19', '06_68', '06_4', '06_21', '06_46', '06_36', '06_33', '06_29', '06_64', '06_90', '06_58', '06_12']
['06_92', '06_80', '06_73', '06_8', '06_82', '06_86', '06_55', '06_60', '06_65', '06_77', '06_48', '06_23', '06_42', '06_31', '06_51', '06_24', '06_40', '06_3', '06_76', '06_17', '06_19', '06_68', '06_4', '06_21', '06_46', '06_36', '06_33', '06_29', '06_64', '06_90', '06_58', '06_12']
INFO:__main__:forking <function vote_instances_sample_seq at 0x7f73b5926200>
INFO:__main__:Skipping vote instances for 06_92. Already exists!
INFO:__main__:forking <function vote_instances_sample_seq at 0x7f73b5926200>
INFO:__main__:Skipping vote instances for 06_80. Already exists!
INFO:__main__:forking <function vote_instances_sample_seq at 0x7f73b5926200>
INFO:PatchPerPix.vote_instances.vote_instances:processing /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210227_090926/val/processed/100000/06_73.zarr
INFO:PatchPerPix.vote_instances.utilVoteInstances:keys: ['volumes']
INFO:PatchPerPix.vote_instances.vote_instances:affinities shape: (3025, 1, 512, 512)
INFO:PatchPerPix.vote_instances.vote_instances:numinst_ shape: (1, 1, 512, 512)
INFO:PatchPerPix.vote_instances.vote_instances:input image shape: (1, 512, 512)
INFO:PatchPerPix.vote_instances.vote_instances:Number fg pixel: 46069
INFO:PatchPerPix.vote_instances.vote_instances:overlap mask: (1, 512, 512) 0
INFO:PatchPerPix.vote_instances.vote_instances:copy fg to fg2cover
INFO:PatchPerPix.vote_instances.vote_instances:Number overlapping pixel: 0
INFO:PatchPerPix.vote_instances.vote_instances:Number pixel fg_to_cover: 46069
INFO:PatchPerPix.vote_instances.vote_instances:num foreground pixels: 46069
INFO:PatchPerPix.vote_instances.vote_instances:num foreground pixels 42362
INFO:PatchPerPix.vote_instances.vote_instances:num foreground pixels excluding boundary region: 42362
INFO:PatchPerPix.vote_instances.utilVoteInstances:bg in rank: th/2
INFO:PatchPerPix.vote_instances.utilVoteInstances:accumulate normalized prob product [0,1] as aff
INFO:PatchPerPix.vote_instances.consensus_array:compute affs and cntr separately
INFO:PatchPerPix.vote_instances.consensus_array:consensus array shape (1, 110, 110, 1, 512, 512)
INFO:PatchPerPix.vote_instances.consensus_array:creating consensus array /home/student2/Desktop/Parag_masterthesis/experimentresults/wormbodies_setup08_210227_090926/val/processed/100000/06_73.zarr
  3. I tried re-running the code as well, but it still seems stuck. What could be the problem?
abred commented 3 years ago
1. Are the segmentation results good enough for the value of AP?

That's hard to judge, especially without ground truth segmentation and the raw image. Also, you should use a different colormap for the instance segmentation; if you have a continuous one, it is sometimes hard to tell whether two instances have the exact same shade or slightly different ones.

2. What is the difference between the two results, the `test\processed\pred_affs` Zarr file and the `test\instanced\` HDF file?

In instanced is the instance segmentation result, after the whole pipeline; in processed is just the raw prediction of the neural network. And pred_affs should have patch_size*patch_size channels, not just one.
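As a quick sanity check on the pred_affs channel count, it should equal the squared patch size. The shape convention below is taken from the log output in this thread (channels, z, y, x); the helper is illustrative arithmetic, not PatchPerPix API:

```python
# Hedged sketch: pred_affs stores one channel per patch pixel,
# so the channel axis should be patch_size * patch_size.
def expected_aff_channels(patch_size):
    return patch_size * patch_size

print(expected_aff_channels(49))  # 2401
print(expected_aff_channels(55))  # 3025, matching the logged
                                  # "affinities shape: (3025, 1, 512, 512)"
```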

3. I tried re-running the code  as well but it still seems stuck, what could be the problem?

the larger the patch size and the more foreground pixels, the longer it takes; but that step shouldn't take that long. How long did you wait?

Paragjain10 commented 3 years ago

That's hard to judge, especially without ground truth segmentation and the raw image. Also, you should use a different colormap for the instance segmentation; if you have a continuous one, it is sometimes hard to tell whether two instances have the exact same shade or slightly different ones.

Okay, can you explain how I can use a different colormap for my dataset?

the larger the patch size and the more foreground pixels the longer it takes, but that step shouldn't take that long, how long did you wait?

It was stuck at the same point for more than a day.

abred commented 3 years ago

That's hard to judge, especially without ground truth segmentation and the raw image. Also, you should use a different colormap for the instance segmentation; if you have a continuous one, it is sometimes hard to tell whether two instances have the exact same shade or slightly different ones.

Okay, can you explain how I can use a different colormap for my dataset?

That's just visualization; it depends on the program you are using to look at the result.

the larger the patch size and the more foreground pixels the longer it takes, but that step shouldn't take that long, how long did you wait?

It was stuck at the same point for more than a day.

INFO:PatchPerPix.vote_instances.consensus_array:consensus array shape (1, 110, 110, 1, 512, 512)

that's about 12GB, I guess that is too large for your GPU. The code uses unified memory, so you don't get an error, but it has to move a lot of data to and from the GPU all the time; I guess that makes it really slow. You could try using blockwise processing (e.g. have a look at the differences in [vote_instances] here: https://github.com/Kainmueller-Lab/PatchPerPix_experiments/blob/master/nuclei3d/02_setups/setup01/config.toml)
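The ~12 GB figure follows directly from the logged consensus array shape; a small sketch of that footprint calculation (plain arithmetic, assuming float32, i.e. 4 bytes per element):

```python
from math import prod

def dense_array_gib(shape, itemsize=4):
    """Memory footprint of a dense array in GiB (itemsize=4 for float32)."""
    return prod(shape) * itemsize / 2**30

# consensus array shape from the log above
size = dense_array_gib((1, 110, 110, 1, 512, 512))
print(f"{size:.1f} GiB")  # ~11.8 GiB, i.e. roughly 12 GB
```

This is why blockwise processing helps: each block's consensus array is a small fraction of this total, so it fits in GPU memory without constant paging through unified memory.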

Paragjain10 commented 3 years ago

Hello @abred ,

I am done training my model and now I want to fine-tune it. How can I use the weights of the trained model as a starting point? I tried training the same experiment on the dataset I want to fine-tune on, for some more iterations with smaller checkpoint intervals, since the fine-tuning is done only for 10% of the total training epochs. But it starts the training from the beginning, because the checkpoint number is now smaller than it was for the full training.

Can you help me with this?

abred commented 3 years ago

Hi, I am not sure I understand exactly what you mean, but two code blocks that you might have to modify are here: https://github.com/Kainmueller-Lab/PatchPerPix_experiments/blob/master/wormbodies/02_setups/setup08/train.py#L21-L27 and here: https://github.com/Kainmueller-Lab/gunpowder/blob/master/gunpowder/tensorflow/nodes/train.py#L273-L297. The code looks for the latest/highest checkpoint it can find in the respective folder. I guess you could either change your filenames accordingly or change the code to load a specific checkpoint. And depending on what exactly you want to do, maybe change a few parameters (e.g. checkpoints/(max_)iterations).
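The "latest/highest checkpoint" lookup could look roughly like this (a hedged sketch of the idea only, not the actual gunpowder code; the `train_net_checkpoint_<iteration>.meta` filename pattern is an assumption based on common TensorFlow 1.x checkpoint naming):

```python
import glob
import os
import re

def latest_checkpoint(folder, prefix="train_net_checkpoint_"):
    """Return (path, iteration) of the highest-numbered checkpoint in
    folder, or None if no matching checkpoint files exist.
    Sketch only -- gunpowder's actual logic lives in
    gunpowder/tensorflow/nodes/train.py."""
    iterations = []
    for meta in glob.glob(os.path.join(folder, prefix + "*.meta")):
        m = re.search(r"(\d+)\.meta$", meta)
        if m:
            iterations.append(int(m.group(1)))
    if not iterations:
        return None
    it = max(iterations)
    return os.path.join(folder, prefix + str(it)), it
```

To resume from a specific checkpoint instead of the highest one, you could bypass such a lookup and pass the desired checkpoint path directly to the restore step, which would avoid the renumbering problem described above.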