Working with my own plant dataset

Kainmueller-Lab / PatchPerPix_experiments

experiments script for the PatchPerPix instance segmentation method

Working with my own plant dataset #4

Closed Paragjain10 closed 2 years ago

Paragjain10 commented 3 years ago

I am working with my own dataset and am trying to use consolidate_data.py to preprocess the data into the format the network expects. However, I am running into a few problems. I am running the script with these parameters:

-i /home/student2/Desktop/Parag_masterthesis -o /home/student2/Desktop/Parag_masterthesis/newdata --raw-gfp-min 0 --raw-gfp-max 4095 --raw-bf-min 0 --raw-bf-max 3072 --out-format zarr --parallel 50

I am getting this error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/01_data/consolidate_data.py", line 174, in work
    raw_bf = load_array(raw_fns[1]).astype(np.float32)
IndexError: list index out of range
"""

abred commented 3 years ago

Hi,

you probably have to adapt the script (and the command line arguments). The wormbodies dataset has two different "versions" per image (brightfield (bf) and GFP). Yours probably does not have that.
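
The IndexError above fits that explanation: line 174 reads raw_fns[1], so a sample with only one raw file has nothing at index 1. A minimal sketch of a guard for this case follows. The helper name pick_raw_files and the file names are hypothetical, and the ordering (index 0 = GFP, index 1 = brightfield) is inferred from the traceback only, not confirmed from the script.

```python
def pick_raw_files(raw_fns):
    """Return (bf_path, gfp_path); gfp_path is None for single-image samples.

    Hypothetical helper, not part of consolidate_data.py. The ordering
    (index 0 = GFP, index 1 = brightfield) is an assumption.
    """
    if not raw_fns:
        raise ValueError("no raw image found for this sample")
    if len(raw_fns) == 1:
        # Single-modality dataset: use the only file as the raw image.
        return raw_fns[0], None
    return raw_fns[1], raw_fns[0]


# A two-modality sample has two files, a single-image sample only one.
print(pick_raw_files(["sample_gfp.tif", "sample_bf.tif"]))
print(pick_raw_files(["plant_rgb.png"]))
```

With a guard like this, the rest of the script would still need to skip every step that touches the second image.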

Paragjain10 commented 3 years ago

I noticed the things you mentioned. This is new information for me, because the dataset I am working on has one RGB image and one ground truth image per sample. Can you help me understand the significance of the two images, raw-bf and raw-gfp? Also, what needs to be changed in consolidate_data.py to get my dataset preprocessed for training?

@abred

abred commented 3 years ago

raw-bf and raw-gfp are just the same sample recorded with different microscopy modes. You could look at them as two color channels, but stored in separate files. You can remove one of them and modify the other to accept 3-channel images instead of 1-channel images. As ground truth, the code expects label images, with one label per instance (not per class). I don't know what your ground truth image looks like, but you might have to adapt it.
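
To illustrate what "one label per instance" means, here is a minimal sketch that merges per-instance binary masks into a single label image. It assumes non-overlapping instances stored as a (n, h, w) stack; the function name masks_to_label_image is hypothetical and not part of the repository.

```python
import numpy as np


def masks_to_label_image(masks):
    """Combine per-instance binary masks (n, h, w) into one (h, w) label image.

    Background stays 0; instance i (1-based) gets the value i.
    """
    masks = np.asarray(masks) > 0
    labels = np.zeros(masks.shape[1:], dtype=np.uint16)
    for instance_id, mask in enumerate(masks, start=1):
        if np.any(labels[mask] != 0):
            raise ValueError("instances overlap; a single label image cannot hold them")
        labels[mask] = instance_id
    return labels


# Two small, non-overlapping instances on a 4x4 image.
masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, :2, :2] = True
masks[1, 2:, 2:] = True
labels = masks_to_label_image(masks)
print(labels)
```

A per-class mask (all leaves labeled 1, say) would not be enough here, because two touching instances of the same class would share a label.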

Paragjain10 commented 3 years ago

If I change the preprocessing as you described, so that a single image carries all three channels, will the network accept all three channels from one image? Or will the network also have to be altered to accept three channels from one image?
As I understand it, the code uses two channels from one image and one channel from the other. If that is the case, would passing the same sample twice work for me?

abred commented 3 years ago

iirc, we only used the raw_bf image for training in the end. In the config file you can change the raw_key value to select which is used. If you have multi-channel data you can change num_channels in the config. That should, I hope, be all you'd have to change for training.

Paragjain10 commented 3 years ago

This is helpful, thank you. According to the discussion, what I have done is:

I have adapted my ground truths and created one ground truth image per instance.
For the raw images I created a copy of each sample and used them as bf and gfp respectively.

I was able to run consolidate_data.py and got its output. Is this the correct way of going about it? What is your opinion, @abred?

Paragjain10 commented 3 years ago

I tried starting the training with my processed dataset as I mentioned above, but I am facing this error. Can you help me with both my queries?

ERROR:gunpowder.build:something went wrong during the setup of the pipeline, calling tear down
Process Process-2:
Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 385, in train
    **config.get('preprocessing', {}))
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/train.py", line 258, in train_until
    with gp.build(pipeline):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/build.py", line 12, in __enter__
    self.batch_provider.setup()
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 17, in setup
    self.__rec_setup(self.output)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
    self.__rec_setup(upstream_provider)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
    self.__rec_setup(upstream_provider)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
    self.__rec_setup(upstream_provider)
  [Previous line repeated 8 more times]
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 71, in __rec_setup
    provider.setup()
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 17, in setup
    self.__rec_setup(self.output)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 70, in __rec_setup
    self.__rec_setup(upstream_provider)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 71, in __rec_setup
    provider.setup()
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 17, in setup
    self.__rec_setup(self.output)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 71, in __rec_setup
    provider.setup()
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/random_location.py", line 94, in setup
    mask_batch = upstream.request_batch(mask_request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
    batch = self.provide(copy.deepcopy(request))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_filter.py", line 134, in provide
    self.process(batch, request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/neurolight/gunpowder/count_overlap.py", line 137, in process
    other_label_mask = np.max(np.delete(array, c, axis=0), axis=0) > 0
  File "<__array_function__ internals>", line 6, in amax
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2706, in amax
    keepdims=keepdims, initial=initial, where=where)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
Traceback (most recent call last):
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1608, in <module>
    main()
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1428, in main
    train(args, config, train_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 126, in wrapper
    raise RuntimeError("child process died")
RuntimeError: child process died

Process finished with exit code 1

Paragjain10 commented 3 years ago

Hello @abred ,

Hope you’re doing well. Could you please see my above comments and help me out with them? Waiting for your response.

abred commented 3 years ago

Hi, was just working on it, sorry didn't get to it over the holidays. I'm guessing your data does not have multiple labels per pixel? If it doesn't, you should set overlapping_inst in the config to false, if you haven't already. However the flag was not always honored, I pushed a small update, hope that that was the only location.
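
A plausible reproduction of the ValueError, as a minimal sketch: the expression from count_overlap.py (line 137 in the traceback) removes one channel and takes the maximum over the rest, which fails when the ground truth has only a single channel. The array shapes and the function name other_instances_mask are assumptions made for illustration, not taken from neurolight.

```python
import numpy as np


def other_instances_mask(gt, c):
    # Same expression as in the traceback: drop channel c, then reduce
    # over the remaining channels.
    return np.max(np.delete(gt, c, axis=0), axis=0) > 0


# Overlapping-instance ground truth: several channels, so this works.
overlapping_gt = np.zeros((2, 4, 4), dtype=np.uint8)
overlapping_gt[1, 1:3, 1:3] = 1
print(other_instances_mask(overlapping_gt, 0).sum())

# Single-channel label image: deleting the only channel leaves an
# array of shape (0, 4, 4), and the reduction has nothing to work on.
single_channel_gt = np.zeros((1, 4, 4), dtype=np.uint8)
message = None
try:
    other_instances_mask(single_channel_gt, 0)
except ValueError as err:
    message = str(err)
print(message)
```

This is why the overlap-counting step only makes sense when overlapping_inst is enabled and the data really has more than one label per pixel.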

Paragjain10 commented 3 years ago

Thank you @abred for your response.

Your suggestion solved the previous error. But I have encountered another error:


Process Process-1:
Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 352, in mknet
    debug=config['general']['debug'])
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/mknet.py", line 116, in mk_net
    loss = tf.losses.sigmoid_cross_entropy(gt_affs, logitspatch)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/ops/losses/losses_impl.py", line 700, in sigmoid_cross_entropy
    logits.get_shape().assert_is_compatible_with(multi_class_labels.get_shape())
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 1115, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (252, 68, 68) and (1681, 68, 68) are incompatible
Traceback (most recent call last):
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1608, in <module>
    main()
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1424, in main
    mknet(args, config, train_folder, test_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 126, in wrapper
    raise RuntimeError("child process died")
RuntimeError: child process died

abred commented 3 years ago

I pushed an update, please also update the neurolight package (e.g. pip install -U "git+https://github.com/maisli/neurolight.git@master#egg=neurolight")

Paragjain10 commented 3 years ago

Yes, it works now. Training has started.

Also, I wanted to know whether the tensorboard is incorporated in the code? If not, can you tell me in which part of the code I should add the callbacks?

abred commented 3 years ago

You can have a look in the mknet.py, as it is now, there is one scalar summary created for the loss. So when you start tensorboard with --logdir pointed to the experiment folder you should see that. You can easily add more summaries here, for example by adding them to the list before calling tf.summary.merge. One example to get histogram summaries for the weights:

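import tensorflow as tf  # TF 1.x API, as used in mknet.py

def add_summaries():
    """Create one histogram summary per trainable variable."""
    summaries = []
    # iterate directly to avoid shadowing the built-in vars()
    for v in tf.trainable_variables():
        # ":" is not a valid character in summary names, so replace it
        summaries.append(tf.summary.histogram(v.name.replace(":", "_"), v))
    return summaries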

Paragjain10 commented 3 years ago

@abred The training was successfully completed.

Got this error after training:

Traceback (most recent call last):
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1608, in <module>
    main()
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1480, in main
    output_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 743, in validate_checkpoints
    config['vote_instances']
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 704, in get_postprocessing_params
    if config is None or config[p] == []:
KeyError: 'patch_threshold'

def get_postprocessing_params(config, params_list, test_config):
    params = {}
    for p in params_list:
        if config is None or config[p] == []:
            params[p] = [test_config[p]]
        else:
            params[p] = config[p]

    return params

This is the part of the code where the error is raised: it looks up the key `patch_threshold` directly in config, where it did not exist, because config[validation] previously looked like this:

[validation]
params=['patch_threshold', 'fc_threshold']

So I changed config[validation] to:

[validation]
patch_threshold=[]
fc_threshold=[]

I would like to know whether my understanding is correct and whether these changes are appropriate.
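
The behaviour described above can be checked in isolation with a minimal sketch. The function body is copied from the snippet in this comment; the plain dicts stand in for the parsed config sections, and the values in test_config are hypothetical.

```python
def get_postprocessing_params(config, params_list, test_config):
    # Copied from the snippet quoted in this thread.
    params = {}
    for p in params_list:
        if config is None or config[p] == []:
            params[p] = [test_config[p]]
        else:
            params[p] = config[p]
    return params


params_list = ["patch_threshold", "fc_threshold"]
test_config = {"patch_threshold": 0.9, "fc_threshold": 0.5}  # hypothetical values

# Old [validation] section: only "params" is present, so config[p] fails.
old_validation = {"params": params_list}
raised, missing = False, None
try:
    get_postprocessing_params(old_validation, params_list, test_config)
except KeyError as err:
    raised, missing = True, err.args[0]
print(raised, missing)

# New [validation] section: empty lists fall back to the single value
# from test_config, so no hyperparameter search takes place.
new_validation = {"patch_threshold": [], "fc_threshold": []}
fallback = get_postprocessing_params(new_validation, params_list, test_config)
print(fallback)
```

So the change avoids the KeyError, but with empty lists each parameter is evaluated at one fallback value only.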

Paragjain10 commented 3 years ago

@abred This error is being thrown. The pred_numinst, pred_affs folder is missing in the val folder: /val/processed/20000/01_6.zarr/volumes


Process Process-203:
Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 606, in decode
    **config['data']
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/decode.py", line 135, in decode
    prediction = decode_sample(decoder, sample, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/decode.py", line 43, in decode_sample
    pred_fg = np.array(zarr.open(sample, 'r')[kwargs['fg_key']])
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/zarr/hierarchy.py", line 349, in __getitem__
    raise KeyError(item)
KeyError: 'volumes/pred_numinst'
Traceback (most recent call last):
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1609, in <module>
    main()
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1481, in main
    output_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 756, in validate_checkpoints
    output_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 679, in validate_checkpoint
    decode(args, config, data, autoencoder_chkpt, pred_folder, pred_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 126, in wrapper
    raise RuntimeError("child process died")
RuntimeError: child process died

Paragjain10 commented 3 years ago

Hello @abred,

Could you please have a look at the previous comments?

Also, do you think the problem could be in the way I have given my data to the network? To get the code running, I gave the same raw (RGB) image twice, once as raw_bf and once as raw_gfp.

Waiting for your response.

abred commented 3 years ago

For the [validation] section, something like

[validation]
params=['patch_threshold', 'fc_threshold']
patch_threshold=[0.5, 0.6, 0.7]
fc_threshold=[0.5, 0.6, 0.7]

would be better. params is a list of the parameters used for hyperparameter optimization, and for each of these there is a list of possible values to try. At the moment the product of those lists is used (param_sets = list(named_product(...))). You can also provide fixed combinations by changing named_product to named_zip; then all lists should have the same length.
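
The difference between the two modes can be sketched as follows. The names named_product and named_zip are taken from this comment, but the bodies below are assumptions written for illustration with itertools, not the repository's implementation.

```python
import itertools


def named_product(**items):
    """Yield one dict per combination in the Cartesian product of the lists."""
    names = list(items)
    for values in itertools.product(*items.values()):
        yield dict(zip(names, values))


def named_zip(**items):
    """Yield one dict per position; all lists must have the same length."""
    names = list(items)
    if len({len(v) for v in items.values()}) > 1:
        raise ValueError("named_zip expects lists of equal length")
    for values in zip(*items.values()):
        yield dict(zip(names, values))


validation = {
    "patch_threshold": [0.5, 0.6, 0.7],
    "fc_threshold": [0.5, 0.6, 0.7],
}
grid = list(named_product(**validation))   # every combination
paired = list(named_zip(**validation))     # fixed combinations only
print(len(grid), len(paired))
```

With three values per parameter, the product evaluates nine parameter sets per checkpoint, while the zipped variant evaluates three.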

Paragjain10 commented 3 years ago

Okay, I changed [validation] as you mentioned and I am running it again.

Also, I would like to check how I preprocessed my data to get it into the correct form:

I had only one raw image (RGB).
I fed the same image twice (once as raw_bf and once as raw_gfp) to get consolidate_data.py running.

Do you think this way of feeding the data is correct? Could it affect my results or be the reason for the errors?

abred commented 3 years ago

I pushed an update; unfortunately, you have to retrain :( If overlapping_inst is False, a fg/bg mask has to be trained instead, and this was missing. In the decode step only patches belonging to foreground pixels are decoded.

abred commented 3 years ago

Did you change num_channels to 3? The way the code is now only raw_bf is actually used (see raw_key in the config file). So feeding the same data twice in consolidate_data is redundant but not an issue.

Paragjain10 commented 3 years ago

@abred I made the changes that you pushed. Yes, when I change num_channels = 3, this error is thrown. When I run the code with num_channels=1 the training starts.


Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 385, in train
    **config.get('preprocessing', {}))
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/train.py", line 281, in train_until
    pipeline.request_batch(request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
    batch = self.provide(copy.deepcopy(request))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/batch_provider_tree.py", line 45, in provide
    return self.output.request_batch(request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
    batch = self.provide(copy.deepcopy(request))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_filter.py", line 128, in provide
    batch = self.get_upstream_provider().request_batch(upstream_request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
    batch = self.provide(copy.deepcopy(request))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_filter.py", line 128, in provide
    batch = self.get_upstream_provider().request_batch(upstream_request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_provider.py", line 146, in request_batch
    batch = self.provide(copy.deepcopy(request))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/batch_filter.py", line 134, in provide
    self.process(batch, request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/nodes/generic_train.py", line 151, in process
    self.train_step(batch, request)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/gunpowder/tensorflow/nodes/train.py", line 278, in train_step
    feed_dict=inputs, options=run_options)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1156, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 256, 256) for Tensor 'raw:0', which has shape '(3, 256, 256)'
Traceback (most recent call last):
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1609, in <module>
    main()
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1429, in main
    train(args, config, train_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 126, in wrapper
    raise RuntimeError("child process died")
RuntimeError: child process died

abred commented 3 years ago

Ah right, consolidate_data converts the data to grayscale, you probably want to disable that https://github.com/Kainmueller-Lab/PatchPerPix_experiments/blob/70ad81337ed85189312f421fc8c5a4df35b1a7ab/wormbodies/01_data/consolidate_data.py#L30-L33
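
A mismatch like this can be caught before training starts. The following is a minimal sketch of such a check; check_raw_channels is a hypothetical helper, and it assumes the stored raw array is either (h, w) for single-channel data or channels-first (c, h, w).

```python
def check_raw_channels(raw_shape, num_channels):
    """Fail early if a stored raw array does not match num_channels.

    Hypothetical helper; raw_shape is the array shape, assumed to be
    (h, w) for single-channel data or channels-first (c, h, w).
    """
    channels = 1 if len(raw_shape) == 2 else raw_shape[0]
    if channels != num_channels:
        raise ValueError(
            f"raw data has {channels} channel(s) but num_channels={num_channels}"
        )
    return channels


check_raw_channels((256, 256), 1)        # grayscale data, num_channels=1: fine
try:
    check_raw_channels((256, 256), 3)    # grayscale data, num_channels=3
except ValueError as err:
    print(err)
```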

Paragjain10 commented 3 years ago

@abred I tried running the code in two ways:

I commented out the entire if block you mentioned above.
Then I commented out only line 31: image = rgb2gray(image, np.iinfo(image.dtype).min, np.iinfo(image.dtype).max).

In both cases the script takes forever to finish, and the output folder does not contain all the processed files of the dataset; some files are missing.

Also, if I run the original consolidate_data.py without any changes, leaving the rgb2gray call in place, it works correctly.

abred commented 3 years ago

Well, later in the code raw_gfp.shape is used, this is assumed to be (h,w) and now it is (c,h,w) so you have to fix that. (furthermore, the rest of the code assumes channels_first, so if you get (h,w,c) you also have to adapt that)
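
A minimal sketch of both adaptations, assuming NumPy arrays: bring the raw image to channels_first, and take only the spatial part of its shape wherever (h,w) is expected. The helper names are hypothetical, and treating a trailing axis of size 3 or 4 as the color axis is an assumption that would misfire on images that are only 3 or 4 pixels wide.

```python
import numpy as np


def to_channels_first(image):
    """Return an array shaped (c, h, w) from (h, w), (h, w, c) or (c, h, w)."""
    image = np.asarray(image)
    if image.ndim == 2:
        # Grayscale: add a leading channel axis.
        return image[np.newaxis]
    if image.ndim == 3 and image.shape[-1] in (3, 4):
        # Assumed: a trailing axis of size 3 or 4 is the color axis.
        return np.moveaxis(image, -1, 0)
    # Otherwise assume the array is already channels-first.
    return image


def spatial_shape(image_cf):
    """Return (h, w) of a channels-first array."""
    return image_cf.shape[-2:]


rgb = np.zeros((256, 256, 3), dtype=np.float32)   # (h, w, c) as read from disk
raw = to_channels_first(rgb)
print(raw.shape, spatial_shape(raw))
```

Code that previously used raw_gfp.shape to allocate label or mask arrays would then use spatial_shape(raw) instead.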

Paragjain10 commented 3 years ago

@abred I tried making the changes, but I have a few doubts regarding the code.

Firstly, regarding what you mentioned above:

Am I supposed to make the code compatible with raw_gfp.shape = (c,h,w), or
am I supposed to pass only the (h,w) part of raw_gfp wherever raw_gfp.shape is used?

I ask because when I make the code compatible with raw_gfp.shape = (c,h,w) and adapt all images to channels_first (c,h,w), as the code assumes, the code does not work as expected. It seems to me that the code does not function correctly for 3-channel images; it behaves correctly when I pass my data with 1 channel only.

Secondly, why can't my data be processed in the same way as the wormbodies data, where each image is converted with rgb2gray before any other processing is done?

Paragjain10 commented 3 years ago

Hello @abred,

Hope I was able to explain my problem correctly in the above section. If anything is not clear, please let me know I will try explaining again.

abred commented 3 years ago

The wormbodies data is not RGB but already grayscale, so rgb2gray is not used. You can use rgb2gray, maybe it works. But your data has color information that might be useful for the network, and if you convert it to grayscale you might lose that. Only the raw data has (optionally) color information; for the other arrays only the spatial dimensions (h,w) are needed.

Paragjain10 commented 3 years ago

Hello @abred,

I had a word with my supervisor and he said it isn’t a problem if we train the network with our dataset in grayscale. So accordingly, I moved forward with the implementation, consolidated the data, and started training the network. The training is complete but during the decode part this error is being thrown. I guess some folder is not getting created, can you help me with this:


Traceback (most recent call last):
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1609, in <module>
    main()
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1481, in main
    output_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 756, in validate_checkpoints
    output_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 679, in validate_checkpoint
    decode(args, config, data, autoencoder_chkpt, pred_folder, pred_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 126, in wrapper
    raise RuntimeError("child process died")
RuntimeError: child process died

Process finished with exit code 1

abred commented 3 years ago

Can you please send me the log output before this error message?

Paragjain10 commented 3 years ago

@abred The log output is very big. The part below is the log output before the error message.



INFO:tensorflow:Calling model_fn.
INFO:wormbodies.02_setups.setup08.decode:feature tensor: Tensor("IteratorGetNext:0", shape=(?, ?, 1, 252), dtype=float32)
INFO:wormbodies.02_setups.setup08.decode:label tensor: None
WARNING:tensorflow:From /home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/decode.py:100: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

INFO:PatchPerPix.models.autoencoder:Tensor("Placeholder:0", shape=(?, 1, 41, 41), dtype=float32)
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From /home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/models/autoencoder.py:66: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
WARNING:tensorflow:From /home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/layers/convolutional.py:424: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/models/autoencoder.py:83: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.MaxPooling2D instead.
WARNING:tensorflow:From /home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/models/autoencoder.py:66: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
WARNING:tensorflow:From /home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/layers/convolutional.py:424: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
INFO:PatchPerPix.models.autoencoder:Tensor("encoder_layer_0_1/Relu:0", shape=(?, 32, 41, 41), dtype=float32)
WARNING:tensorflow:From /home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/models/autoencoder.py:83: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.MaxPooling2D instead.
INFO:PatchPerPix.models.autoencoder:Tensor("downsample_0/MaxPool:0", shape=(?, 32, 21, 21), dtype=float32)
INFO:PatchPerPix.models.autoencoder:Tensor("encoder_layer_1_1/Relu:0", shape=(?, 48, 21, 21), dtype=float32)
INFO:PatchPerPix.models.autoencoder:Tensor("downsample_1/MaxPool:0", shape=(?, 48, 11, 11), dtype=float32)
INFO:PatchPerPix.models.autoencoder:Tensor("encoder_layer_2_1/Relu:0", shape=(?, 64, 11, 11), dtype=float32)
INFO:PatchPerPix.models.autoencoder:Tensor("downsample_2/MaxPool:0", shape=(?, 64, 6, 6), dtype=float32)
INFO:PatchPerPix.models.autoencoder:Tensor("to_code_layer_0/Sigmoid:0", shape=(?, 7, 6, 6), dtype=float32)
WARNING:tensorflow:From /home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/models/autoencoder.py:302: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
INFO:PatchPerPix.models.autoencoder:Tensor("code/Reshape:0", shape=(?, 252), dtype=float32)
INFO:PatchPerPix.models.autoencoder:Tensor("deflatten_out:0", shape=(?, 7, 6, 6), dtype=float32)
WARNING:tensorflow:From /home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/models/autoencoder.py:302: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
INFO:PatchPerPix.models.autoencoder:Tensor("from_code_layer_0/Relu:0", shape=(?, 64, 6, 6), dtype=float32)
INFO:PatchPerPix.models.autoencoder:Tensor("decoder_layer_0_1/Relu:0", shape=(?, 48, 12, 12), dtype=float32)
INFO:PatchPerPix.models.autoencoder:Tensor("decoder_layer_1_1/Relu:0", shape=(?, 32, 24, 24), dtype=float32)
INFO:PatchPerPix.models.autoencoder:Tensor("decoder_layer_2_1/BiasAdd:0", shape=(?, 1, 48, 48), dtype=float32)
INFO:PatchPerPix.models.autoencoder:Tensor("crop:0", shape=(?, 1, 41, 41), dtype=float32)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
WARNING:tensorflow:From /home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Graph was finalized.
2021-01-12 05:33:08.535330: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2021-01-12 05:33:08.559129: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2021-01-12 05:33:08.559895: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55d45177f170 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-12 05:33:08.559905: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-12 05:33:08.560446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-01-12 05:33:08.564609: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-12 05:33:08.564902: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2070 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.815
pciBusID: 0000:01:00.0
2021-01-12 05:33:08.565002: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-01-12 05:33:08.565560: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-01-12 05:33:08.566106: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-01-12 05:33:08.566238: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-01-12 05:33:08.566935: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-01-12 05:33:08.567436: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-01-12 05:33:08.568976: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-01-12 05:33:08.569026: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-12 05:33:08.569328: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-12 05:33:08.569594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2021-01-12 05:33:08.569616: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-01-12 05:33:08.612099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-12 05:33:08.612114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2021-01-12 05:33:08.612118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2021-01-12 05:33:08.612221: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-12 05:33:08.612533: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-12 05:33:08.612825: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-12 05:33:08.613103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7333 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-01-12 05:33:08.614136: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55d44e137840 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-12 05:33:08.614144: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2070 SUPER, Compute Capability 7.5
INFO:tensorflow:Restoring parameters from ~/home/student2/Desktop/Parag_masterthesis/~/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/train/train_net_checkpoint_20000
INFO:tensorflow:Restoring parameters from ~/home/student2/Desktop/Parag_masterthesis/~/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/train/train_net_checkpoint_20000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2021-01-12 05:33:08.953489: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-01-12 05:33:09.562172: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
in decode sample:  (1, 252)
(last line repeated 183 more times)

abred commented 3 years ago

Hmm, strange, it looks fine. Could you please check the contents of val/processed (or test/processed)? There should be a folder for the checkpoint you are testing, containing zarr directories for your samples. Each zarr should contain volumes/pred_code and volumes/pred_affs, and they shouldn't be empty.

But the decode step worked for you for the worm data, didn't it? Did you make any changes? Could you please add a try-except block around the call to decode_fn in run_ppp? Maybe the fork is hiding some error.
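The check on the prediction outputs described above can be scripted. The following is a minimal sketch, not part of the PatchPerPix code: `inspect_predictions` and `inspect_sample` are hypothetical helper names. It assumes the `zarr` package is installed and that the dataset keys are the two named in this thread. "Empty" here only means an array with a zero-length axis.

```python
def inspect_predictions(group, keys=("volumes/pred_code", "volumes/pred_affs")):
    """Report, per key, 'missing', 'empty', or the array shape.

    `group` is anything supporting `key in group` and `group[key].shape`,
    for example an opened zarr group.
    """
    report = {}
    for key in keys:
        if key not in group:
            report[key] = "missing"
            continue
        shape = tuple(group[key].shape)
        # An array with a zero-length axis holds no data.
        report[key] = "empty" if 0 in shape else shape
    return report


def inspect_sample(zarr_path):
    """Open one sample's zarr container read-only and inspect it."""
    import zarr  # third-party; imported lazily so the helper above has no dependencies

    return inspect_predictions(zarr.open(zarr_path, mode="r"))
```

Calling `inspect_sample` on each sample's zarr directory under val/processed would report `'missing'` for `volumes/pred_affs` if the decode step never wrote its output.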

Paragjain10 commented 3 years ago

@abred Yes, I checked: the val/processed folder has the checkpoint folder and the zarr directories. It also has the volume/pred_code folder, but volume/pred_affs is missing. Yes, the decode step worked for me for the worm data. As for changes, the ones you pushed a few days ago are the only changes I have made. Initially only volume/pred_code was generated; after the changes you pushed, volume/pred_code and volume/pred_numist are generated, but volume/pred_affs is still missing.

abred commented 3 years ago

Have you tried this?

Could you please add a try-except block around the call to decode_fn in run_ppp? Maybe the fork is hiding some error.

(And print the exception if there is one.) pred_affs is supposed to be generated by the decode step.

Paragjain10 commented 3 years ago

@abred The try-except block is throwing this error:


Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 828, in vote_instances_sample_seq
    sample)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 893, in vote_instances_sample
    fg_key=config['prediction'].get('fg_key'),
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/vote_instances/vote_instances.py", line 595, in main
    do_all(affinities, **args)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/vote_instances/vote_instances.py", line 519, in do_all
    patchshape=patchshape, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/vote_instances/utilVoteInstances.py", line 153, in loadAffinities
    shape = f[aff_key].shape
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/zarr/hierarchy.py", line 349, in __getitem__
    raise KeyError(item)
KeyError: 'volumes/pred_affs'
child process died

abred commented 3 years ago

File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/vote_instances/vote_instances.py", line 595, in main

That's the next step; it will of course fail as long as the previous step has not finished.

Where did you put the try-except block? It should be around decode_fn.
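Why the placement matters: the tracebacks in this thread suggest that run_ppp.py runs each step in a forked child process, and that the parent only raises `RuntimeError("child process died")` when the child fails. A try-except in the parent can therefore only ever see that generic error; the real exception has to be caught and printed inside the child. The sketch below illustrates the pattern with the standard library only. It is not the actual run_ppp.py code: `fork_and_wait` and `_report_errors` are hypothetical names, and the "fork" start method is assumed (POSIX only).

```python
import functools
import multiprocessing
import sys
import traceback


def _report_errors(func, args, kwargs):
    """Runs inside the child: print the real exception before exiting."""
    try:
        func(*args, **kwargs)
    except Exception:
        traceback.print_exc()  # the only place the original error is visible
        sys.stderr.flush()
        sys.exit(1)  # non-zero exit code signals failure to the parent


def fork_and_wait(func):
    """Decorator: run `func` in a forked child and wait for it to finish."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        ctx = multiprocessing.get_context("fork")  # assumption: POSIX system
        proc = ctx.Process(target=_report_errors, args=(func, args, kwargs))
        proc.start()
        proc.join()
        if proc.exitcode != 0:
            # The parent knows only that the child failed, not why.
            raise RuntimeError("child process died")
    return wrapper
```

Wrapping a call to a decorated function in try-except in the parent prints only `child process died`; the traceback printed by `_report_errors` in the child is what identifies the actual failure.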

Paragjain10 commented 3 years ago

@abred


Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node from_code_layer_0/Conv2D}}]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node from_code_layer_0/Conv2D}}]]
     [[affinities/_177]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 606, in decode
    **config['data']
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/decode.py", line 135, in decode
    prediction = decode_sample(decoder, sample, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/decode.py", line 78, in decode_sample
    predictions = decoder.predict(pred_code_batched)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/models/fast_predict.py", line 44, in predict
    results.append(next(self.predictions))
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 640, in predict
    preds_evaluated = mon_sess.run(predictions)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
    run_metadata=run_metadata)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node from_code_layer_0/Conv2D (defined at /anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node from_code_layer_0/Conv2D (defined at /anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[affinities/_177]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'from_code_layer_0/Conv2D':
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1616, in <module>
    main()
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1488, in main
    output_folder)
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 759, in validate_checkpoints
    output_folder)
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 680, in validate_checkpoint
    decode(args, config, data, autoencoder_chkpt, pred_folder, pred_folder)
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 123, in wrapper
    p.start()
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/popen_fork.py", line 74, in _launch
    code = process_obj._bootstrap()
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 606, in decode
    **config['data']
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/decode.py", line 135, in decode
    prediction = decode_sample(decoder, sample, **kwargs)
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/decode.py", line 78, in decode_sample
    predictions = decoder.predict(pred_code_batched)
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/models/fast_predict.py", line 44, in predict
    results.append(next(self.predictions))
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 622, in predict
    features, None, ModeKeys.PREDICT, self.config)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/decode.py", line 110, in decoder_model_fn
    **ae_config
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/models/autoencoder.py", line 318, in autoencoder
    name='from_code_layer')
  File "/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix/models/autoencoder.py", line 66, in conv_pass
    name=name + '_%i' % i)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/layers/convolutional.py", line 424, in conv2d
    return layer.apply(inputs)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 1700, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/layers/base.py", line 548, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 854, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
    return converted_call(f, options, args, kwargs)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py", line 439, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
    return f(*args, **kwargs)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/convolutional.py", line 197, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 1134, in __call__
    return self.conv_op(inp, filter)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 639, in __call__
    return self.call(inp, filter)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 238, in __call__
    name=self.name)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 2010, in conv2d
    name=name)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 1071, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/anaconda3/envs/Parag_GreenAI/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

abred commented 3 years ago

That's a CUDA error, indicating that something is wrong with the GPU; it is unrelated to the ppp code. Does nvidia-smi work? If not, maybe there was a driver update, in which case a restart helps. Or you can try a small example; it should produce some CUDA error messages:

$ python

import tensorflow as tf
s = tf.Session()
b = tf.add(1, 1)
s.run(b)
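The nvidia-smi check can also be scripted, for example to run it at the start of a job. This is a small sketch using only the standard library; `nvidia_smi_status` is a hypothetical helper name, and the 30-second timeout is an arbitrary choice.

```python
import shutil
import subprocess


def nvidia_smi_status(timeout=30):
    """Return (ok, output): ok is True only if nvidia-smi ran and exited with 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False, "nvidia-smi not found on PATH"
    try:
        proc = subprocess.run(
            [exe],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so driver errors are captured
            universal_newlines=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    return proc.returncode == 0, proc.stdout
```

If `ok` is False, `output` holds whatever message nvidia-smi or the operating system produced, which is usually the first thing worth reading before debugging TensorFlow itself.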

Paragjain10 commented 3 years ago

@abred This is the try-except block that I am trying:

try:
    decode(args, config, data, autoencoder_chkpt, pred_folder, pred_folder)
except Exception as e:
    print("unknown error")
    print(e)

The code doesn't seem to throw anything except the "child process died" statement. It then proceeds, and the next error is raised. Is this the correct way of doing it, or is there something else that can be done?


unknown error
child process died
INFO:__main__:vote_instances checkpoint 20000 {'patch_threshold': 0.5, 'fc_threshold': 0.5}
INFO:__main__:reading data from ~/home/student2/Desktop/Parag_masterthesis/~/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/val/processed/20000
['01_23', '02_56', '10_1134', '05_74', '10_1124', '07_45', '01_11', '03_461', '08_469', '05_60', '02_3', '05_39', '02_17', '03_492', '10_1138', '10_1090', '04_1013', '10_1100', '03_437', '02_6', '06_51', '04_946', '09_747', '05_85', '07_92', '01_84', '10_1060', '09_753', '08_412', '08_421', '03_458', '07_58', '06_24', '04_979', '03_507', '02_14', '10_1107', '03_452', '03_531', '06_40', '01_25', '10_1135', '02_86', '01_73', '09_748', '03_475', '05_62', '08_491', '04_1019', '03_455', '06_3', '02_94', '09_726', '02_36', '03_477', '02_22', '06_76', '05_33', '03_528', '03_466', '02_90', '06_17', '03_502', '01_42', '10_1069', '03_471', '08_497', '09_768', '05_11', '08_407', '07_81', '01_74', '08_484', '01_29', '06_19', '03_467', '04_967', '07_51', '04_1031', '09_777', '08_423', '05_79', '06_68', '10_1067', '01_62', '07_42', '02_85', '07_29', '02_100', '07_85', '04_1018', '02_82', '06_4', '04_955', '02_24', '03_499', '07_13', '02_97', '01_14', '09_728', '04_1001', '03_509', '06_21', '07_63', '05_50', '04_1007', '04_1012', '04_1004', '01_82', '06_46', '10_1147', '02_50', '07_64', '04_940', '07_23', '08_404', '08_418', '04_958', '02_98', '04_1037', '02_48', '04_1033', '03_470', '04_999', '01_43', '09_735', '01_46', '05_87', '06_36', '10_1140', '05_56', '07_77', '03_515', '01_49', '01_59', '06_33', '03_446', '07_36', '06_29', '03_485', '04_1030', '06_64', '01_86', '08_415', '06_90', '01_68', '01_39', '09_756', '04_948', '01_28', '02_75', '09_779', '10_1114', '03_496', '03_505', '03_474', '09_775', '02_20', '07_33', '06_58', '10_1142', '01_63', '01_81', '05_25', '10_1076', '02_29', '04_938', '10_1080', '08_451', '05_7', '04_1005', '04_951', '04_1026', '03_519', '09_793', '06_12', '02_47', '10_1102', '09_785', '08_461', '01_6', '01_88', '08_496', '04_1028', '10_1104', '10_1133', '08_459', '07_41', '04_1032', '07_12', '10_1071', '07_54', '01_15', '02_15', '09_732', '02_2', '04_1006', '07_68', '07_18', '10_1129']
['01_23', '02_56', '10_1134', '05_74', '10_1124', '07_45', '01_11', '03_461', '08_469', '05_60', '02_3', '05_39', '02_17', '03_492', '10_1138', '10_1090', '04_1013', '10_1100', '03_437', '02_6', '06_51', '04_946', '09_747', '05_85', '07_92', '01_84', '10_1060', '09_753', '08_412', '08_421', '03_458', '07_58', '06_24', '04_979', '03_507', '02_14', '10_1107', '03_452', '03_531', '06_40', '01_25', '10_1135', '02_86', '01_73', '09_748', '03_475', '05_62', '08_491', '04_1019', '03_455', '06_3', '02_94', '09_726', '02_36', '03_477', '02_22', '06_76', '05_33', '03_528', '03_466', '02_90', '06_17', '03_502', '01_42', '10_1069', '03_471', '08_497', '09_768', '05_11', '08_407', '07_81', '01_74', '08_484', '01_29', '06_19', '03_467', '04_967', '07_51', '04_1031', '09_777', '08_423', '05_79', '06_68', '10_1067', '01_62', '07_42', '02_85', '07_29', '02_100', '07_85', '04_1018', '02_82', '06_4', '04_955', '02_24', '03_499', '07_13', '02_97', '01_14', '09_728', '04_1001', '03_509', '06_21', '07_63', '05_50', '04_1007', '04_1012', '04_1004', '01_82', '06_46', '10_1147', '02_50', '07_64', '04_940', '07_23', '08_404', '08_418', '04_958', '02_98', '04_1037', '02_48', '04_1033', '03_470', '04_999', '01_43', '09_735', '01_46', '05_87', '06_36', '10_1140', '05_56', '07_77', '03_515', '01_49', '01_59', '06_33', '03_446', '07_36', '06_29', '03_485', '04_1030', '06_64', '01_86', '08_415', '06_90', '01_68', '01_39', '09_756', '04_948', '01_28', '02_75', '09_779', '10_1114', '03_496', '03_505', '03_474', '09_775', '02_20', '07_33', '06_58', '10_1142', '01_63', '01_81', '05_25', '10_1076', '02_29', '04_938', '10_1080', '08_451', '05_7', '04_1005', '04_951', '04_1026', '03_519', '09_793', '06_12', '02_47', '10_1102', '09_785', '08_461', '01_6', '01_88', '08_496', '04_1028', '10_1104', '10_1133', '08_459', '07_41', '04_1032', '07_12', '10_1071', '07_54', '01_15', '02_15', '09_732', '02_2', '04_1006', '07_68', '07_18', '10_1129']
['01_23', '02_56', '10_1134', '05_74', '10_1124', '07_45', '01_11', '03_461', '08_469', '05_60', '02_3', '05_39', '02_17', '03_492', '10_1138', '10_1090', '04_1013', '10_1100', '03_437', '02_6', '06_51', '04_946', '09_747', '05_85', '07_92', '01_84', '10_1060', '09_753', '08_412', '08_421', '03_458', '07_58', '06_24', '04_979', '03_507', '02_14', '10_1107', '03_452', '03_531', '06_40', '01_25', '10_1135', '02_86', '01_73', '09_748', '03_475', '05_62', '08_491', '04_1019', '03_455', '06_3', '02_94', '09_726', '02_36', '03_477', '02_22', '06_76', '05_33', '03_528', '03_466', '02_90', '06_17', '03_502', '01_42', '10_1069', '03_471', '08_497', '09_768', '05_11', '08_407', '07_81', '01_74', '08_484', '01_29', '06_19', '03_467', '04_967', '07_51', '04_1031', '09_777', '08_423', '05_79', '06_68', '10_1067', '01_62', '07_42', '02_85', '07_29', '02_100', '07_85', '04_1018', '02_82', '06_4', '04_955', '02_24', '03_499', '07_13', '02_97', '01_14', '09_728', '04_1001', '03_509', '06_21', '07_63', '05_50', '04_1007', '04_1012', '04_1004', '01_82', '06_46', '10_1147', '02_50', '07_64', '04_940', '07_23', '08_404', '08_418', '04_958', '02_98', '04_1037', '02_48', '04_1033', '03_470', '04_999', '01_43', '09_735', '01_46', '05_87', '06_36', '10_1140', '05_56', '07_77', '03_515', '01_49', '01_59', '06_33', '03_446', '07_36', '06_29', '03_485', '04_1030', '06_64', '01_86', '08_415', '06_90', '01_68', '01_39', '09_756', '04_948', '01_28', '02_75', '09_779', '10_1114', '03_496', '03_505', '03_474', '09_775', '02_20', '07_33', '06_58', '10_1142', '01_63', '01_81', '05_25', '10_1076', '02_29', '04_938', '10_1080', '08_451', '05_7', '04_1005', '04_951', '04_1026', '03_519', '09_793', '06_12', '02_47', '10_1102', '09_785', '08_461', '01_6', '01_88', '08_496', '04_1028', '10_1104', '10_1133', '08_459', '07_41', '04_1032', '07_12', '10_1071', '07_54', '01_15', '02_15', '09_732', '02_2', '04_1006', '07_68', '07_18', '10_1129']
INFO:__main__:forking <function vote_instances_sample_seq at 0x7fbbb31be9e0>
INFO:PatchPerPix.vote_instances.vote_instances:processing ~/home/student2/Desktop/Parag_masterthesis/~/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/val/processed/20000/01_23.zarr
INFO:PatchPerPix.vote_instances.utilVoteInstances:keys: ['volumes']
Traceback (most recent call last):
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1616, in <module>
    main()
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1488, in main
    output_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 760, in validate_checkpoints
    output_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 692, in validate_checkpoint
    vote_instances(args, config, data, pred_folder, inst_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 816, in vote_instances
    output_folder, sample)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 126, in wrapper
    raise RuntimeError("child process died")
RuntimeError: child process died

abred commented 3 years ago

Hi, no, you have to put the try-except block inside the decode function, either around the call to decode_fn or around everything in the decode function (but still in run_ppp.py). It has to be inside because a new process is started/forked when this function is called; if you do it outside, you only get the generic "child process died" error.

(The next error happens because you don't re-raise the exception, so execution continues as if there hadn't been an exception.)

except Exception as e:
    print("unknown error")
    print(e)
    raise e
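
To illustrate why the block has to sit inside the forked function, here is a minimal sketch. It is not the actual wrapper from run_ppp.py: `run_in_child`, `failing_step` and `guarded_step` are hypothetical names, and the "fork" start method is assumed (POSIX only). The parent only ever sees an exit code, so the details have to be reported from within the child.

```python
import multiprocessing as mp
import traceback


def run_in_child(func):
    """Run func in a forked child process and return the child's exit code."""
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork available
    child = ctx.Process(target=func)
    child.start()
    child.join()
    return child.exitcode


def failing_step():
    # Stand-in for the real work done inside the child process.
    raise ValueError("something went wrong inside the child")


def guarded_step():
    try:
        failing_step()
    except Exception:
        traceback.print_exc()  # report the details from inside the child
        raise                  # re-raise so the exit code stays non-zero


exit_code = run_in_child(guarded_step)
print("exitcode:", exit_code)
```

The parent cannot catch the `ValueError` itself; it can only check that the exit code is non-zero and raise its own generic error.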

Paragjain10 commented 3 years ago

@abred I am not sure whether my understanding is correct. I tried two ways of placing the try-except block.

1.

def decode(args, config, data, checkpoint, pred_folder, output_folder):
    try:
        in_format = config['prediction']['output_format']
        samples = get_list_samples(config, pred_folder, in_format, data)

        if args.sample is not None:
            samples = [s for s in samples if args.sample in s]

        to_be_skipped = []
        for sample in samples:
            pred_file = os.path.join(output_folder, sample + '.' + in_format)
            if not config['general']['overwrite'] and os.path.exists(pred_file):
                if check_file(pred_file, remove_on_error=False,
                              key=config['prediction'].get('aff_key',
                                                           "volumes/pred_affs")):
                    logger.info('Skipping decoding for %s. Already exists!', sample)
                    to_be_skipped.append(sample)
        for sample in to_be_skipped:
            samples.remove(sample)
        if len(samples) == 0:
            return

        if 'CUDA_VISIBLE_DEVICES' not in os.environ:
            raise RuntimeError("no free GPU available!")
        import tensorflow as tf
        for idx, s in enumerate(samples):
            samples[idx] = os.path.join(pred_folder, s + "." + in_format)

        if args.run_from_exp:
            decode_fn = runpy.run_path(
                os.path.join(config['base'], 'decode.py'))['decode']
        else:
            decode_fn = importlib.import_module(
                args.app + '.02_setups.' + args.setup + '.decode').decode

        if config['model'].get('code_units'):
            input_shape = (config['model'].get('code_units'),)
        else:
            input_shape = None
        try:

            decode_fn(
                mode=tf.estimator.ModeKeys.PREDICT,
                input_shape=input_shape,
                checkpoint_file=checkpoint,
                output_folder=output_folder,
                samples=samples,
                included_ae_config=config.get('autoencoder'),
                **config['model'],
                **config['prediction'],
                **config['visualize'],
                **config['data']
            )
        except Exception as err:
            print(err)
            raise (err)

    except Exception as e:
        print(e)
        raise(e)

2.

def decode(args, config, data, checkpoint, pred_folder, output_folder):
    try:
        in_format = config['prediction']['output_format']
        samples = get_list_samples(config, pred_folder, in_format, data)

        if args.sample is not None:
            samples = [s for s in samples if args.sample in s]

        to_be_skipped = []
        for sample in samples:
            pred_file = os.path.join(output_folder, sample + '.' + in_format)
            if not config['general']['overwrite'] and os.path.exists(pred_file):
                if check_file(pred_file, remove_on_error=False,
                              key=config['prediction'].get('aff_key',
                                                           "volumes/pred_affs")):
                    logger.info('Skipping decoding for %s. Already exists!', sample)
                    to_be_skipped.append(sample)
        for sample in to_be_skipped:
            samples.remove(sample)
        if len(samples) == 0:
            return

        if 'CUDA_VISIBLE_DEVICES' not in os.environ:
            raise RuntimeError("no free GPU available!")
        import tensorflow as tf
        for idx, s in enumerate(samples):
            samples[idx] = os.path.join(pred_folder, s + "." + in_format)

        if args.run_from_exp:
            decode_fn = runpy.run_path(
                os.path.join(config['base'], 'decode.py'))['decode']
        else:
            decode_fn = importlib.import_module(
                args.app + '.02_setups.' + args.setup + '.decode').decode

        if config['model'].get('code_units'):
            input_shape = (config['model'].get('code_units'),)
        else:
            input_shape = None

        decode_fn(
            mode=tf.estimator.ModeKeys.PREDICT,
            input_shape=input_shape,
            checkpoint_file=checkpoint,
            output_folder=output_folder,
            samples=samples,
            included_ae_config=config.get('autoencoder'),
            **config['model'],
            **config['prediction'],
            **config['visualize'],
            **config['data']
        )

    except Exception as e:
        print(e)
        raise(e)

Neither of the above produced an error message. If neither way is correct, could you please make the changes and post them here for reference? Sorry for the inconvenience.

abred commented 3 years ago

Not really sure what the problem is. Could you please print the exit code? Maybe that helps.

def fork(func):
  ...
  if p.exitcode != 0:
    raise RuntimeError("child process died")
  ...

Paragjain10 commented 3 years ago

This is the exit code:

def fork(func):
  ...
  if p.exitcode != 0:
    print("exitcode:", p.exitcode)
    raise RuntimeError("child process died")
  ...


exitcode: -9
Traceback (most recent call last):
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1623, in <module>
    main()
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1495, in main
    output_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 767, in validate_checkpoints
    output_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 689, in validate_checkpoint
    decode(args, config, data, autoencoder_chkpt, pred_folder, pred_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 127, in wrapper
    raise RuntimeError("child process died")
RuntimeError: child process died

abred commented 3 years ago

Well, that's at least something; there you have a starting point. What does an exit code of -9 mean for the Python multiprocessing module? I don't know. The Linux signal number for SIGKILL is 9, maybe that's related.
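
For reference, the Python documentation for `multiprocessing.Process.exitcode` says that a negative value -N means the child was terminated by signal N, so -9 would correspond to signal 9. The sketch below reproduces that by killing a child on purpose; `sleep_for_a_while` is a made-up function and the "fork" start method is assumed (POSIX only).

```python
import multiprocessing as mp
import os
import signal
import time


def sleep_for_a_while():
    # Stand-in for a long-running job.
    time.sleep(60)


ctx = mp.get_context("fork")  # assumption: POSIX system with fork available
child = ctx.Process(target=sleep_for_a_while)
child.start()
os.kill(child.pid, signal.SIGKILL)  # simulate an external kill
child.join()

exit_code = child.exitcode
signal_name = signal.Signals(-exit_code).name
print(exit_code, signal_name)
```

A SIGKILL cannot be caught by the process, which would also explain why a try-except block inside the child prints nothing in that case.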

Paragjain10 commented 3 years ago

Thank you for the help @abred,

Yes, the error was due to the OS killing the processes. I looked for the reason and found this:

(Parag_GreenAI) student2@BQ-DX1100-CT2:~/Desktop/Parag_masterthesis/PatchPerPix$ dmesg | egrep -i 'killed process'
[3690317.795063] Out of memory: Killed process 25126 (python) total-vm:68878336kB, anon-rss:40438604kB, file-rss:73916kB, shmem-rss:10240kB, UID:1003 pgtables:79984kB oom_score_adj:0
[3692545.821061] Out of memory: Killed process 26876 (python) total-vm:84425016kB, anon-rss:41435308kB, file-rss:69168kB, shmem-rss:30720kB, UID:1003 pgtables:97712kB oom_score_adj:0
[3692958.012000] Out of memory: Killed process 27056 (python) total-vm:84436940kB, anon-rss:41452632kB, file-rss:67504kB, shmem-rss:30720kB, UID:1003 pgtables:97684kB oom_score_adj:0
[3698429.031248] Out of memory: Killed process 29824 (python) total-vm:84408820kB, anon-rss:41401664kB, file-rss:69148kB, shmem-rss:30720kB, UID:1003 pgtables:104396kB oom_score_adj:0
[3698788.030913] Out of memory: Killed process 30003 (python) total-vm:84427092kB, anon-rss:41397448kB, file-rss:70440kB, shmem-rss:30684kB, UID:1003 pgtables:104416kB oom_score_adj:0
[3699090.490263] Out of memory: Killed process 30156 (python) total-vm:84423948kB, anon-rss:41382972kB, file-rss:67160kB, shmem-rss:30720kB, UID:1003 pgtables:104432kB oom_score_adj:0

I think the process needs more RAM than is available. The OS has a hit man, the oom-killer, that kills such processes for the sake of system stability. Do you think this could be the reason? Could you help me with what to change so that the process does not use all the available RAM and leaves some for the system processes? Which parameters would be worth changing in this case? Note: the training has finished; the failure is in the decoding step.
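
To put a number on the kernel log above, here is a small sketch that pulls the process ID and the resident memory out of one oom-killer line. The helper `parse_oom_line` and its regular expression are my own; the field layout is copied from the lines shown above and may differ between kernel versions.

```python
import re

# Matches the "Killed process <pid> (<name>) ... anon-rss:<n>kB" part of the line.
OOM_PATTERN = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_line(line):
    """Return (pid, process name, anonymous RSS in GiB), or None if no match."""
    match = OOM_PATTERN.search(line)
    if match is None:
        return None
    anon_rss_gib = int(match.group("anon_rss_kb")) / (1024 ** 2)
    return int(match.group("pid")), match.group("name"), anon_rss_gib


line = ("[3690317.795063] Out of memory: Killed process 25126 (python) "
        "total-vm:68878336kB, anon-rss:40438604kB, file-rss:73916kB, "
        "shmem-rss:10240kB, UID:1003 pgtables:79984kB oom_score_adj:0")
pid, name, anon_rss_gib = parse_oom_line(line)
print(pid, name, round(anon_rss_gib, 1))
```

For the first line this comes out at roughly 38.6 GiB of anonymous memory held by the python process when it was killed.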

abred commented 3 years ago

OK, yes. Because the worm images are quite small and I had enough RAM, I did it in one step. For larger (or 3D) data we had a similar issue.

You can try replacing decode.py with the version below. It was originally written for 3D data, so there might be a few shapes etc. that you have to change (I already fixed a few). It slices the image along the x axis and computes one slice at a time (chunkSzX; depending on the amount of RAM you have, you can also try larger values).

import time
import logging
try:
    import absl.logging
    logging.root.removeHandler(absl.logging._absl_handler)
    absl.logging._warn_preinit_stderr = False
except Exception as e:
    print(e)
import numpy as np
import tensorflow as tf
import h5py
import zarr
import os
import toml

from PatchPerPix.models import autoencoder, FastPredict
from PatchPerPix.visualize import visualize_patches

logger = logging.getLogger(__name__)

def predict_input_fn(generator, input_shape):
    def _inner_input_fn():
        dataset = tf.data.Dataset.from_generator(
            generator,
            output_types=tf.float32,
            output_shapes=(tf.TensorShape(input_shape))).batch(1)
        return dataset

    return _inner_input_fn

def decode_sample(decoder, sample, **kwargs):
    batch_size = kwargs['decode_batch_size']
    code_units = kwargs['code_units']
    patchshape = kwargs['patchshape']
    if type(patchshape) != np.ndarray:
        patchshape = np.array(patchshape)
    patchshape = patchshape[patchshape > 1]

    # load data depending on prediction.output_format and prediction.aff_key
    if "zarr" in kwargs['output_format']:
        pred_code = np.array(zarr.open(sample, 'r')[kwargs['code_key']])
        pred_fg = np.array(zarr.open(sample, 'r')[kwargs['fg_key']])
    elif "hdf" in kwargs['output_format']:
        with h5py.File(sample, 'r') as f:
            pred_code = np.array(f[kwargs['code_key']])
            pred_fg = np.array(f[kwargs['fg_key']])
    else:
        raise NotImplementedError("invalid input format")

    # check if fg is numinst with one channel per number instances [0,1,..]
    # heads up: assuming probabilities for numinst [0, 1, 2] in this order!
    if pred_fg.shape[0] > 1:
        pred_fg = np.any(np.array([
            pred_fg[i] >= kwargs['fg_thresh']
            for i in range(1, pred_fg.shape[0])
        ]), axis=0).astype(np.uint8)
    else:
        pred_fg = (pred_fg >= kwargs['fg_thresh']).astype(np.uint8)

    pred_fg = np.squeeze(pred_fg)
    fg_coords = np.transpose(np.nonzero(pred_fg))
    num_batches = int(np.ceil(fg_coords.shape[0] / float(batch_size)))
    logger.info("processing %i batches", num_batches)

 #   output = np.zeros((np.prod(patchshape),) + pred_fg.shape)
    sample_name = os.path.basename(sample).split('.')[0]
    outfn = os.path.join(kwargs['output_folder'],
                         sample_name + '.' + kwargs['output_format'])
    mode = 'a' if os.path.exists(outfn) else 'w'
    if kwargs['output_format'] == 'zarr':
        outf = zarr.open(outfn, mode=mode)
    elif kwargs['output_format'] == 'hdf':
        outf = h5py.File(outfn, mode)
    else:
        raise NotImplementedError
    chunkSzX = 10
    chunkSz = (int(np.prod(patchshape)),) + (pred_fg.shape[0], chunkSzX)
    data = outf.create_dataset(
        kwargs['aff_key'],
        shape=(np.prod(patchshape),) + pred_fg.shape,
        dtype=np.float32,
        chunks=chunkSz,
        compression='gzip')
    print(data.chunks)
    # exit()

    fg_coords_sorted = {}
    for c in fg_coords:
        fg_coords_sorted.setdefault(c[-1]//chunkSzX, []).append(c)
    print(fg_coords_sorted.keys())
    for x_slice, fg_coords in fg_coords_sorted.items():
        if (x_slice+1)*chunkSzX > pred_fg.shape[-1]:
            sz = pred_fg.shape[-1] - x_slice*chunkSzX
        else:
            sz = chunkSzX
        data_tmp = np.zeros((int(np.prod(patchshape)),) + (pred_fg.shape[0], sz),
                            dtype=np.float32)
        for b in range(0, len(fg_coords), batch_size):
            # print("new it")
            # start = time.time()
            fg_coords_batched = fg_coords[b:b + batch_size]
            fg_coords_batched = [(slice(None),) + tuple(
                [slice(i, i + 1) for i in fg_coord])
                                 for fg_coord in fg_coords_batched]
            pred_code_batched = [pred_code[fg_coord].reshape((1, code_units))
                                 for fg_coord in fg_coords_batched]
            if len(pred_code_batched) < batch_size:
                pred_code_batched = pred_code_batched + ([np.zeros(
                    (1, code_units))] * (batch_size - len(pred_code_batched)))
            # print(time.time() - start)
            # start = time.time()
            logger.info('in decode sample: {} ({}/{}, slice: {})'.format(
                pred_code_batched[0].shape,
                b, len(fg_coords), x_slice))
            predictions = decoder.predict(pred_code_batched)
            # print(time.time() - start)
            # start = time.time()
            # print("predict done")
            for idx, fg_coord in enumerate(fg_coords_batched):
                prediction = predictions[idx]
                # print(time.time() - start)
                # start = time.time()
                # print("id", idx, fg_coord, prediction['affinities'].shape)
                x = fg_coords[b+idx][-1] % chunkSzX
                # x = fg_coord[3].start % % chunkSzX
                data_tmp[fg_coord[0], fg_coord[1], fg_coord[2], x] = \
                    np.reshape(
                        prediction['affinities'],
                        (np.prod(prediction['affinities'].shape), 1, 1)
                    )
                # data[fg_coord] = np.reshape(
                #     prediction['affinities'],
                #     (np.prod(prediction['affinities'].shape), 1, 1, 1)
                # )
                # print(time.time() - start)
                # start = time.time()
        st = x_slice * chunkSzX
        nd = min((x_slice+1)*chunkSzX, pred_fg.shape[-1])
        data[:,:,st:nd] = data_tmp

    if kwargs['output_format'] == 'hdf':
        outf.close()

#    return output               # 

def decoder_model_fn(features, labels, mode, params):
    if mode != tf.estimator.ModeKeys.PREDICT:
        raise RuntimeError("invalid tf estimator mode %s" % mode)

    logger.info("feature tensor: %s", features)
    logger.info("label tensor: %s", labels)

    ae_config = params['included_ae_config']

    is_training = False
    code = tf.reshape(features, (-1,) + params['input_shape'])
    dummy_in = tf.placeholder(
        tf.float32, [None, ] + ae_config['patchshape'])
    input_shape = tuple(p for p in ae_config['patchshape']
                        if p > 1)
    logits, _, _ = autoencoder(
        code,
        is_training=is_training,
        input_shape_squeezed=input_shape,
        only_decode=True,
        dummy_in=dummy_in,
        **ae_config
    )
    pred_affs = tf.sigmoid(logits, name="affinities")
    predictions = {
        "affinities": pred_affs,
    }
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

def decode(**kwargs):
    sess_config = tf.ConfigProto()
    sess_config.gpu_options.allow_growth = True
    config = tf.estimator.RunConfig(
        model_dir=kwargs['output_folder'],
        session_config=sess_config)

    decoder = tf.estimator.Estimator(model_fn=decoder_model_fn,
                                     params=kwargs, config=config)

    if kwargs['mode'] == tf.estimator.ModeKeys.PREDICT:
        decoder = FastPredict(decoder, predict_input_fn,
                              kwargs['checkpoint_file'], kwargs)

        for sample in kwargs['samples']:
            # decode each sample
            logger.info("processing {}".format(sample))
            decode_sample(decoder, sample, **kwargs)
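
The slicing idea in isolation, as a plain-Python sketch for 2D coordinates. `group_by_x_chunk` and `chunk_bounds` are illustrative helpers, not functions from PatchPerPix, and the coordinates are made up. Foreground pixels are bucketed by the x-chunk they fall into, so only one chunk-sized buffer has to be held in memory at a time.

```python
def group_by_x_chunk(fg_coords, chunk_size_x):
    """Group (y, x) coordinates by the index of the x-chunk they belong to."""
    groups = {}
    for coord in fg_coords:
        groups.setdefault(coord[-1] // chunk_size_x, []).append(coord)
    return groups


def chunk_bounds(chunk_index, chunk_size_x, width):
    """Return the [start, stop) x-range of a chunk, clipped to the image width."""
    start = chunk_index * chunk_size_x
    stop = min(start + chunk_size_x, width)
    return start, stop


fg_coords = [(0, 3), (5, 12), (2, 9), (7, 25)]  # made-up (y, x) foreground pixels
groups = group_by_x_chunk(fg_coords, chunk_size_x=10)
for chunk_index, coords in sorted(groups.items()):
    print(chunk_index, chunk_bounds(chunk_index, 10, width=27), coords)
```

A larger chunk size means fewer writes to the output file but a bigger temporary buffer, which is the trade-off behind chunkSzX.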

Paragjain10 commented 3 years ago

I tried running the code with the changes you gave, but I ran into this error. Could you tell me what needs to be changed here, so that I can make further changes myself if required?

Traceback (most recent call last):
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/student2/anaconda3/envs/Parag_GreenAI/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 617, in decode
    raise(e)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 613, in decode
    raise (err)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 609, in decode
    **config['data']
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/decode.py", line 196, in decode
    decode_sample(decoder, sample, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/wormbodies/02_setups/setup08/decode.py", line 131, in decode_sample
    (np.prod(prediction['affinities'].shape), 1, 1 )
IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed
too many indices for array: array is 3-dimensional, but 4 were indexed
too many indices for array: array is 3-dimensional, but 4 were indexed

abred commented 3 years ago

Hi, I'm sorry. I am happy to fix bugs or to help if you have tried something and cannot figure it out, but I can't do everything. The error, in combination with my description above, is pretty self-explanatory: the code used to be for 3D data and your data is 2D, so in some places an extra dimension is accessed that doesn't exist.

IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed
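
A minimal NumPy sketch of the shape bookkeeping, under the assumption that the temporary buffer for 2D data is laid out as (patch values, y, x inside the chunk). All sizes and variable names here are made up for the illustration; it is not the code from decode.py.

```python
import numpy as np

patch_size = 4                # stands in for np.prod(patchshape)
height, chunk_size_x = 6, 10  # made-up sizes for the example

# Buffer for one x-slice of a 2D image: (patch values, y, x inside the chunk).
data_tmp = np.zeros((patch_size, height, chunk_size_x), dtype=np.float32)

y, x_global = 3, 27
x_local = x_global % chunk_size_x  # position inside the current chunk
affinities = np.arange(patch_size, dtype=np.float32).reshape(2, 2)

# Three indices for a three-dimensional buffer; the target has shape (patch_size, 1).
data_tmp[:, y:y + 1, x_local] = affinities.reshape(patch_size, 1)

# Caution: a slice built from the global x lies outside the buffer and is empty,
# so this assignment writes nothing and raises no error.
before = data_tmp.copy()
data_tmp[:, y:y + 1, x_global:x_global + 1] = np.ones((patch_size, 1, 1), dtype=np.float32)
unchanged = bool(np.array_equal(before, data_tmp))
print(data_tmp[:, y, x_local], unchanged)
```

The second assignment is worth keeping in mind: a version that runs without an exception is not necessarily writing the predictions into the buffer.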

Paragjain10 commented 3 years ago

Hello @abred, I am extremely sorry. I did not intend to come across like this. I'll make sure that I try maximum things from my side first before approaching you. And I am really grateful to have your help.

Now, I tried changing the code in decode.py a little:

from this:

data_tmp[fg_coord[0], fg_coord[1], fg_coord[2], x] = \
        np.reshape(
            prediction['affinities'],
            (np.prod(prediction['affinities'].shape), 1, 1)
        )

to this:

data_tmp[fg_coord] = \
        np.reshape(
            prediction['affinities'],
            (np.prod(prediction['affinities'].shape), 1, 1)
        )

I tried a few things first, but this was the only change that got the code running. Is this change correct? With it, the decode step completed successfully, but the run then failed with another error and exit code -9. I think it occurs while computing the vote instances:

INFO:__main__:vote_instances checkpoint 20000 {'patch_threshold': 0.5, 'fc_threshold': 0.5}
INFO:__main__:reading data from ~/home/student2/Desktop/Parag_masterthesis/~/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/val/processed/20000
['01_23', '02_56', '10_1134', '05_74', '10_1124', '07_45', '01_11', '03_461', '08_469', '05_60', '02_3', '05_39', '02_17', '03_492', '10_1138', '10_1090', '04_1013', '10_1100', '03_437', '02_6', '06_51', '04_946', '09_747', '05_85', '07_92', '01_84', '10_1060', '09_753', '08_412', '08_421', '03_458', '07_58', '06_24', '04_979', '03_507', '02_14', '10_1107', '03_452', '03_531', '06_40', '01_25', '10_1135', '02_86', '01_73', '09_748', '03_475', '05_62', '08_491', '04_1019', '03_455', '06_3', '02_94', '09_726', '02_36', '03_477', '02_22', '06_76', '05_33', '03_528', '03_466', '02_90', '06_17', '03_502', '01_42', '10_1069', '03_471', '08_497', '09_768', '05_11', '08_407', '07_81', '01_74', '08_484', '01_29', '06_19', '03_467', '04_967', '07_51', '04_1031', '09_777', '08_423', '05_79', '06_68', '10_1067', '01_62', '07_42', '02_85', '07_29', '02_100', '07_85', '04_1018', '02_82', '06_4', '04_955', '02_24', '03_499', '07_13', '02_97', '01_14', '09_728', '04_1001', '03_509', '06_21', '07_63', '05_50', '04_1007', '04_1012', '04_1004', '01_82', '06_46', '10_1147', '02_50', '07_64', '04_940', '07_23', '08_404', '08_418', '04_958', '02_98', '04_1037', '02_48', '04_1033', '03_470', '04_999', '01_43', '09_735', '01_46', '05_87', '06_36', '10_1140', '05_56', '07_77', '03_515', '01_49', '01_59', '06_33', '03_446', '07_36', '06_29', '03_485', '04_1030', '06_64', '01_86', '08_415', '06_90', '01_68', '01_39', '09_756', '04_948', '01_28', '02_75', '09_779', '10_1114', '03_496', '03_505', '03_474', '09_775', '02_20', '07_33', '06_58', '10_1142', '01_63', '01_81', '05_25', '10_1076', '02_29', '04_938', '10_1080', '08_451', '05_7', '04_1005', '04_951', '04_1026', '03_519', '09_793', '06_12', '02_47', '10_1102', '09_785', '08_461', '01_6', '01_88', '08_496', '04_1028', '10_1104', '10_1133', '08_459', '07_41', '04_1032', '07_12', '10_1071', '07_54', '01_15', '02_15', '09_732', '02_2', '04_1006', '07_68', '07_18', '10_1129']
INFO:__main__:forking <function vote_instances_sample_seq at 0x7fc8a07aaa70>
INFO:PatchPerPix.vote_instances.vote_instances:processing ~/home/student2/Desktop/Parag_masterthesis/~/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/val/processed/20000/01_23.zarr
INFO:PatchPerPix.vote_instances.utilVoteInstances:keys: ['volumes']
exitcode: -9
Traceback (most recent call last):
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1623, in <module>
    main()
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 1495, in main
    output_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 767, in validate_checkpoints
    output_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 699, in validate_checkpoint
    vote_instances(args, config, data, pred_folder, inst_folder)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 110, in wrapper
    ret = func(*args, **kwargs)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 823, in vote_instances
    output_folder, sample)
  File "/home/student2/Desktop/Parag_masterthesis/PatchPerPix/PatchPerPix_experiments/run_ppp.py", line 127, in wrapper
    raise RuntimeError("child process died")
RuntimeError: child process died
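
For context on the `exitcode: -9` line: assuming the exit code is reported by Python's `multiprocessing` module, a negative value -N means the child was terminated by signal N. On Linux, signal 9 is SIGKILL, which is often (though not always) sent by the kernel's out-of-memory killer. The helper below is a hypothetical sketch for decoding such a value; `describe_exitcode` is not part of run_ppp.py.

```python
import signal


def describe_exitcode(exitcode):
    """Translate a multiprocessing-style exit code into a readable string."""
    if exitcode is None:
        return "still running"
    if exitcode < 0:
        # Negative values mean the child was terminated by signal -exitcode.
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = "unknown signal"
        return "killed by signal {} ({})".format(-exitcode, name)
    if exitcode == 0:
        return "exited normally"
    return "exited with status {}".format(exitcode)


print(describe_exitcode(-9))
```

If the process was killed for memory reasons, the kernel log (for example via `dmesg`) usually contains a matching entry, which is a quick way to confirm or rule out that cause.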

Paragjain10 commented 3 years ago

Hello @abred,

One possible fix I could think of was changing the parameters under [vote_instances] in the config. I tried reducing num_workers from 8 to 4, 2, and 1, and I also tried changing the value of chunksize, but I ended up with the same error.
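
One way to check whether memory is a plausible cause is to estimate how large a dense per-pixel patch prediction becomes. The sketch below is only back-of-the-envelope arithmetic: the image size, the patch shape, and the helper name `affinity_bytes` are hypothetical, and the thread does not confirm that memory is the actual problem.

```python
import numpy as np


def affinity_bytes(image_shape, patch_shape, dtype=np.float32):
    """Bytes needed to store one dense patch prediction per pixel."""
    num_pixels = int(np.prod(image_shape))
    patch_size = int(np.prod(patch_shape))
    return num_pixels * patch_size * np.dtype(dtype).itemsize


# Hypothetical numbers, not taken from the thread:
# a 512x512 image with a 1x25x25 patch per pixel.
size = affinity_bytes((512, 512), (1, 25, 25))
print("{:.2f} GiB per sample".format(size / 2**30))
```

If several workers each hold such an array plus intermediate copies, the total can exceed the available RAM even when num_workers is small, so comparing this estimate against free memory is a reasonable first check.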