dimitrisPs / u_cv_python

implementation of CV algorithms

A question about the DeepPruner_SCARED repository #1

Open SuperAKK opened 2 years ago

SuperAKK commented 2 years ago

Hello dimitris!

Sorry to bother you; this is a question about the DeepPruner_SCARED repository (https://github.com/dimitrisPs/DeepPruner_SCARED), which does not allow issues to be created.

I noticed that you achieved great performance on that challenge and mentioned how to generate disparity samples in the DeepPruner_SCARED repository. Could you share your data manipulation/processing code? It would be a great help to me.

Thank you! ❤ You have done great work. Looking forward to your reply.

dimitrisPs commented 2 years ago

Hello and thanks for reaching out.

Indeed you cannot directly stereo rectify the provided depth maps.

I am in the process of cleaning my data manipulation code and I am going to release it in the following month. Until then you can follow a simple process described below to generate the disparity samples you want:

I am assuming that you are using OpenCV to stereo rectify your images (I1, I2) based on the dataset's provided calibration parameters. During this process, you used stereoRectify(), which gave you two projection matrices (P1, P2) and two rectification transform matrices (R1, R2), plus a Q matrix which expresses the mapping between depth and disparity.
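For reference, the relevant OpenCV calls look roughly like the sketch below (the function and variable names are placeholders and calibration loading is omitted):

import cv2

def rectify_pair(left_img, right_img, K1, D1, K2, D2, R, T, image_size, alpha=0.0):
    # K1, D1, K2, D2: per-camera intrinsics and distortion coefficients;
    # R, T: left-to-right extrinsics, all loaded from the SCARED calibration files
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
        K1, D1, K2, D2, image_size, R, T, alpha=alpha)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, image_size, cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, image_size, cv2.CV_32FC1)
    left_rect = cv2.remap(left_img, map1x, map1y, cv2.INTER_LINEAR)
    right_rect = cv2.remap(right_img, map2x, map2y, cv2.INTER_LINEAR)
    return left_rect, right_rect, R1, R2, P1, P2, Q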

You can then generate disparity maps as follows, for each ground truth sample: project the sample's 3D points into both rectified views and take the difference of the horizontal image coordinates as the disparity, storing it at the rounded left projection coordinate (a rough sketch of this is given below).
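A simplified sketch of that per-sample step (not my exact code; it assumes the ground truth points are already expressed in the original left camera frame and that R1, P1, P2 come from stereoRectify() as above):

import numpy as np

def gt_points_to_disparity(pts3d, R1, P1, P2, out_shape):
    # pts3d: (N, 3) ground truth points in the original left camera frame.
    # Both P1 and P2 project points expressed in the rectified left camera frame.
    disp = np.full(out_shape, np.nan, dtype=np.float32)
    pts_rect = R1 @ pts3d.T                                   # 3 x N, rectified left frame
    pts_h = np.vstack([pts_rect, np.ones((1, pts_rect.shape[1]))])
    pl = P1 @ pts_h
    pr = P2 @ pts_h
    pl = pl[:2] / pl[2]
    pr = pr[:2] / pr[2]
    d = pl[0] - pr[0]                                          # rectified disparity
    # store each value at the rounded left projection coordinate;
    # this rounding is what introduces the (at most) 0.5 pixel error
    u = np.round(pl[0]).astype(int)
    v = np.round(pl[1]).astype(int)
    ok = (u >= 0) & (u < out_shape[1]) & (v >= 0) & (v < out_shape[0]) & (d > 0)
    disp[v[ok], u[ok]] = d[ok]
    return disp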

If something was not clear please let me know and I will explain it better. Otherwise, I will let you know when I upload the data manipulation code.

SuperAKK commented 2 years ago

Many thanks for your timely reply!

I will try it according to your process and update the status in time.

That's so kind of you! Thanks again! 😀

SuperAKK commented 2 years ago

Hello dimitris!

Thank you for the detailed description of the data pre-processing! I have successfully generated the disparity samples. ✌

And I have two more details to figure out:

  1. As you said, the disparity map will contain an additional error of at most 0.5 pixels due to the rounded projection coordinate (pl) in the left image. Then which value of pl.x should be used when calculating the disparity: the one before rounding or the one after?

I found that there is a small difference between the two ways, but the difference is also within 0.5 pixels. Does this difference matter?

  2. The other is about post-processing. I tried to convert the generated disparity map back to depth with the following steps:
    • use reprojectImageTo3D() to get a pointmap expressed in the left rectified frame of reference
    • apply the inverse R1 transformation to the points to express them in the left original frame of reference
    • use P1 to project the points to the left original frame (ol)
    • then create a depth image and store the depth value at pixel (round(ol.y), round(ol.x))

However, the generated depth map does not seem to be correct. Is there something wrong with my backprojection process?

Looking forward to your reply! Thanks again!❤

SuperAKK commented 2 years ago

And I also noticed that there are many outliers in the generated disparity samples, which seem really unreasonable.

For example, where the left image is black, the corresponding disparity map still has values. And these values are within the disparity range, which means they can't simply be masked out!

Did you have the same situation? If so, what kind of post-processing did you do?

Thank you!

dimitrisPs commented 2 years ago

Hi again,

Great to hear that you were able to produce the disparity maps.

1) It does not really matter which pixel (floor(pl.x) or ceil(pl.x)) you assign the disparity value to, because either way you introduce a comparable amount of error.

2) The process you described, to go from the rectified disparity to the original depth, is mostly correct, except for step 3. Instead of using P1 to project your point cloud back to the original left frame of reference, you need to call OpenCV's projectPoints() with the original left camera matrix and distortion coefficients you get from the SCARED calibration files as inputs. The function will give you the projection location of each 3D point and will account for any distortions.
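To make point 2 concrete, the corrected conversion looks roughly like the sketch below (not the exact code I use; K_left and dist_left stand for the original left camera matrix and distortion coefficients from the SCARED calibration, and error handling is omitted):

import cv2
import numpy as np

def disparity_to_depth_sketch(disp, Q, R1, K_left, dist_left, out_shape):
    # 3D points in the left *rectified* frame of reference
    pts = cv2.reprojectImageTo3D(disp.astype(np.float32), Q)
    valid = np.isfinite(pts).all(axis=2) & (disp > 0)
    pts = pts[valid]
    # undo the rectification rotation -> left *original* frame of reference
    pts = pts @ R1  # same as (R1.T @ pts.T).T
    # project with the original intrinsics and distortion (not P1)
    img_pts, _ = cv2.projectPoints(pts, np.zeros(3), np.zeros(3), K_left, dist_left)
    img_pts = img_pts.reshape(-1, 2)
    depth = np.zeros(out_shape, dtype=np.float32)
    u = np.round(img_pts[:, 0]).astype(int)
    v = np.round(img_pts[:, 1]).astype(int)
    ok = (u >= 0) & (u < out_shape[1]) & (v >= 0) & (v < out_shape[0])
    depth[v[ok], u[ok]] = pts[ok, 2]  # the z component is the depth
    return depth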

As for your second message, I am not sure what you mean by saying that the image is black. Do you mean that you get ground truth disparity values in areas where the image depicts dark tissue, or that you get disparity values in areas that are padded black during the stereo rectification process? The first case is completely normal; however, the second should not happen, and there is probably an error in your disparity generation code.

I suggest first making sure that your disparity generation code works and then trying to implement the disparity to depth program.

During the weekend I will try to upload a small gist showing how I generated the disparity maps. Until then, feel free to ask for any clarification here.

dimitrisPs commented 2 years ago

Hi again,

I uploaded a sample script containing code to generate disparity images given a .tiff ground truth file. For simplicity, I didn't include calibration loading and rectification code but it should be fairly easy to copy those in.

I will let you know when I release the full conversion repository in case you are still interested. Let me know if you need any other help.

SuperAKK commented 2 years ago

Hello dimitris!

I really appreciate your patience! It's a great help to me!

Thanks to your detailed process, I have generated the disparity map with no errors. And I'll also run your code to make sure I'm doing it right.

There is one more question about post-processing. I didn't quite understand what you said in the post-processing part of the repository: 'Because of the rectification alpha used for the test frames, this process results in pointmaps with a grid of unknown values. The last step is to interpolate the missing values using cubic interpolation.' And I want to figure out how to do this last step.

Thanks again! 😀

dimitrisPs commented 2 years ago

Hi,

No worries at all, I am happy to help.

Using the network's disparity prediction you can compute a point cloud in the left rectified frame of reference. To evaluate against the provided test sequence you need to have depth map information in the original left frame of reference. If you try to project the estimated point cloud back to the original frame of reference (after first rotating it, etc.), the reconstructed points may not be dense enough to project to every pixel of the left frame. Because the SCARED evaluation imposes a penalty on unknown depth values, after having projected the reconstructed point cloud to the original frame of reference, you need to interpolate the missing pixel values based on adjacent depth information.

Given a semi-dense depth map, you can use the following function to populate the missing depth values.

import numpy as np
from scipy import interpolate

def interpolate2d(array):
    x = np.arange(0, array.shape[1])
    y = np.arange(0, array.shape[0])
    #mask invalid values
    array = np.ma.masked_invalid(array)
    xx, yy = np.meshgrid(x, y)
    #get only the valid values
    x1 = xx[~array.mask]
    y1 = yy[~array.mask]
    newarr = array[~array.mask]

    out = interpolate.griddata((x1, y1), newarr.ravel(), (xx, yy), method='cubic')
    return out

If you run the above code you will find that it is very slow. There are certainly faster ways to achieve the same thing, but this is what I did for the challenge.
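For example, assuming the missing values in your semi-dense depth map (here called depth_map) are stored as zeros, usage would look like:

semi_dense = depth_map.astype(np.float32)   # depth_map: your semi-dense depth image
semi_dense[semi_dense == 0] = np.nan        # masked_invalid() only catches NaN/inf
dense_depth = interpolate2d(semi_dense)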

SuperAKK commented 2 years ago

Hello dimitris!

I really appreciate your sample script. I found that the generated disparity maps are about the same; it's just that the positions of the pixels sometimes differ by one pixel. This may be caused by the rounded projection coordinate.

I also noticed that you set alpha=0 in stereoRectify() on the train set, while setting alpha=1 on the test set. And when I project the test set disparity back to depth using the corresponding Q, the generated depth map has a lot of fine lines, as shown in the following figure.

[image: back_depth_alpha1]

Is this normal, or am I doing something wrong?

Thank you! ❤

dimitrisPs commented 2 years ago

Hi again,

For the training data, I used alpha=0 because I wanted the resulting stereo rectified frames to cover the whole image and not include any black borders in the periphery. The reasoning behind this was that I didn't want to train the network with images containing black patches. However, by using alpha=0 some ground truth points project outside the rectified image frames.

Again, because the SCARED evaluation protocol imposes a penalty for unknown pixel values, for the testing sequence I stereo rectified the test set with alpha=1 to ensure that the whole stereo frames are visible in the rectified views. Running inference on those frames would provide depth values for the whole original image area. Converting those back to depth expressed in the original frame of reference would result in depth maps with missing information in a grid pattern like the one you shared (missing values shown as black pixels, and spanning the whole image, not just the middle). To populate the grid with depth values I used the interpolate2d function I shared in my previous message.

To answer your question.

I am not sure what the image you shared is showing. If the dark pixels indicate missing values, then this is normal; however, I would expect the grid to span the whole image and not be limited only to the middle. Furthermore, I am not sure about the scale you are using: assuming that color intensity indicates depth, your depth map is too bright. Do you get reasonable depth values?

I can provide you with a conversion script to remove any guesswork but you would have to wait until the end of the week.

SuperAKK commented 2 years ago

Hello dimitris! Really appreciate your patience!

Forgive me for not being clear. Actually, this image is a mask, and I just wanted to show where there is no value in the disparity image.

I also used your pre-trained model to predict the test set, but the generated disparity maps didn't seem to be correct. As shown below, the rectified left image and the prediction are provided.

[images: Left_Image8k0_000000, disp_Left_Image8k0_000000]

Am I doing something wrong? My conda environment uses Python 3, while DeepPruner recommends Python 2. Could this be the reason?

Thank you! ❤

dimitrisPs commented 2 years ago

Hi again,

I just downloaded the repo and weights and tested with a copy of the test keyframes rectified with alpha=0. On my end, the disparity for this particular sample gets predicted correctly (see below).

To run inference I am using the following command

 python submission_scared.py --datapath {path to the folder containing the left_rect and right_rect directories} --loadmodel ./deeppruner_finetune_scared_epoch_290.tar --save_dir ./out_kfs --logging_filename test.log

Python 3 should be fine as I am also running on 3. The only problem that you may experience is in this and this file, where comparisons are made using "is" instead of "==". I just found out about that and I will push updates.
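To illustrate why this matters: "is" compares object identity rather than value, so such checks can silently fail (recent Python versions even warn about comparing to literals with "is"):

a = float("8")    # 8.0 created at runtime
print(a is 8.0)   # False: "is" checks object identity, not value
print(a == 8.0)   # True: "==" checks value equality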

Try to update the files mentioned above. If you are still having issues while using the same script and weights, then there is something wrong with your input data samples. Check the following things:

During the weekend I will try to post a snippet on how I converted disparities back to the original depth. Let me know if you manage to make the network work.

[image: 8_0]

dimitrisPs commented 2 years ago

Hello again,

I've uploaded the script to convert the disparities to depth, expressed in the original frame of reference here.

Did you manage to make reasonable disparity predictions using the network? Another thing I forgot to mention in my previous message is that you need to rectify each dataset with the corresponding calibration parameters; each dataset has its own calibration.

SuperAKK commented 2 years ago

Hello dimitris! I really appreciate you sharing the code.

I have successfully used your repo and weights to generate disparity maps similar to yours. As shown below, the keyframe (dataset8_keyframe_0) and the disparity are provided.

[images: Left_Image, newrec_d8k0]

However, you mentioned that the test keyframes were rectified with alpha=0, yet the disparity map seems to have been predicted on keyframes rectified without setting alpha.

As shown below, the keyframes rectified with alpha=0 and alpha=1 are provided.

[images: Left_Image, Left_Image8k0_000000]

So one more thing I want to figure out: you mentioned that the images used in training are all rectified with alpha=0, while the test images are rectified with alpha=1. Wouldn't the difference in image content affect the network's performance? Do I need to rectify both the train and test images with the same alpha value?

Thanks again! 😀

dimitrisPs commented 2 years ago

Hello,

I am happy to see that you managed to get reasonable disparity results.

Indeed, in my previous message I used frames rectified without setting any rectification alpha; apologies if I confused you.

It should not matter if you change the alpha between your test and training set, because stereo matching networks essentially learn to find pixel correspondences between the two images. By changing the alpha you only slightly scale the disparity range in your training set.

With that being said, because I trained on only a few frames, there may be some accuracy difference if you change the alpha used to stereo rectify the training set. This is what I believe; I didn't experiment with training using different rectification alphas, therefore I cannot give you a definitive answer.

SuperAKK commented 2 years ago

Hello dimitris! That's so kind of you!

As said before, I have successfully used your repo and weights to generate disparity maps. I then used the script (https://github.com/dimitrisPs/DeepPruner_SCARED/blob/master/disparity_to_original_depth.py) to convert the disparities back to depth with no problem.

As shown below, testset8 keyframe_5's left image and the converted depth image are provided.

[images: Left_Image, d9k4_leftdepth]

I then measured the depth mean absolute error according to the report (masking areas with no ground truth and discarding frames for which less than 10% of the frame has ground truth measurements). But the results seem to be inconsistent with the results of your method reported in the paper. Take keyframe_5 of testset8 and testset9 for example, since keyframe_5 is a single frame: in the paper, the MAE of d8k5 and d9k5 for your method are 0.62mm and 0.41mm, while the results I measured are 1.91mm and 0.75mm respectively. I also measured the other keyframes, and the results were still worse than reported.

Am I doing something wrong? Sorry to bother you again, and wish you a happy weekend! 😀

dimitrisPs commented 2 years ago

Hi again,

Most likely, yes, something is wrong in your evaluation or data conversion process. I cloned this repository to evaluate those two frames and I got results that are very close to what was reported in the SCARED paper.

I really cannot know what the issue might be but I would check:

With the bug in the disparity_to_original_depth.py script fixed, I am getting a 0.42mm MAE error for ds9_kf4 and a 0.65mm MAE error for ds8_kf4.

Those values are slightly different from the ones reported in the paper and that is for two reasons:

I uploaded the inferred disparity and converted depth files from my code here. If you are able to reproduce them then the problem is in your evaluation code. Remember to change the calibration parameters in the disparity_to_original_depth.py script and to save the disparity/depth images after you multiply their values by 128 and convert them to uint16.
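For reference, the scaling convention looks like this small sketch (the 16-bit PNG format and the function names are just examples):

import cv2
import numpy as np

def save_scaled_uint16(path, values):
    # scale by 128 and store as a 16-bit PNG
    cv2.imwrite(path, np.round(values * 128).astype(np.uint16))

def load_scaled_uint16(path):
    # read back unchanged and undo the scaling
    return cv2.imread(path, cv2.IMREAD_UNCHANGED).astype(np.float32) / 128.0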

SuperAKK commented 2 years ago

Hello dimitris! Really appreciate your detailed explanation. ❤

I measured the depth mean absolute error according to the report, and the test results are shown below:


|    | k1   | k2   | k3   | k4   | k5   |
| -- | ---- | ---- | ---- | ---- | ---- |
| d8 | 7.80 | 2.11 | 1.96 | 2.58 | 0.64 |
| d9 | 4.75 | 1.21 | 3.65 | 1.71 | 0.42 |

As shown, most of the results are reproduced close to the challenge report. However, a few results are quite different, such as d9k2, d9k3 and d9k4, for which the report gives 0.65, 1.62 and 0.77mm respectively.

Am I doing something wrong? I also provide the results for the 10 keyframes below:


|    | k1   | k2   | k3   | k4   | k5   |
| -- | ---- | ---- | ---- | ---- | ---- |
| d8 | 2.45 | 1.65 | 0.68 | 0.82 | 0.64 |
| d9 | 2.54 | 0.65 | 0.89 | 1.01 | 0.42 |

Since keyframe_5 is a single frame, the results for k5 in both tables are the same.

I saw that you mentioned in the challenge report that d9k3 has misalignment; maybe this is the reason for the difference? Have you done any extra processing for this?

Thank you so much! 😀

dimitrisPs commented 2 years ago

Hi again,

Thank you for going through the effort of re-evaluating all sequences and I am also happy to see that you managed to reproduce most of the results.

Inaccuracies in the ground truth sequences do not contribute to the differences between your evaluation and what is reported in the paper.

You only get different results for keyframes where ground truth information drops below 10% of the image area for a long time. Therefore the issue has to do with how frames are discarded and how the error is aggregated within a keyframe sequence. The results presented during the EndoVis challenge were different from the ones reported in the paper because, although low coverage frames were discarded, the mean error for a given keyframe was computed based on the total number of frames instead of the number of valid frames (GT information > 10%).

This was discovered between the EndoVis challenge date and the joint arXiv publication. Because of this, the results were adjusted for the arXiv paper. As you pointed out, however, the reported results for DeepPruner are a lot different from what you got, and also from what the other two late submissions managed to achieve.

This makes me believe that, only for DeepPruner, the old evaluation code was used. You can test whether that is the case by computing the mean error divided by the total number of frames instead of the number of valid frames. Alternatively, you can multiply the values you got by the ratio (number_of_good_frames/number_of_total_frames). If I am correct, you should get values close to what was reported in the paper.
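For example, with illustrative numbers:

mae_valid_only = 1.21           # MAE averaged over valid frames only (>10% GT coverage)
n_good, n_total = 313, 590      # valid frames / total frames for that keyframe
mae_old_style = mae_valid_only * n_good / n_total
print(round(mae_old_style, 2))  # 0.64: the value the old evaluation would report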

Can you please confirm that this is the issue? If it is, I can contact the first authors of the SCARED paper, and ask them to update the paper.

Thanks again and looking forward to your answer.

SuperAKK commented 2 years ago

Hello dimitris! Thanks for your timely reply! ❤

Exactly as you said, many frames of d9k2, k3, k4 are discarded because their ground truth information drops below 10%. The numbers of good frames are shown below:


| good_frames/total_frames | k1      | k2      | k3      | k4      | k5  |
| ------------------------ | ------- | ------- | ------- | ------- | --- |
| d8                       | 945/945 | 637/637 | 693/693 | 877/877 | 1/1 |
| d9                       | 903/903 | 313/590 | 438/953 | 122/309 | 1/1 |

Therefore, my re-evaluated results for d9k2, k3, k4 can be converted to the old evaluation style as:

|    | k2   | k3   | k4   |
| -- | ---- | ---- | ---- |
| d9 | 0.64 | 1.68 | 0.68 |

And yes! You are right: the converted values of d9k2, k3, k4 are indeed close to the results in the paper. 😄 Maybe you can contact the authors to update their article. I also converted your DeepPruner results from the report (only the results of d9k2, k3, k4 need to be updated) by multiplying by the ratio (number_of_total_frames/number_of_good_frames):


|    | k1   | k2   | k3   | k4   | k5   |
| -- | ---- | ---- | ---- | ---- | ---- |
| d8 | 7.73 | 2.07 | 1.94 | 2.63 | 0.62 |
| d9 | 4.85 | 1.23 | 3.52 | 1.95 | 0.41 |

The results look more reasonable and seem to be fine!

You are so kind, and thanks again for your efforts!

dimitrisPs commented 2 years ago

Hello,

I want to thank you for pointing out this issue in the first place and also for confirming what the issue is. 😌

I will contact the first author of SCARED regarding this and point them to this GitHub issue. For now, I am leaving the issue open so I can post an update once the author gets back to me.

Many thanks, Dimitris

SuperAKK commented 2 years ago

Hello Dimitris! OK, I'll keep watching for updates.

Thank you for sharing the code, for your detailed explanation, for everything you do! ❤

Actually, without your help I not only couldn't have reproduced the results, I wouldn't even have known how to convert depth to a disparity map. 😂

You are so kind, and thanks again! Wish you all the best!

SuperAKK commented 2 years ago

Hello Dimitris!

Congratulations on your latest TMI paper! You have done great work!

It seems that the results in the SCARED paper have not been updated yet; I wonder how that is going. Were there any mistakes in our previous discussion? 😅

Congratulations again, and looking forward to your reply.

dimitrisPs commented 2 years ago

Hello,

Thank you very much.

Indeed, the SCARED paper does not seem to have been updated yet. I did notify the authors of SCARED and pointed them to this GitHub issue. I do not know the progress of the revision, or what may be delaying it, but I think the conclusion drawn in our last conversation holds. If you want to bring this matter back to the attention of SCARED's authors and get more information about the revision progress, contacting them directly via email is probably best.