chensong1995 / E-CIR

Event-Enhanced Continuous Intensity Recovery (CVPR 2022)
MIT License
42 stars 2 forks

Hello, can you provide a few samples of the event dataset you generated based on the REDS dataset? I would like to debug create_hdf5.py #4

Open booker-max opened 2 years ago

booker-max commented 2 years ago

Your work is really great and it gave me a lot of inspiration, but there are still some parts I do not understand, especially how to obtain derivative_{gt}. I am now reading your code. If you could release a part of the dataset for debugging create_hdf5.py, I would be grateful.

chensong1995 commented 2 years ago

Hello booker-max,

Thanks for your interest in our work! I have uploaded some sample data here for debugging. Additionally, this file explains how to prepare the entire dataset for create_hdf5.py. To obtain ground-truth polynomial coefficients, we assemble the constraints (what the intensity values are at certain timestamps) into a linear system Ax=b. It is indeed a bit tricky. Let me know if you need additional clarifications. I hope this helps!
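
For reference, a minimal sketch of this idea (not the repository's exact code; the polynomial degree and names here are assumptions): every known intensity value at a timestamp contributes one row to A, and solving the resulting least-squares system gives the coefficients.

import numpy as np

def fit_intensity_polynomial(timestamps, intensities, degree=9):
    # Each constraint "the intensity at time t equals v" becomes one row of A:
    # [1, t, t^2, ..., t^degree] . x = v, so A is a Vandermonde matrix.
    A = np.vander(np.asarray(timestamps, dtype=float), N=degree + 1, increasing=True)
    b = np.asarray(intensities, dtype=float)
    # Solve the (possibly over-determined) system Ax = b in the least-squares sense.
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

# Example: 14 sharp-frame timestamps in [0, 1] and their intensities at one pixel.
t = np.linspace(0.0, 1.0, 14)
v = np.random.rand(14)
print(fit_intensity_polynomial(t, v).shape)  # (10,) coefficients for a degree-9 polynomial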

booker-max commented 2 years ago

Thank you.

chenkang455 commented 1 year ago

Thanks for sharing your sample data @chensong1995. However, I found that the number of images under "corrupted" is not equal to the number under "resized". The relevant code is below:

with open(csv_name, 'w') as f:
    for i_frame in range(500):
        # read each sharp frame, resize it, and save it to the "resized" directory
        in_name = os.path.join(in_dir, '{:08d}.png'.format(i_frame))
        frame = cv2.imread(in_name, cv2.IMREAD_GRAYSCALE)
        frame = cv2.resize(frame, (args.output_width, args.output_height))
        out_name = os.path.join(out_dir, 'frames_{:010d}.png'.format(i_frame))
        cv2.imwrite(out_name, frame)
        # timestamp in nanoseconds, assuming a 120 fps frame rate
        timestamp = int(i_frame / 120 * 1e9) + 1
        f.write('{},frames_{:010d}.png\n'.format(timestamp, i_frame))

I cannot understand why the number of "resized" images is 500 instead of 485, which is the number of "corrupted" images. The code above causes the inconsistency below:

print(data['sharp_frame'].shape)
print(data['video_idx'].shape)
print(data['frame_idx'].shape)
print(data['blurry_frame'].shape)
print(data['event_map'].shape)
print(data['keypoints'].shape)
print(data['derivative'].shape)
print(data['primitive'].shape)
print(data['sharp_frame_ts'].shape)

(7515, 1, 180, 240)
(7275,)
(7275,)
(7275, 1, 180, 240)
(7275, 26, 180, 240)
(7275, 10, 180, 240)
(7275, 10, 180, 240)
(7275, 10, 180, 240)
(7275, 14)
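
(For reference, the mismatch in the first dimension factors neatly if we assume the file covers 15 videos, as noted later in this thread: 7515 = 15 × 501 sharp frames per video, while 7275 = 15 × 485 entries per video for everything else.)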

Thank you very much if you could answer my question!!! And thanks for your great work!

chensong1995 commented 1 year ago

Hello chenkang455,

The blurry images in corrupted are generated by the event simulator ESIM. I believe the core idea in their implementation is that a sliding window of 14 sharp frames moves with a stride of 1. This results in 485 different sliding window locations when the total number of sharp images is 500.

I hope this helps!

chenkang455 commented 1 year ago

> Hello chenkang455,
>
> The blurry images in corrupted are generated by the event simulator ESIM. I believe the core idea in their implementation is that a sliding window of 14 sharp frames moves with a stride of 1. This results in 485 different sliding window locations when the total number of sharp images is 500.
>
> I hope this helps!

Got it! Thanks a lot!

chenkang455 commented 1 year ago

Hello @chensong1995, sorry to bother you again. I found some confusing parts in your test code.

def test(self):
    print('Testing on REDS')
    for key in self.model.keys():
        self.model[key].eval()
    metrics = {}
    for metric_name in ['MSE', 'PSNR', 'SSIM']:
        metrics[metric_name] = AverageMeter()
    with torch.no_grad():
        print("loading data")
        iter = 0
        for i_batch, batch in enumerate(tqdm(self.test_loader)):
            iter += 32
            print("data loaded Successfully")
            pred = self.model['g'](batch['blurry_frame'],
                                   batch['event_map'],
                                   batch['keypoints'])
            pred['coeffs'], _ = self.integrator(pred['derivative'],
                                                batch['blurry_frame'],
                                                batch['keypoints'])
            pred['frame_init'] = self.constructor(pred['coeffs'],
                                                  batch['timestamps'])
            frame = pred['frame_init']
            if self.args.lambda_ref > 0:
                pred['frame_refine'], _ = self.model['r'](pred['frame_init'],
                                                          batch['event_map'])
                frame = pred['frame_refine']
            frame = np.clip(frame.detach().cpu().numpy(), 0, 1)
            # video_idx [bs], frame_idx [bs], frame_gt [bs, kp, w, h], frame [bs, kp, w, h]
            video_idx = batch['video_idx'].detach().cpu().numpy()
            frame_idx = batch['frame_idx'].detach().cpu().numpy()
            frame_gt = batch['sharp_frame'].detach().cpu().numpy()
            for i_example in range(frame.shape[0]):
                # the i_example-th example in the batch belongs to video video_idx[i_example]
                save_dir = os.path.join(self.args.save_dir,
                                        'reds_output',
                                        '{:03d}'.format(video_idx[i_example]))
                os.makedirs(save_dir, exist_ok=True)
                # i_time indexes the keypoint dimension (frame.shape[1])
                for i_time in range(frame.shape[1]):
                    # image at the i_time-th keypoint of the i_example-th blurry frame
                    save_name = os.path.join(save_dir,
                                             '{:06d}_{}.png'.format(frame_idx[i_example],
                                                                    i_time))
                    cv2.imwrite(save_name, frame[i_example, i_time] * 255)
                    gt = np.uint8(frame_gt[i_example, i_time] * 255)
                    pred = np.uint8(frame[i_example, i_time] * 255)
                    for metric_name, metric in zip(['MSE', 'PSNR', 'SSIM'],
                                                   [skimage.metrics.normalized_root_mse,
                                                    skimage.metrics.peak_signal_noise_ratio,
                                                    skimage.metrics.structural_similarity]):
                        metrics[metric_name].update(metric(gt, pred))
            del pred, frame
            torch.cuda.empty_cache()
            print(torch.cuda.memory_allocated() * 1e-6)
    info = 'MSE: {:.3f}\tPSNR: {:.3f}\tSSIM: {:.3f}'.format(metrics['MSE'].avg,
                                                            metrics['PSNR'].avg,
                                                            metrics['SSIM'].avg)
    print('Results:')
    print(info)

You compare the variable gt with pred to calculate the performance metrics in your test code. However, I found that they do not really correspond. (In short, although both have 14 timestamps, their meanings are different.) gt refers to frame_gt, which is taken from batch['sharp_frame'], while pred refers to the model's output at the keypoints during one exposure time.

Here, the shape of gt is [bs, timestamps, width, height], and the shape of pred is also [bs, timestamps, width, height]. Although both have 14 timestamps, I don't think they mean the same thing. gt (sharp_frame) is computed by the code below:

sharp_frame_idx = data['sharp_frame_idx'][idx]
sharp_frame = np.squeeze(data['sharp_frame'][sharp_frame_idx], axis=1) 

data['sharp_frame'].shape = (7515, 1, 180, 240). For each video, its shape is (7515 / 15 = 501, 1, 180, 240). Hence, data['sharp_frame'] corresponds to the original REDS dataset (which has 500 frames per video), and the 14 timestamps of gt refer to 14 sharp images matched to the blurry images in your dataset. However, the model output is 14 images at the keypoint timestamps within one exposure, computed from a single blurry image.

Therefore, though their size is equal, their meaning is quite different.
The diagram is shown below. [diagram attachment]

Thank you very much if you could resolve my question!!

chensong1995 commented 1 year ago

Hello chenkang455,

Thanks for your question! In your diagram, the sliding window moves at a stride of 14 sharp frames. In our experiments, however, the stride is one sharp frame. Blurry frame 1 corresponds to sharp frames 1-14, blurry frame 2 corresponds to sharp frames 2-15, and so on. I hope this helps!
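
(For concreteness, a minimal sketch of this correspondence, assuming 0-based indices and a 14-frame exposure window; the names are illustrative, not code from the repository:)

WINDOW = 14  # number of sharp frames covered by one blurry frame

def sharp_indices_for_blurry(i_blurry):
    # with a stride of one sharp frame, blurry frame i covers sharp frames i .. i+13
    return list(range(i_blurry, i_blurry + WINDOW))

print(sharp_indices_for_blurry(0))  # [0, 1, ..., 13]
print(sharp_indices_for_blurry(1))  # [1, 2, ..., 14]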

chenkang455 commented 1 year ago

Hello @chensong1995, thanks for your reply. What I want to express is that in the .hdf5 file you provided, each video contains 500 'sharp_frame' images and 485 'blur_frame' images. When the model takes one blurry image as input, its output is the set of frames at the keypoint timestamps of that blurry frame. If every blurry image in the video is put into the model, the total number of output frames will be 485 * keypoints, which is much larger than 500.

To match the data sizes of the two, you use a sliding window to solve this problem. But this reuses the sharp_frame images. In other words, I think your dataset does not have a set of blur_frame images whose keypoints correspond to distinct sharp_frame images.

Thank you very much if you could resolve my question!!

chenkang455 commented 1 year ago

The diagram is updated below; I hope it is clearer now. Thanks a lot! [updated diagram attachment]

chensong1995 commented 1 year ago

Hello chenkang455,

One sliding window contains one blurry frame, which corresponds to 14 sharp frames. I hope this helps!

chenkang455 commented 1 year ago

Got it !! Thanks a lot!

ice-cream567 commented 11 months ago

> Hello booker-max,
>
> Thanks for your interest in our work! I have uploaded some sample data here for debugging. Additionally, this file explains how to prepare the entire dataset for create_hdf5.py. To obtain ground-truth polynomial coefficients, we assemble the constraints (what the intensity values are at certain timestamps) into a linear system Ax=b. It is indeed a bit tricky. Let me know if you need additional clarifications. I hope this helps!

Hello, I also want some data to debug create_hdf5.py, but the above link has expired. Could you give me a new link? I would be very grateful.

chensong1995 commented 11 months ago

Hi ice-cream567,

I will upload the sample data to this Baidu link. The password is h165. Please follow the instructions here to assemble the zip file from the splits. The upload should finish within around 24 hours.

I hope this helps! Let me know if you have further concerns.

ice-cream567 commented 11 months ago

Sorry, I think you misunderstood my question; maybe my previous question confused you. What I need now is some event data simulated from the REDS dataset so that I can debug the create_hdf5.py file. What I downloaded from the link you sent me was an HDF5 file, not event data. Maybe my English is so poor that you didn't understand what I meant. Please help me solve this problem again when you have time. Thanks again for your reply. Good luck with your work.

The figure below shows the data I got. [figure attachment]


chensong1995 commented 11 months ago

Hi ice-cream567,

Thanks for the clarification! Please find the sample data to run create_hdf5.py here (Baidu). The passcode is vg2f.

I hope this helps! Let me know if you have further concerns.

ice-cream567 commented 9 months ago

> Hello chenkang455,
>
> Thanks for your question! In your diagram, the sliding window moves at a stride of 14 sharp frames. In our experiments, however, the stride is one sharp frame. Blurry frame 1 corresponds to sharp frames 1-14, blurry frame 2 corresponds to sharp frames 2-15, and so on. I hope this helps!

Hello, after reading your conversation I am still confused about why the number of sharp images is 500 while the number of blurry images is 485. Question 1: If a blurry image is obtained by averaging 14 sharp images and the stride is one sharp frame, then sharp images [0-13] correspond to blurry image [0], sharp images [1-14] correspond to blurry image [1], ..., and sharp images [486-499] correspond to blurry image [486], so there should be 487 blurry images in total. Why is it actually 485? Question 2: Will this approach cause the exposure times between blurry frames to overlap, since the exposure time of blurry frame [0] corresponds to sharp frames [0-13] and the exposure time of blurry frame [1] corresponds to sharp frames [1-14]?
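
(As a quick check of the counting in Question 1, assuming a 14-frame window sliding with a stride of 1 over 500 sharp frames; illustrative code only:)

n_sharp, window, stride = 500, 14, 1
n_windows = (n_sharp - window) // stride + 1
print(n_windows)  # 487, two more than the 485 blurry images actually provided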

Sorry for wasting your time. I hope you can give me some suggestions in your spare time.

chensong1995 commented 9 months ago

Hi ice-cream567,

Thanks for your follow-ups!

> Why is it actually 485?

This is a wonderful question, and I honestly don't know the answer. I use ESIM to synthesize the events, and the 485 blurry images are what ESIM outputs. Perhaps ESIM does not produce the first and last frames, but this is just my guess.
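
(Indeed, dropping the first and the last of the 487 possible windows would leave exactly 485, which is consistent with this guess.)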

> Will this approach cause the exposure times between blurry frames to overlap?

Yes.

I hope this helps! Let me know if you have further concerns.