GlKz13 opened this issue 1 month ago
Here is my code by the way:

```python
import h5py as h5
import numpy as np
import torch
# EvTexture: import the model class from this repository (path depends on your checkout).

device = "cuda"

with h5.File("preproccessed/events/Vid4_h5/LRx4/test/calendar.h5", "r") as h:
    print("All frames:", len(list(h["images"])))
    print(h.keys())
    print(list(h["images"]))

    # take 2 images
    image1 = np.array(h["images"]["000000"])
    image2 = np.array(h["images"]["000001"])

    # take the voxel grids between them
    vf = np.array(h["voxels_f"]["000000"])
    vb = np.array(h["voxels_b"]["000000"])

# stack the two frames to get t = 2, then add the batch dimension
image1 = torch.tensor(image1).to(torch.float32).permute(2, 0, 1)
image2 = torch.tensor(image2).to(torch.float32).permute(2, 0, 1)
images = torch.stack([image1, image2]).unsqueeze(0)                # (1, 2, 3, H, W)
vf = torch.tensor(vf).to(torch.float32).unsqueeze(0).unsqueeze(0)  # (1, 1, Bins, H, W)
vb = torch.tensor(vb).to(torch.float32).unsqueeze(0).unsqueeze(0)  # (1, 1, Bins, H, W)

model = EvTexture()
model_path = "experiments/pretrained_models/EvTexture_Vimeo90K_BIx4.pth"
weights = torch.load(model_path, map_location=device)
model.load_state_dict(weights["params"])
model = model.to(device)

images = images.to(device)
vf = vf.to(device)
vb = vb.to(device)

model.eval()
with torch.inference_mode():
    res = model(images, vf, vb)
# res shape: (1, 2, 3, 576, 704)
```
Thank you for your interesting question about using only two frames as input and obtaining high-resolution output frames. Based on the shapes you've mentioned, they seem correct.
However, I have a question: have you successfully run the test script `./scripts/dist_test.sh [num_gpus] options/test/EvTexture/test_EvTexture_Vid4_BIx4.yml` and reproduced the results posted in the release?

I can suggest a simple way for you to quickly test this. You just need to modify the `meta_info_file` referenced in the config file, i.e. `basicsr/data/meta_info/meta_info_Vid4_h5_test.txt`, and replace its content with `calendar.h5 2`. After that, run the test script with `options/test/EvTexture/test_EvTexture_Vid4_BIx4.yml`; it will evaluate only the first two frames of calendar and output the results, as in the sketch below.
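For reference, a minimal sketch of that change, assuming the meta-info format is simply the h5 file name followed by the number of frames, as described above:

```python
# Hedged sketch: restrict the Vid4 test set to the first two calendar frames
# by overwriting the meta-info file with "<h5 name> <num frames>".
meta_info = "basicsr/data/meta_info/meta_info_Vid4_h5_test.txt"
with open(meta_info, "w") as f:
    f.write("calendar.h5 2\n")
```

Then launch the test as usual, e.g. `./scripts/dist_test.sh 1 options/test/EvTexture/test_EvTexture_Vid4_BIx4.yml`.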
I tested this and received the following results: for `000000.png` the PSNR is 23.64, and for `000001.png` it is approximately 23.60. The PSNR results in our release for the calendar frames `000000`/`000001` are 25.26/25.40, respectively.
I believe that inference with only two frames yields lower PSNR than using the entire video: our model employs a recurrent structure, so two frames provide limited temporal information, resulting in slightly poorer results.
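If you want to reproduce the full-sequence behaviour manually rather than through the test script, a hedged sketch (assuming the h5 groups `images`, `voxels_f`, and `voxels_b` are keyed by zero-padded frame indices, as in the code above) might look like this:

```python
import h5py as h5
import numpy as np
import torch

# Load every frame and every forward/backward voxel grid from calendar.h5,
# then stack them along the temporal dimension so the recurrent model
# can propagate information across the whole sequence.
with h5.File("preproccessed/events/Vid4_h5/LRx4/test/calendar.h5", "r") as h:
    images = torch.stack([
        torch.from_numpy(np.array(h["images"][k])).float().permute(2, 0, 1)
        for k in sorted(h["images"].keys())
    ]).unsqueeze(0)                                      # (1, T, 3, H, W)
    voxels_f = torch.stack([
        torch.from_numpy(np.array(h["voxels_f"][k])).float()
        for k in sorted(h["voxels_f"].keys())
    ]).unsqueeze(0)                                      # (1, T-1, Bins, H, W)
    voxels_b = torch.stack([
        torch.from_numpy(np.array(h["voxels_b"][k])).float()
        for k in sorted(h["voxels_b"].keys())
    ]).unsqueeze(0)                                      # (1, T-1, Bins, H, W)

# with torch.inference_mode():
#     res = model(images.cuda(), voxels_f.cuda(), voxels_b.cuda())
```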
Hope this helps!
Thank you, I'll try!
Hello! Thank you for your model! Could you clarify one more thing for me? In your forward method, the following is written:
"""Forward function of EvTexture
Can you explain how I should organize my data in, for example, calendar.h5 to feed the model? In calendar.h5 there are "images" ([H, W]) and "voxels" ([Bins, H, W]). I tried to take 2 images and stack them (`torch.stack([image1, image2])`), then I took the voxels between these 2 images (that is, one forward voxel grid and one backward voxel grid), and then unsqueezed everything to get a single batch (the "b" in the forward function). Finally I get these shapes: images: [1, 2, 3, H, W], voxels: [1, 1, 5, H, W]. I then called the model: `forward(images, voxels_f, voxels_b)`.

I did get an upscaled image, but with awful quality, so what did I do wrong? I used the test data published in this repo. I understand that I probably did something wrong with the shapes or organized the data incorrectly, but how exactly should I use the h5 files with the forward method? I want to know how to call "forward" manually. Thank you!
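For reference, a minimal sketch (not the repo's code) of the tensor shapes described above, with the LR frame size inferred from the x4 output shape (1, 2, 3, 576, 704) noted in the code earlier:

```python
import torch

# Dummy tensors only, to illustrate the expected input shapes:
#   images   : (b, t,   c,    h, w)
#   voxels_* : (b, t-1, Bins, h, w)
H, W = 144, 176                          # assumed LR size; x4 upscaling gives 576 x 704
images   = torch.zeros(1, 2, 3, H, W)    # two stacked frames, one batch
voxels_f = torch.zeros(1, 1, 5, H, W)    # forward voxel grid between the two frames
voxels_b = torch.zeros(1, 1, 5, H, W)    # backward voxel grid between the two frames

# res = model(images, voxels_f, voxels_b)   # expected: (1, 2, 3, 4*H, 4*W)
```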