H2O Dataset Format - Githubissues

SuperN1ck commented 1 year ago

Hey @ATAboukhadra, cool work! I wanted to run your inference script scripts/test_h2o.sh, but I am unable to do so. I fixed a few things on my end (like the differently structured .obj files), but I am now stuck at loading the actual images while iterating over the test loader.

I investigated a little, and it seems you store H2O differently from the default (in shards?). When I download H2O, I get four tar.gz files, one per subject (subjectX_v1_1.tar.gz), but if I am not mistaken, your loading code expects 24 tar files named like subject3_k2_0_cam4_rgb.tar?

Unfortunately, I can't work out how to package those tar files correctly. Could you share a script or explain how to get H2O into the format you need?

Cheers and many thanks, -Nick

ATAboukhadra commented 1 year ago

Hi Nick,

Yes, as you suspected, I changed the format of the dataset into .tar shards to speed up training, using an internal library. I also mentioned this in the issue. Unfortunately, this means that if you have the unzipped dataset in its original format, you will have to create a dataset class that reads single files directly from your directory and then change the dataloader accordingly. The library that I used to convert the dataset is an internal one that has not been published yet.

Kind regards, Ahmed
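
For readers in the same situation, here is a minimal sketch of the kind of dataset class described above: one that reads single files straight from the unzipped directory. It is not the THOR-Net loader. The class name `FrameFolderDataset`, the glob pattern, and the `subject/scene/sequence/camera/rgb` folder layout are assumptions (the layout is only inferred from the shard name quoted earlier in this thread), and label loading is left out entirely.

```python
from pathlib import Path


class FrameFolderDataset:
    """Hypothetical map-style dataset over loose image files on disk."""

    def __init__(self, root, pattern="**/rgb/*.png"):
        # `pattern` is an assumed layout; adjust it to your unzipped copy.
        self.root = Path(root)
        # Sort so that an index always maps to the same file.
        self.paths = sorted(self.root.glob(pattern))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        path = self.paths[index]
        # Return the raw bytes and a key; decoding the image and attaching
        # annotations is left to whatever the model expects.
        return {
            "key": path.relative_to(self.root).as_posix(),
            "image": path.read_bytes(),
        }
```

Because the class only implements `__len__` and `__getitem__`, it should be usable as a map-style dataset with a standard PyTorch `DataLoader`, though the batch format the test script expects would still have to be matched by hand.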

SuperN1ck commented 1 year ago

Hey Ahmed, thanks for the speedy clarification! Eventually, I want to run THOR on my own data anyway and not H2O, so I'll see whether I actually make the effort to write a new dataloader. In any case, if you happen to release the sharding code, I would be interested (for this project, but also generally speaking). Cheers, -Nick
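
Since the sharding code itself was not shared, the following is only a rough sketch of how shards with names like subject3_k2_0_cam4_rgb.tar could be produced using Python's standard `tarfile` module. It is not the internal library mentioned above. The function name `write_shards`, the `depth` parameter, and the assumption that the shard name is the first five folder names joined by underscores are all guesses based on the file name quoted in this thread. What the real shards contain (member names, bundled labels) is unknown.

```python
import tarfile
from collections import defaultdict
from pathlib import Path


def write_shards(root, out_dir, depth=5):
    """Pack files into one tar per folder, named by the first `depth` path parts.

    Hypothetical helper: `out_dir` should live outside `root`.
    """
    root, out_dir = Path(root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group files by their first `depth` folders,
    # e.g. subject3/k2/0/cam4/rgb/000000.png -> "subject3_k2_0_cam4_rgb".
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if path.is_file() and len(rel.parts) > depth:
            groups["_".join(rel.parts[:depth])].append(path)

    shards = []
    for name, files in groups.items():
        shard = out_dir / f"{name}.tar"
        # Plain, uncompressed tar; members are stored under their file name only.
        with tarfile.open(shard, "w") as tar:
            for path in files:
                tar.add(path, arcname=path.name)
        shards.append(shard)
    return shards
```

Whether shards written this way would load in the THOR-Net test script depends on how the unpublished library lays out each tar, so treat this as a starting point for experimentation rather than a drop-in replacement.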

ATAboukhadra / THOR-Net

H2O Dataset Format #3