Closed SuperN1ck closed 1 year ago
Hi Nick,
Yes, as you suspected, I changed the format of the dataset into .tar shards to speed up training, using an internal library. I also mentioned this in the issue. Unfortunately, this means that if you have the unzipped dataset in its original format, you will have to create a dataset class that reads single files directly from your directory and then change the dataloader accordingly. The library that I used to convert the dataset is an internal one which has not yet been published.
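A minimal sketch of what such a dataset class could look like, assuming the images sit as individual files somewhere below one root directory. The class name `SingleFileImageDataset`, the `extensions` filter, and the `loader` parameter are hypothetical and not taken from the THOR code; the default loader just returns raw bytes, so an actual image decoder would have to be plugged in.

```python
from pathlib import Path


class SingleFileImageDataset:
    """Map-style dataset: one sample per image file found under `root`.

    Hypothetical sketch, not the repository's dataset class.
    """

    def __init__(self, root, extensions=(".png", ".jpg"), loader=Path.read_bytes):
        self.root = Path(root)
        wanted = {ext.lower() for ext in extensions}
        # Sort so that the index -> file mapping is deterministic across runs.
        self.paths = sorted(
            p for p in self.root.rglob("*")
            if p.is_file() and p.suffix.lower() in wanted
        )
        # `loader` maps a path to a sample; swap in an image decoder here.
        self.loader = loader

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        path = self.paths[index]
        return {
            "path": path.relative_to(self.root).as_posix(),
            "image": self.loader(path),
        }
```

Since the class implements `__len__` and `__getitem__`, an instance should be usable as a map-style dataset with `torch.utils.data.DataLoader`; labels and any other per-frame annotations would still need to be added to the returned sample.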
Kind regards, Ahmed
Hey Ahmed,
thanks for the speedy clarification! Eventually, I want to run THOR on my own data anyway and not `h2o`, so I'll see if I actually make the effort to write a new dataloader.
In any case, if you happen to release the sharding-code I would be interested (for this project but also generally speaking).
Cheers,
-Nick
Hey @ATAboukhadra, cool work! I wanted to run your inference script `scripts/test_h2o.sh` but I am unable to do so. I fixed some stuff here and there on my end (like the differently structured `.obj`s), but I am now stuck at loading the actual images while iterating the `testloader`.

I investigated a little bit and it seems like you save `h2o` differently (in shards?) than the default. When I download `h2o`, I get four `tar.gz`-files, one for each subject (`subjectX_v1_1.tar.gz`), but if I am not mistaken, during loading you are expecting 24 `tar`-files in the format of e.g. `subject3_k2_0_cam4_rgb.tar`?

Unfortunately, I can't really make out how to correctly package those `tar`-files; maybe you can share a script or explain how to get `h2o` in the format you need.

Cheers and many thanks, -Nick