NVIDIA / DIGITS

Deep Learning GPU Training System
https://developer.nvidia.com/digits
BSD 3-Clause "New" or "Revised" License

logging the data creation parameters in the job folder #1838

Open RSly opened 7 years ago

RSly commented 7 years ago

Hi,

I created a dataset with the DIGITS segmentation dataset plugin, using the 10% option to create the validation set, and I have trained multiple networks on it. Now I want to change the training portion but keep the same validation set for comparison.

Is there any way to find the list of images that were used to generate that validation set? Is it logged somewhere?

P.S. I also assume that every time I use this plugin, it randomly chooses 10% of the provided data? I know I could have used the "separate" option to keep track of the exact training/validation sets, but I didn't use that option at the time...

/cc @gheinrich maybe you can help? thanks

gheinrich commented 7 years ago

Hi, it's possible to get the indices of the images used when splitting the dataset, but not without some programming, I'm afraid. The indices are stored in a variable of the dataset here: https://github.com/NVIDIA/DIGITS/blob/master/digits/extensions/data/imageSegmentation/data.py#L37 You can reload the dataset and inspect the contents of the seed field in the user data. That will allow you to perform exactly the same split again.
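To illustrate why the seed is enough to recover the split, here is a minimal sketch that assumes the plugin shuffles the list of indices with Python's `random` module, seeded from the stored value, and takes the first 10% as validation. `split_by_seed` is a hypothetical helper, not the plugin's code. Reproducing DIGITS's exact split would also require matching its implementation in data.py, the same input file order, and the same Python major version, because Python 2 and Python 3 `random.shuffle` can order the same list differently for the same seed.

```python
import math
import random


def split_by_seed(filelist, seed, pct_val):
    """Hypothetical helper: rebuild a train/val split from a stored seed.

    Shuffles the list of indices with a seeded RNG, then takes the first
    pct_val percent as validation and the remainder as training.
    """
    indices = list(range(len(filelist)))
    rng = random.Random(seed)  # private RNG so the global random state is untouched
    rng.shuffle(indices)
    n_val = int(math.floor(len(filelist) * pct_val / 100.0))
    val = [filelist[i] for i in indices[:n_val]]
    train = [filelist[i] for i in indices[n_val:]]
    return train, val
```

Because the shuffle depends only on the seed and the list length, calling it again with the same seed and file list returns the same `val`. That list could then be reused, for example through the "separate" option mentioned above, while the training images change.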

RSly commented 7 years ago

Thanks @gheinrich! Regarding "reload the dataset": is there a method I can call with the path to the generated dataset to do this? Any hint is great :)

I can see this field in status.pickle. Is this something I can use?

sS'seed' p226 I570
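The fragment above is protocol-0 (text) pickle: `s` closes the previous dictionary entry, `S'seed'` pushes the string key, `p226` stores it in the memo, and `I570` pushes the integer 570, so the stored seed appears to be 570. Fully unpickling status.pickle needs DIGITS's classes on the import path, so as a lighter-weight alternative, here is a minimal sketch that pulls the value straight out of the raw bytes. It assumes the file uses the text protocol seen above and that the key appears literally as `S'seed'` (it would not if the pickler reused the string from the memo with a `g` opcode). `find_seed` is a hypothetical helper.

```python
import re

# Matches a protocol-0 pickle fragment such as  S'seed'\np226\nI570\n :
# the string key 'seed', a memo PUT (p<n>), then an INT opcode (I<value>).
# \s+ also tolerates the fragment being pasted with spaces instead of newlines.
SEED_PATTERN = re.compile(rb"S'seed'\s+p\d+\s+I(-?\d+)")


def find_seed(pickle_bytes):
    """Hypothetical helper: return the first integer stored under 'seed', or None."""
    match = SEED_PATTERN.search(pickle_bytes)
    return int(match.group(1)) if match else None
```

To use it, read the file in binary mode, e.g. `find_seed(open(path_to_status_pickle, "rb").read())`, where `path_to_status_pickle` points at the dataset job's status.pickle.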

As a suggestion, it could be great if these parameters were logged in create_train_db_db.log and create_val_db_db.log :)
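As an illustration of the suggestion, here is a minimal sketch of what such logging could look like. It assumes the standard `logging` module and a logger that already writes to the create_*_db log files. `log_split_parameters` and its parameter names are hypothetical, not part of DIGITS.

```python
import logging


def log_split_parameters(logger, seed, pct_val, n_train, n_val):
    """Hypothetical helper: record how a dataset was split so it can be reproduced."""
    logger.info("Split seed: %d", seed)
    logger.info("Validation percentage: %s%%", pct_val)
    logger.info("Train entries: %d, validation entries: %d", n_train, n_val)
```

With a `logging.FileHandler` attached to the logger, these lines would end up next to the other dataset-creation messages, so the seed could be read back later without opening status.pickle.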

gheinrich commented 7 years ago

> as a suggestion, it could be great if these parameters were logged in create_train_db_db.log and create_val_db_db.log :)

Good suggestion!

You can try something like:

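```python
import os
from digits.job import Job  # requires DIGITS on the Python path
job_dir = os.path.join(jobs_dir, job_id)  # jobs_dir, job_id: your DIGITS jobs folder and the dataset's job ID
job = Job.load(job_dir)  # loads the saved job (status.pickle) from that folder
```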
RSly commented 7 years ago

Thanks!

RSly commented 7 years ago

It would also be great to let the user provide the seed, as we can in the DIGITS model definition.

This would help make results deterministic when needed.
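Here is a minimal sketch of the requested behavior. It assumes an optional seed field on the dataset form: use the user's seed when one is given, otherwise draw one at random, and store whichever value is used. `resolve_seed` and its 0 to 1000 default range are illustrative assumptions, not DIGITS's API.

```python
import random


def resolve_seed(user_seed=None, low=0, high=1000):
    """Hypothetical helper: use the user-supplied seed, or draw one at random.

    The returned value should be stored (e.g. in the job's user data and the
    create_*_db logs) so the same split can be regenerated later.
    """
    if user_seed is not None:  # "is not None" so that a seed of 0 is honored
        return int(user_seed)
    return random.randint(low, high)  # inclusive on both ends
```

Checking `is not None` rather than truthiness matters here. Otherwise a user who deliberately enters 0 would silently get a random seed instead.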