
dpressel / rude-carnie

Age detection in Tensorflow


preprocessing data #23

Closed sesoin closed 7 years ago

sesoin commented 7 years ago

I am using files from the IMDB data set and I get loss=nan. I suspect there is a problem in preprocessing. Any idea what the problem could be?

dpressel commented 7 years ago

Are you referring to this set?

https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/

If so, I haven't tried using it yet. I'll take a look later this week and see if I can get it running!

dpressel commented 7 years ago

Also, why do you think there is a problem in preprocessing? You can look at the images that are being run in TensorBoard. I suggest doing that as a first step, during training:

tensorboard --logdir /path/to/training

dpressel commented 7 years ago

I would probably need clarification on how you are preprocessing this, but after looking at this dataset, there is currently no way to run the existing preprocessing code without first doing additional preprocessing on the data files. If you have done this already, please post that code in a gist, or point me to your fork, and I can take a look and see if there is anything obvious.

When I get a few minutes, I will see if I can add some code to handle this corpus. In the meantime, it is not supported.

BerenLuthien commented 7 years ago

In that dataset there are corrupted images that cannot be opened. A simple solution is to wrap the image loading in "try/except" and throw out the corrupted images. Further, some labels seem to be wrong, though I do not know how many.
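
The "try/except" idea above could look roughly like the sketch below. It is not code from this repository: `split_readable` and `pillow_load` are hypothetical names, and using Pillow to decode the files is an assumption (any loader that raises on a bad file would work the same way).

```python
from typing import Callable, Iterable, List, Tuple


def pillow_load(path: str) -> None:
    """Hypothetical loader: fully decode an image with Pillow, raising on failure."""
    # Imported lazily so the filtering helper below works without Pillow installed.
    from PIL import Image

    with Image.open(path) as img:
        img.load()  # force a full decode; truncated files usually fail here


def split_readable(
    paths: Iterable[str],
    load: Callable[[str], object] = pillow_load,
) -> Tuple[List[str], List[Tuple[str, Exception]]]:
    """Return (readable paths, [(unreadable path, error), ...])."""
    good: List[str] = []
    bad: List[Tuple[str, Exception]] = []
    for path in paths:
        try:
            load(path)
        except Exception as exc:  # broad on purpose in this sketch; narrow it in real code
            bad.append((path, exc))
        else:
            good.append(path)
    return good, bad
```

Calling `good, bad = split_readable(list_of_image_paths)` before building the training files keeps only the images that decode, and `bad` records which files were dropped and why. Note that this only filters unreadable files; it does nothing about wrong labels.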