deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0

Data normalization/de-normalization #41

Open jSaso opened 4 years ago

jSaso commented 4 years ago

Description

Create a Normalize class which has two methods:

The Normalize object must also take parameters: min and max (the minimum and maximum of our number range) and the target interval (0 to 1, or -1 to 1).

Example: we have a range of real numbers that needs to be normalized: [1, 5, 7, 12, 16, 19, 23, 3, 6, 33]. The Normalize class will have:

With all this information, we can normalize each number so that it is ready for training or testing the model. Each number that enters the network input in normalized form will have its Normalize object defined. During training we can easily de-normalize every number and compare it with our test data set (which also needs to be de-normalized).
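The normalize/de-normalize pair described above can be sketched as plain min-max scaling. This is only an illustration of the request, not an existing DJL class: the name `MinMaxNormalizer`, the `fit` factory, and the `lo`/`hi` parameters for the target interval are all assumptions made for this example.

```java
import java.util.Arrays;

/** Hypothetical min-max normalizer; an illustrative sketch, not part of the DJL API. */
final class MinMaxNormalizer {
    private final double dataMin; // smallest value in the original number range
    private final double dataMax; // largest value in the original number range
    private final double lo;      // lower bound of the target interval, e.g. 0 or -1
    private final double hi;      // upper bound of the target interval, e.g. 1

    MinMaxNormalizer(double dataMin, double dataMax, double lo, double hi) {
        if (!(dataMax > dataMin)) {
            throw new IllegalArgumentException("dataMax must be greater than dataMin");
        }
        if (!(hi > lo)) {
            throw new IllegalArgumentException("hi must be greater than lo");
        }
        this.dataMin = dataMin;
        this.dataMax = dataMax;
        this.lo = lo;
        this.hi = hi;
    }

    /** Derives min and max from the data itself (assumed convenience factory). */
    static MinMaxNormalizer fit(double[] data, double lo, double hi) {
        double min = Arrays.stream(data).min()
                .orElseThrow(() -> new IllegalArgumentException("data must not be empty"));
        double max = Arrays.stream(data).max()
                .orElseThrow(() -> new IllegalArgumentException("data must not be empty"));
        return new MinMaxNormalizer(min, max, lo, hi);
    }

    /** Maps a raw value from [dataMin, dataMax] into [lo, hi]. */
    double normalize(double x) {
        return lo + (x - dataMin) * (hi - lo) / (dataMax - dataMin);
    }

    /** Inverse of normalize: maps a value from [lo, hi] back to [dataMin, dataMax]. */
    double denormalize(double y) {
        return dataMin + (y - lo) * (dataMax - dataMin) / (hi - lo);
    }
}
```

For the example range, `MinMaxNormalizer.fit(data, 0.0, 1.0)` would take min = 1 and max = 33, so `normalize(1)` gives 0.0 and `normalize(33)` gives 1.0. Keeping the fitted object around is what makes it possible to call `denormalize` on a prediction later.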

Will this change the current api? How?

Yes. Normalization should be part of the data set. Each number in an INDArray should also have a normalization object, so that each number that reaches the network input is normalized to 0 to 1 or -1 to 1. Also, when the network is training and we use a listener, predicted numbers can easily be de-normalized and then compared with the de-normalized test data set.

Who will benefit from this feature?

Everybody. Working with normalized data sets will be simpler with built-in normalization/de-normalization of the numbers that enter the network input, including while network training is in progress.

References

Example: https://github.com/eclipse/deeplearning4j/blob/b5f0ec072f3fd0da566e32f82c0e43ca36553f39/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/preprocessor/MultiDataNormalization.java I think this one is not good enough; it is only a simple normalization.

keerthanvasist commented 4 years ago

Thank you for creating this issue @jSaso. I think it is a place for anyone looking to make their first contribution.

zachgk commented 4 years ago

@jSaso Can you help explain some of your thoughts behind this? Is your goal more to be able to view the input of the model and the output of the model denormalized, or more to see intermediate values denormalized?