OpenMined / KotlinSyft

The official Syft worker for secure on-device machine learning
https://www.openmined.org
Apache License 2.0

Training is being done on a few examples rather than the whole dataset #286

Closed · mustansarsaeed closed 3 years ago

mustansarsaeed commented 3 years ago

Question

Is the demo app only considering the first `batch_size` examples and ignoring the rest of the dataset?

Further Information

Hi all, I am seeing the following code in LocalMNISTDataDataSource.kt:

fun loadDataBatch(batchSize: Int): Pair<Batch, Batch> {
    val trainInput = arrayListOf<List<Float>>()
    val labels = arrayListOf<List<Float>>()
    // Read one sample (features + label) per iteration into the two lists.
    for (idx in 0..batchSize)
        readSample(trainInput, labels)
    // Flatten the features into a single tensor of shape [numSamples, FEATURESIZE].
    val trainingData = Batch(
        trainInput.flatten().toFloatArray(),
        longArrayOf(trainInput.size.toLong(), FEATURESIZE.toLong())
    )
    // Labels form a tensor of shape [numSamples, 10].
    val trainingLabel = Batch(
        labels.flatten().toFloatArray(),
        longArrayOf(labels.size.toLong(), 10)
    )
    return Pair(trainingData, trainingLabel)
}

My understanding is that the demo app only considers the first `batch_size` examples, i.e. if the batch size is set to 10, training happens on just the first 10 examples and the rest of the dataset is never used. Is that correct? I do not see any code that iterates over all examples.

mccorby commented 3 years ago

Related to #268

vkkhare commented 3 years ago

@mustansarsaeed This is not the case. The readLine function called inside readSample automatically advances the cursor to the next line once a read occurs. We keep the pointer to the same buffer so that we don't start from the beginning. https://github.com/OpenMined/KotlinSyft/blob/dev/demo-app/src/main/java/org/openmined/syft/demo/federated/datasource/LocalMNISTDataDataSource.kt#L61
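To illustrate the pattern being described, here is a minimal, hypothetical Kotlin sketch (class and function names are made up, not from the KotlinSyft codebase): a single reader is kept as a field, so each call to load a batch resumes where the previous call stopped rather than re-reading from the start of the data.

import java.io.BufferedReader
import java.io.StringReader

// Hypothetical sketch of a stateful data source: the BufferedReader is kept
// across calls, so readLine() continues from where the last batch ended.
class StatefulDataSource(csvContent: String) {
    private val reader: BufferedReader = BufferedReader(StringReader(csvContent))

    fun loadBatch(batchSize: Int): List<String> {
        val batch = mutableListOf<String>()
        repeat(batchSize) {
            // readLine() advances the reader's cursor; null means the data is exhausted.
            val line = reader.readLine() ?: return batch
            batch.add(line)
        }
        return batch
    }
}

fun main() {
    val source = StatefulDataSource((1..6).joinToString("\n") { "sample$it" })
    println(source.loadBatch(3)) // [sample1, sample2, sample3]
    println(source.loadBatch(3)) // [sample4, sample5, sample6] -- continues, does not restart
}

Because the reader is shared state rather than reopened on every call, successive batches walk through the whole file over the course of training, which is why only `batch_size` samples are read per call without the rest of the dataset being ignored.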