gforge closed this issue 8 years ago.
The update is closing in on a much more mature state. I'm struggling with how to approach the parallel dataset iterator (`tnt.ParallelDatasetIterator`). In the MNIST example it is rather straightforward:
```lua
-- load torchnet; the iterator below refers to it as `tnt`
local tnt = require 'torchnet'

-- function that sets up the dataset iterator:
local function getIterator(mode)
   return tnt.ParallelDatasetIterator{
      nthread = 1,
      init = function() require 'torchnet' end,
      closure = function()
         -- load MNIST dataset:
         local mnist = require 'mnist'
         local dataset = mnist[mode .. 'dataset']()
         dataset.data = dataset.data:reshape(dataset.data:size(1),
            dataset.data:size(2) * dataset.data:size(3)):double()
         -- return batches of data:
         return tnt.BatchDataset{
            batchsize = 128,
            dataset = tnt.ListDataset{ -- replace this by your own dataset
               list = torch.range(1, dataset.data:size(1)):long(),
               load = function(idx)
                  return {
                     input = dataset.data[idx],
                     target = torch.LongTensor{dataset.label[idx] + 1},
                  } -- sample contains input and target
               end,
            }
         }
      end,
   }
end
```
The problem here is that they use the `mnist` package to load the actual data into each thread and then pass a global index to the batch dataset. The sampler functions that we use from the Twitter dataset package keep an internal index instead. We could modify it so that:
`threads:addjob(` … `get` method with the `to_tensor` (alternatively, override the `Batchframe`'s inherited `get` with a specific `get` method for the `Batchframe`).
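To make the contrast with an internal index concrete: with a global index, the rows in a batch are a pure function of the batch number, so any worker thread can build any batch without sharing a cursor, whereas a sampler that keeps an internal index carries state the threads would have to coordinate. This is a plain-Lua sketch that does not call torchnet; `batch_indices` and `num_batches` are hypothetical helpers, and keeping the final partial batch is an assumption about `tnt.BatchDataset`'s default behaviour, not something verified here.

```lua
-- Hypothetical helper (not part of torchnet): the sample indices covered by
-- batch number `b` when `n` samples are split into batches of `batchsize`.
local function batch_indices(b, batchsize, n)
  local first = (b - 1) * batchsize + 1
  local last = math.min(b * batchsize, n)
  local idx = {}
  for i = first, last do
    idx[#idx + 1] = i
  end
  return idx
end

-- Number of batches, assuming the last (possibly partial) batch is kept.
local function num_batches(batchsize, n)
  return math.ceil(n / batchsize)
end

-- Each batch depends only on (b, batchsize, n): no shared cursor is needed,
-- so batches could be built by independent worker threads in any order.
local n, batchsize = 300, 128
for b = 1, num_batches(batchsize, n) do
  local idx = batch_indices(b, batchsize, n)
  print(b, idx[1], idx[#idx], #idx)
end
```

With 300 rows and a batch size of 128 this yields three batches, the last one holding the remaining 44 rows.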
Facebook's Torchnet (just released) has its own dataset implementation. It lacks a CSV interface, handling of categories, and core statistics. From what I understand, it is mostly an approach to sampling that requires each dataset to implement two methods:
- `dataset:size()`, which returns the size of the dataset.
- `dataset:get(idx)`, where `idx` is a number between 1 and the dataset size.

This would require changing the `Dataframe:size()` function in the develop branch to return only the number of rows instead of both rows and columns. `get()` is synonymous with `get_row()`, if I understand this correctly:

The latter sentence suggests that changing the internal storage (issue #16) may be wise for optimal integration.
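A minimal sketch of the two-method contract, assuming a Dataframe-like object with a `get_row(i)` method and a `size()` that returns both rows and columns. `MockDataframe` and `DataframeDataset` are hypothetical names, and this is plain Lua only: an actual integration would likely need to subclass `tnt.Dataset` through torch's class system, since torchnet's constructors appear to type-check their dataset arguments.

```lua
-- Hypothetical stand-in for a Dataframe: rows stored in a plain Lua table.
local MockDataframe = {}
MockDataframe.__index = MockDataframe

function MockDataframe.new(rows)
  return setmetatable({ rows = rows }, MockDataframe)
end

-- Like the develop-branch Dataframe:size(), returns rows AND columns.
function MockDataframe:size()
  local ncol = 0
  for _ in pairs(self.rows[1] or {}) do ncol = ncol + 1 end
  return #self.rows, ncol
end

function MockDataframe:get_row(i)
  return self.rows[i]
end

-- Hypothetical adapter exposing the two methods torchnet expects.
local DataframeDataset = {}
DataframeDataset.__index = DataframeDataset

function DataframeDataset.new(df)
  return setmetatable({ df = df }, DataframeDataset)
end

-- dataset:size(): number of rows only (the column count is dropped).
function DataframeDataset:size()
  local nrow = self.df:size()  -- keeps only the first return value
  return nrow
end

-- dataset:get(idx): idx must lie between 1 and dataset:size().
function DataframeDataset:get(idx)
  assert(idx >= 1 and idx <= self:size(), "index out of range")
  return self.df:get_row(idx)
end

-- Usage
local df = MockDataframe.new{
  { x = 1.5, label = "a" },
  { x = 2.0, label = "b" },
  { x = 3.5, label = "a" },
}
local ds = DataframeDataset.new(df)
print(ds:size())        -- 3
print(ds:get(2).label)  -- b
```

This adapter leaves `Dataframe:size()` untouched and simply discards its second return value; the alternative is to change `size()` itself in the develop branch, which is the trade-off discussed in the paragraph above.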