Open jasperDD opened 5 years ago
Thanks for the issue. Adding labels to get better visibility.
@mxnet-label-bot add [R]
Your issue appears more related to converting a video into an array than to mxnet per se.
I'd recommend to take a look at the imager
package vignette describing how to handle video formats, here: https://dahtah.github.io/imager/imager.html#loading-and-saving-videos. This should get you 4-dimensional arrays: [Width, Height, Time, Colors].
It may be worth considering working with images extracted from the video with 2D convolutions rather than directly with 3D convolutions.
jeremiedb, thanks, good suggestion
jeremiedb, may I ask, where in that link does it explain how to convert a video into a data frame? Something like:

```
   label pixel.1 pixel.2 pixel.3 pixel.4 pixel.5 pixel.6 pixel.7
1    304     304     304     304     304     304     304     304
2     32      32      32      32      32      32      32      32
3    350     351     351     351     351     351     351     351
4    265     265     265     265     265     265     265     265
5    108     108     108     108     108     108     108     108
6     87      87      87      87      87      87      87      87
7    191     192     192     192     192     192     192     192
8    170     170     170     170     170     170     170     170
9    329     329     329     329     329     329     329     329
10   268     268     268     268     268     268     268     268
11   238     238     238     238     238     238     238     238
12   159     159     159     159     159     159     159     159
13   220     221     221     221     221     221     221     221
```
because that page only covers a single image array (the parrot example).
@jasperDD
Data should be handled as arrays in mxnet. For images, each observation is 3D (HxWxC). C refers to the color channels. To store multiple observations, a 4th dimension is therefore needed. Data fed to the network will be of shape [HxWxCxBatchSize].
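A minimal base-R sketch of that layout (the dimensions below are made up for illustration; real H, W, and batch size depend on your data):

```r
# Hypothetical dimensions: 28x28 RGB images, batch of 4
H <- 28; W <- 28; C <- 3; batch_size <- 4

# One observation is an [H, W, C] array
obs <- array(runif(H * W * C), dim = c(H, W, C))

# A batch stacks observations along a 4th dimension: [H, W, C, BatchSize]
batch <- array(0, dim = c(H, W, C, batch_size))
for (i in 1:batch_size) batch[, , , i] <- obs

dim(batch)
```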
For images, you could use the following approach to convert videos in arrays of images of the appropriate format:
```r
library(imager)

# Sample video shipped with imager
fname <- system.file('extdata/tennis_sif.mpeg', package = 'imager')

# Extract 10 frames at a sampling rate of 4 frames per second
tennis <- load.video(fname, frames = 10, fps = 4)
dim(tennis)

# Split along the time ("z") axis into a list of frames
tennis_split <- imsplit(tennis, axis = "z")

# Reshape each frame into an [H, W, C] array and stack along a 4th dimension
img_array <- array(dim = c(352, 240, 3, 10))
for (i in 1:10) {
  img_array[, , , i] <- array(tennis_split[[i]], dim = c(352, 240, 3))
}

# Check the first frame
img_1 <- as.cimg(img_array[, , , 1])
dim(img_1)
plot(img_1)
```
The above extracts 10 frames from the video at a sampling rate of 4 frames per second. imsplit is then used to create a list of images, and the loop builds an img_array in a format compatible with mxnet.
Note that this works for quick tests but will likely be inefficient for training datasets of any decent size. Converting the frames into JPEG files and then packing that collection of JPEGs into a RecordIO file with the im2rec utility would provide a highly efficient image iterator for training on large image datasets.
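The frame-dumping step can be sketched like this, reusing imager's sample video as above (the output directory name and file-naming scheme are made up; im2rec itself would then be run on the resulting folder):

```r
library(imager)

# Reuse the example video shipped with imager
fname <- system.file('extdata/tennis_sif.mpeg', package = 'imager')
frames <- imsplit(load.video(fname, frames = 10, fps = 4), axis = "z")

# Hypothetical output directory for the JPEGs to be packed by im2rec later
out_dir <- "frames_jpg"
dir.create(out_dir, showWarnings = FALSE)

# Write each frame as an individual JPEG
for (i in seq_along(frames)) {
  imager::save.image(frames[[i]],
                     file.path(out_dir, sprintf("frame_%03d.jpg", i)))
}
```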
I want to create a predictive model based on image processing. I have a lot of video files (.mov) on Google Drive (related to autonomous driving), and I access the data as links to the files. My drive and the data are available to all Internet users.
test
https://drive.google.com/drive/folders/1JidqB3TfHn0Cky8VBXHjbmHu7s0rGLrO?usp=sharing
train
https://drive.google.com/drive/folders/1WIFQIC23_o1__BPmlRDpnYYwmthH2AP-?usp=sharing
```r
library("googledrive")
X <- googledrive::drive_ls(path = "test")
Label <- googledrive::drive_ls(path = "train")
```
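Note that drive_ls only returns file metadata; to work with the videos locally, they would first have to be downloaded, e.g. with googledrive::drive_download. A sketch under that assumption (the local directory name is made up):

```r
library(googledrive)

# List the files in the shared "test" folder
test_files <- drive_ls(path = "test")

# Hypothetical local target directory
dir.create("videos_test", showWarnings = FALSE)

# Download each file next to the script
for (i in seq_len(nrow(test_files))) {
  drive_download(test_files[i, ],
                 path = file.path("videos_test", test_files$name[i]),
                 overwrite = TRUE)
}
```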
Here is an example of the structure of the data frame from the Google Drive listing (if necessary):
dput()
Now I need to resize the video data into pixel arrays:

```r
require(EBImage)

# Data frame of resized images
rs_df <- data.frame()

# Main loop: for each image, resize it and convert it to greyscale
for (i in 1:nrow(X)) {
  # Try-catch
  result <- tryCatch({
    # Image (as 1d vector)
  }
```
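For reference, a common way to complete such a loop with EBImage is to read each image, convert it to greyscale, resize it to 28x28, and flatten it into one row. This is a sketch, not your exact script, and it assumes X has a column of local file paths named path (that column name is hypothetical):

```r
require(EBImage)

rs_df <- data.frame()

for (i in 1:nrow(X)) {
  result <- tryCatch({
    # Read the image from disk (assumes X$path holds local file paths)
    img <- readImage(X$path[i])
    # Convert to greyscale and resize to 28x28
    img_gray <- channel(img, "gray")
    img_small <- resize(img_gray, w = 28, h = 28)
    # Flatten to a 1d vector and append as one row (label first)
    vec <- as.numeric(imageData(img_small))
    rs_df <- rbind(rs_df, c(i, vec))
  },
  error = function(e) print(e))
}

names(rs_df) <- c("label", paste0("pixel", 1:784))
```

The `rs_df not found` error you quote typically means the loop errored out before rs_df was ever assigned, so defining `rs_df <- data.frame()` before the loop, as above, matters.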
After that I get a list of errors like:

```
<simpleError in labels[i, ]: object of type 'closure' is not subsettable>
```

Next, I get this error:

```
Error in names(rs_df) <- c("label", paste("pixel", c(1:776))) : object 'rs_df' not found
```

Otherwise, I get the same kind of errors about rs_df. I think the problem is that the data is loaded incorrectly from the drive, but I could be wrong; maybe the problem is something else.
How do I properly resize the video into pixels so I can continue this analysis? When I run the script below, I also get many errors like:

```
Error in t(test[, -1]) : object 'test' not found
```
Build the model:

```r
# Clean workspace
rm(list = ls())

# Load MXNet and helper packages
library("downloader")
library("influenceR")
library("rgexf")
require(mxnet)

# Load train and test datasets
train <- read.csv("train_28.csv")
test <- read.csv("test_28.csv")

# Set up train and test datasets
train <- data.matrix(train)
train_x <- t(train[, -1])
train_y <- train[, 1]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_x))

test_x <- t(test[, -1])
test_y <- test[, 1]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))
```
```r
# Set up the symbolic model
data <- mx.symbol.Variable('data')

# 1st convolutional layer
conv_1 <- mx.symbol.Convolution(data = data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max",
                            kernel = c(2, 2), stride = c(2, 2))

# 2nd convolutional layer
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5, 5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max",
                            kernel = c(2, 2), stride = c(2, 2))

# 1st fully connected layer
flatten <- mx.symbol.Flatten(data = pool_2)
fc_1 <- mx.symbol.FullyConnected(data = flatten, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fc_1, act_type = "tanh")

# 2nd fully connected layer
fc_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 40)

# Output. Softmax output since we'd like to get some probabilities.
NN_model <- mx.symbol.SoftmaxOutput(data = fc_2)

# Pre-training set up
# -------------------------------------------------------------------------------
# Set seed for reproducibility
mx.set.seed(100)

# Device used. CPU in my case.
devices <- mx.cpu()

# Training
# -------------------------------------------------------------------------------
# Train the model
model <- mx.model.FeedForward.create(NN_model,
                                     X = train_array,
                                     y = train_y,
                                     ctx = devices,
                                     num.round = 480,
                                     array.batch.size = 40,
                                     learning.rate = 0.01,
                                     momentum = 0.9,
                                     eval.metric = mx.metric.accuracy,
                                     epoch.end.callback = mx.callback.log.train.metric(100))
```
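Once training finishes, the usual follow-up in the mxnet R API is to predict on the test array and compare against test_y. A sketch, assuming the objects defined above exist:

```r
# Predict class probabilities on the test set: one column per observation
preds <- predict(model, test_array)

# Pick the class with the highest probability (0-based labels)
pred_label <- max.col(t(preds)) - 1

# Simple accuracy against the true labels
mean(pred_label == test_y)
```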
So, how do I correctly get the data from Google Drive and resize it into pixel arrays to create my model? Any help is appreciated.