craston / MARS

MARS: Motion-Augmented RGB Stream for Action Recognition
MIT License

How can I give input to 3d CNN? #43

Closed. Tariq195 closed this issue 3 years ago

Tariq195 commented 3 years ago

3D CNNs work with video, MRI, and scan datasets. Can anyone tell me how to feed an input video to the proposed 3D CNN and train its weights? A 3D CNN expects a 5-dimensional input:

[batch size, channels, depth, height, width]

How can I extract the depth from the videos?

Suppose I have 10 videos from 10 different classes, each 6 seconds long. I extract 2 frames per second, which gives 12 frames per video.
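The sampling arithmetic above can be sketched in plain Python. This is only an illustration: the helper name `sample_frame_indices` is hypothetical, and the native frame rate of 30 fps is an assumption, since the thread does not state the frame rate of the source videos.

```python
def sample_frame_indices(duration_s, native_fps, frames_per_second):
    """Return evenly spaced frame indices, `frames_per_second` per second of video."""
    step = native_fps / frames_per_second      # native frames between two kept frames
    total = duration_s * frames_per_second     # number of frames kept for the whole clip
    return [int(i * step) for i in range(total)]


# Assumption: the source video runs at 30 fps (not stated in the thread).
indices = sample_frame_indices(duration_s=6, native_fps=30, frames_per_second=2)
print(len(indices), indices)
```

With these values the helper keeps 12 frames, which become the depth dimension once the frames are stacked.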

The RGB videos are 112x112 --> Height = 112, Width = 112, and Channels = 3

If I keep the batch size equal to 2:

1 video --> 6 seconds --> 12 frames (1 sec == 2 frames) [each frame (3, 112, 112)]

10 videos (10 classes) --> 60 seconds --> 120 frames

So the 5 dimensions will be something like this: [2, 3, 12, 112, 112]

2 --> two videos are processed in each batch
3 --> RGB channels
12 --> each video contains 12 frames
112 --> height of each frame
112 --> width of each frame

First, I need to label all 10 videos, each of shape [3, 12, 112, 112] --> [channels, frames (depth), height, width]. Then I feed them to the PyTorch DataLoader, which batches them into [2, 3, 12, 112, 112].

I keep the DataLoader batch size at 2, so 2 videos are processed per training step and my 10 videos take 5 steps per epoch.

Am I right? Or can you suggest another way to do this?
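The setup described in this question could look roughly like the sketch below. It is not code from the MARS repository: `RandomClipDataset` is a hypothetical stand-in that returns random tensors instead of decoded frames, and giving video `i` the label `i` is an assumption that matches the "10 videos, 10 classes" example.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomClipDataset(Dataset):
    """Hypothetical stand-in for a real video dataset: one random clip per class."""

    def __init__(self, num_videos=10, channels=3, frames=12, height=112, width=112):
        self.num_videos = num_videos
        self.clip_shape = (channels, frames, height, width)

    def __len__(self):
        return self.num_videos

    def __getitem__(self, index):
        # A real dataset would decode the video and stack the sampled frames here.
        clip = torch.rand(self.clip_shape)  # [channels, depth, height, width]
        label = index                       # assumption: video i belongs to class i
        return clip, label


loader = DataLoader(RandomClipDataset(), batch_size=2, shuffle=True)

# The default collate function stacks clips along a new leading batch dimension.
batch_shapes = [tuple(clips.shape) for clips, labels in loader]
print(len(batch_shapes), batch_shapes[0])
```

Each batch comes out as [2, 3, 12, 112, 112], and the 10 videos are covered in 5 batches per epoch.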

craston commented 3 years ago

Yes, the depth is the number of frames in the video clip that you feed as input to the 3D CNN.
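A minimal sketch of what this means in PyTorch, using a single `nn.Conv3d` layer followed by `nn.MaxPool3d` rather than the network from this repository. The layer sizes (8 output channels, kernel size 3, padding 1) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# [batch, channels, depth (frames), height, width]
clips = torch.rand(2, 3, 12, 112, 112)

# The 3D kernel slides over the frame axis as well as over height and width.
conv = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
pool = nn.MaxPool3d(kernel_size=2)

features = pool(conv(clips))
print(features.shape)  # torch.Size([2, 8, 6, 56, 56])
```

The pooling layer halves the depth from 12 to 6 along with the spatial size, which shows that the frame axis is treated like a third spatial axis.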

Tariq195 commented 3 years ago

Can you tell me what your input size to the 3D CNN was? For training, you used a batch_size of 32, a sample_duration of 16 (the temporal length), and a 256x256 image size. Was your input size to the 3D CNN [32, 3, 16, 256, 256]? Please answer this question.
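For reference, the shape proposed in this question and its memory footprint can be worked out as below. This only restates the numbers from the question. Whether the repository resizes or crops frames to a different spatial size before they reach the network is not confirmed in this thread. The helper `clip_batch_shape` is a hypothetical name, and float32 storage is an assumption.

```python
from math import prod


def clip_batch_shape(batch_size, sample_duration, height, width, channels=3):
    """Build a [batch, channels, depth, height, width] shape tuple."""
    return (batch_size, channels, sample_duration, height, width)


shape = clip_batch_shape(batch_size=32, sample_duration=16, height=256, width=256)
num_elements = prod(shape)

# Assumption: float32 tensors, i.e. 4 bytes per element.
size_mib = num_elements * 4 / 2**20
print(shape, num_elements, size_mib)
```

Under these assumptions one batch holds 100,663,296 values, or 384 MiB for the input tensor alone.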

Tariq Hussain Dahri ML Engineer. FAST NUCES.
