Closed Tariq195 closed 3 years ago
Yes, the depth is the number of frames in the video clip that you feed as input to the 3D CNN.
Can you tell me what your input size to the 3D CNN was? For training, you kept batch_size = 32, sample_duration (temporal length) = 16, and an image size of 256x256 — so was your input to the 3D CNN [32, 3, 16, 256, 256]? Please answer this question.
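As a quick sanity check, the shape those settings imply can be sketched with dummy NumPy data (the numbers simply mirror the settings above; `np.broadcast_to` is used so the ~400 MB batch is a view, never actually copied):

```python
import numpy as np

batch_size = 32       # clips per batch
channels = 3          # RGB
sample_duration = 16  # frames per clip -- this is the "depth" axis
height = width = 256

# One clip: 16 RGB frames stacked along a new depth axis.
frame = np.zeros((channels, height, width), dtype=np.float32)
clip = np.stack([frame] * sample_duration, axis=1)   # (3, 16, 256, 256)

# A batch of 32 such clips (broadcast view, no copy).
batch = np.broadcast_to(clip, (batch_size,) + clip.shape)
print(batch.shape)    # (32, 3, 16, 256, 256)
```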
3D CNNs work with video, MRI, and scan datasets. Can anyone tell me: if I have to feed a video as input to the proposed 3D CNN network and train its weights, how can I do that? A 3D CNN expects a 5-dimensional input:

[batch size, channels, depth, height, width]

How can I extract the depth from the videos?
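For reference, a minimal PyTorch sketch (assuming `torch` is installed) showing that a 3D convolution consumes exactly this 5-D layout, with depth as the frame axis (the layer sizes here are arbitrary, just for illustration):

```python
import torch
import torch.nn as nn

# Dummy batch: 2 clips, 3 channels, 12 frames, 112x112 pixels.
x = torch.zeros(2, 3, 12, 112, 112)

# A single 3D conv layer: the kernel slides over depth, height, and width.
conv = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
y = conv(x)
print(y.shape)  # torch.Size([2, 8, 12, 112, 112])
```

With kernel_size=3 and padding=1 the depth, height, and width are preserved; only the channel count changes.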
Suppose I have 10 videos from 10 different classes. The duration of each video is 6 seconds. I extract 2 frames per second, which gives 12 frames per video.
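A minimal sketch of that sampling step, assuming the decoded video is just an array of frames at a known fps (the name `sample_frames` is illustrative, not from any particular library):

```python
import numpy as np

def sample_frames(video, fps, frames_per_second=2):
    """Pick `frames_per_second` evenly spaced frames from each second.

    video: array of shape (num_frames, channels, height, width)
    """
    step = fps // frames_per_second
    return video[::step]

# A fake 6-second video at 30 fps: 180 RGB frames of 112x112.
video = np.zeros((180, 3, 112, 112), dtype=np.uint8)
clip = sample_frames(video, fps=30)       # (12, 3, 112, 112), time first
clip = clip.transpose(1, 0, 2, 3)         # (3, 12, 112, 112), channels first
print(clip.shape)
```

Note the final transpose: frames come out time-first, but a 3D CNN wants channels first and the frame (depth) axis second.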
The size of the RGB frames is 112x112 --> Height = 112, Width = 112, Channels = 3.

If I keep the batch size equal to 2:

1 video --> 6 seconds --> 12 frames (1 sec == 2 frames), each frame (3, 112, 112)
10 videos (10 classes) --> 60 seconds --> 120 frames

So the 5 dimensions will be something like this: [2, 3, 12, 112, 112]

2 --> two videos processed per batch
3 --> RGB channels
12 --> frames per video (depth)
112 --> height of each frame
112 --> width of each frame
First, I need to label all 10 videos, each of shape [3, 12, 112, 112] --> [channels, frames (depth), height, width]; then I feed them to a PyTorch DataLoader, which adds the batch dimension: [2, 3, 12, 112, 112].
I use the DataLoader in PyTorch with batch size 2, processing 2 videos at a time during training, so my 10 videos are covered in 5 batches per epoch.
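That labelling-plus-batching step can be sketched with `TensorDataset` and `DataLoader` (dummy zero tensors stand in for the real decoded clips):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 10 dummy clips, one per class: [videos, channels, depth, height, width].
clips = torch.zeros(10, 3, 12, 112, 112)
labels = torch.arange(10)            # class ids 0..9, one video per class

dataset = TensorDataset(clips, labels)
loader = DataLoader(dataset, batch_size=2, shuffle=True)

for batch_clips, batch_labels in loader:
    print(batch_clips.shape)         # torch.Size([2, 3, 12, 112, 112])

# 10 videos / batch_size 2 --> 5 batches per epoch
print(len(loader))                   # 5
```

In a real pipeline you would subclass `torch.utils.data.Dataset` and decode each video lazily in `__getitem__` instead of materializing all clips up front, but the shapes flowing through the DataLoader are the same.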
Am I right? Or can you suggest any other method to do this?