Open ainazHjm opened 5 years ago
I use images of size 328x328 and compute the loss between the prediction at (64:-64, 64:-64) and the ground truth, which is a 200x200 image. The problem with rotating the image is that the output shape grows, so I end up with larger images. I can do one of two things:
1. Pad the images to the maximum size (both the input data and the ground truth).
2. Only compute the loss on the 200x200 region inside.

Approach 1 is probably better, but I don't have much compute power, so I end up with very small batch sizes (~3), which is inefficient and takes a long time. I tried approach 1 and it takes a full day for just one epoch... Also, I was using rotation angles in (0, 360, 5), but that makes training ~72x slower, so I'm going to use 36-degree intervals instead (~10x slower). Approach 2 loses some data but is faster, so I'm going to stick with it for now.
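For concreteness, approach 2 (computing the loss only on the inner 200x200 region) can be sketched like this; the function name, the tensor shapes, and the use of BCE-with-logits are my own illustration of the setup described above:

```python
import torch
import torch.nn.functional as F

def center_crop_loss(pred, target, crop=64):
    """Compute BCE-with-logits loss only on the central region of the
    prediction, ignoring the `crop`-pixel border introduced by padding.
    With a 328x328 prediction and a 200x200 ground truth, cropping 64
    pixels from each side aligns the two."""
    pred_center = pred[..., crop:-crop, crop:-crop]
    return F.binary_cross_entropy_with_logits(pred_center, target)

# Smoke test with the sizes from the post.
pred = torch.randn(1, 1, 328, 328)   # model output on the padded input
gt = torch.rand(1, 1, 200, 200)      # ground truth for the inner region
loss = center_crop_loss(pred, gt)    # scalar (mean over the 200x200 crop)
```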
The training loss is extremely noisy... It decreases at a very small rate but periodically overshoots, resulting in an oscillating (almost sinusoidal) curve. I'm using a very small learning rate (~1e-6 or 1e-7). I'm guessing the problem is with my data, since the step size is already small: I have 99.98% negative samples and only 0.02% positive samples, which makes training really hard... I'm going to try oversampling the positive samples (first just by duplicating them), but I still don't know why the loss never drops below 0.6, which is rather high... BTW, I'm currently using a weighted BCEWithLogitsLoss with weight = 1000. The following loss curve is from after a couple of iterations:
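For reference, the two ideas above (a weight of 1000 on positives, and oversampling positive samples instead of copying them) could be sketched in PyTorch like this. Note `pos_weight` is one way to apply a positive-class weight in `BCEWithLogitsLoss`; the exact weighting scheme in the original run may differ, and the toy labels here are purely illustrative:

```python
import torch
import torch.nn as nn
from torch.utils.data import WeightedRandomSampler

# pos_weight multiplies the loss term of positive pixels only, which is a
# common way to compensate for a heavy negative/positive imbalance.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(1000.0))

logits = torch.randn(4, 1, 200, 200)                       # raw model outputs
targets = (torch.rand(4, 1, 200, 200) > 0.9998).float()    # mostly zeros
loss = criterion(logits, targets)

# Oversampling at the sample level: give samples that contain positives a
# higher draw probability instead of physically duplicating them.
# `has_positive` is a hypothetical per-sample flag list.
has_positive = [False, False, True, False, False, True]
sample_weights = [100.0 if p else 1.0 for p in has_positive]
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(has_positive),
                                replacement=True)
```

The sampler would then be passed to a `DataLoader` via its `sampler` argument, so positive-heavy samples appear in more batches without extra disk copies.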
This one is from another run, showing the training loss after each iteration for 5 epochs:
Average training and validation loss at almost every epoch (as you can see, it's decreasing on average):
Update on rotation: The idea is to find, for each pixel, the direction to look, and to extract features along that direction. To do that, I need to find the best rotation for each pixel. Currently, the dataset uses images of size (ws+2*pad)x(ws+2*pad), for example 200x200. In that case I need to do 40000 searches, each extracting a 65x65 sample to run the model on. This makes my training very slow... The other option, which is pretty easy to do, is to only augment the data and back-prop through all of the samples. Say we have 20 different rotations in total; then training becomes 20x slower, which is still better than the first approach... There are other approaches too, but coding them is not easy... I'm going to try both of the above approaches and compare the results against each other.
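The second approach (augmenting with a fixed set of discrete rotations and back-propagating through all of them) could be sketched in pure PyTorch using an affine grid; `rotate_batch`, the 36-degree step, and the 65x65 sample size are illustrative, not the actual training code:

```python
import math
import torch
import torch.nn.functional as F

def rotate_batch(x, angle_deg):
    """Rotate a batch of images (N, C, H, W) by angle_deg around the
    center, using bilinear sampling with zeros outside the image."""
    theta = math.radians(angle_deg)
    rot = torch.tensor([[math.cos(theta), -math.sin(theta), 0.0],
                        [math.sin(theta),  math.cos(theta), 0.0]])
    rot = rot.unsqueeze(0).expand(x.size(0), -1, -1)
    grid = F.affine_grid(rot, list(x.size()), align_corners=False)
    return F.grid_sample(x, grid, align_corners=False)

# Augment with a fixed set of discrete rotations, e.g. every 36 degrees
# (10 rotations total, so the epoch is ~10x slower than without rotation).
angles = range(0, 360, 36)
batch = torch.randn(3, 1, 65, 65)
augmented = torch.cat([rotate_batch(batch, a) for a in angles], dim=0)
# augmented has shape (30, 1, 65, 65): 10 rotations x 3 samples
```

Rotating inside the sampling grid keeps the output the same size as the input, so the 65x65 crop just needs enough padding around it that the rotated content doesn't fall outside the valid region.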