Note that "CelebA" and "CelebAMask-HQ" are different; the original implementation in the paper used "CelebA" dataset (Liu et al., 2015), not "CelebAMask-HQ" dataset (Lee et al., 2019).
Since the number of images provided by the dataset (CelebA and CelebAMask-HQ) is different, the number of images used for training is also different:
- (original paper) CelebA: 18,000 train images + 100 test images
- (this repository) CelebAMask-HQ: 20,951 train images + 100 test images
The authors obtained landmarks and parsing maps (as ground truth) through these models:
Regarding the code for the prior network used to calculate the loss function: does it only use the parsing maps? I would like to ask how the landmarks are used to calculate the loss function. In the FSRNet network, how are the landmarks for the data obtained?
Oh, you asked "how the authors trained the landmark branch", right?
In the FSRNet paper, you can find that the loss function is given as follows:
Loss = MSELoss(ground-truth image, coarse-HR image)
+ MSELoss(ground-truth image, fine-SR image)
+ lambda * MSELoss(ground-truth prior, estimated prior)
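For concreteness, here is a minimal PyTorch-style sketch of that total loss (the tensor names are my placeholders, not the repository's actual variables):

import torch.nn.functional as F

def fsrnet_loss(hr_gt, coarse_hr, fine_sr, prior_gt, prior_est, lam=1.0):
    # coarse reconstruction loss + fine reconstruction loss
    # + weighted prior-estimation loss
    return (F.mse_loss(coarse_hr, hr_gt)
            + F.mse_loss(fine_sr, hr_gt)
            + lam * F.mse_loss(prior_est, prior_gt))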
And your question is equivalent to: "how did the authors implement the MSELoss(ground-truth prior, estimated prior) part using landmark data?"

You know that this repository shows the implementation of the MSELoss(ground-truth prior, estimated prior) part using parsing-map data. In this example, the ground-truth prior and the estimated prior are visualized as follows (note that the channel-wise concatenation of the first row is the ground-truth prior, and the channel-wise concatenation of the second row is the estimated prior):

The above estimated prior can be obtained via an additional convolutional layer:

estimated prior = Conv2D( PriorEstimationNetwork( CoarseSRNetwork( LR_IMG ) ) )

and then, after training, the network can predict the estimated prior.
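As a concrete (hypothetical) sketch of that extra layer in PyTorch, assuming the prior-estimation network outputs 64 feature channels and the prior has 11 channels (one per parsing map used in this repository):

import torch.nn as nn

class PriorHead(nn.Module):
    # maps prior-network features to one channel per prior component
    def __init__(self, in_ch=64, n_priors=11):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, n_priors, kernel_size=1)

    def forward(self, feats):
        return self.conv(feats)

# estimated_prior = PriorHead()(prior_net(coarse_net(lr_img)))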
The implementation of the MSELoss(ground-truth prior, estimated prior) part using landmark data is almost the same: add a convolutional layer, and minimize the MSE between the ground-truth prior and the estimated prior. However, the authors did not specify how to create the ground-truth prior from the landmark information (= a set of (x, y) points per image)! I think your question came from here...
To the best of my knowledge, a 'Gaussian heatmap' is generally used to train a landmark branch with MSE (I'm not sure, but I think the authors did this, too). It looks like this: one blurred point per channel:
Perhaps it can be implemented as follows (in this example, 194 landmark points are used):

import numpy as np

def _gaussian_k(self, x0, y0, sigma, width, height):
    # 2-D Gaussian centered at (x0, y0), scaled to [0, 255]
    x = np.arange(0, width, 1, float)
    y = np.arange(0, height, 1, float)[:, np.newaxis]
    return np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma**2)) * 255

def _get_hmaps(self, lmark):
    # one heatmap channel per landmark point (assumes square size_maps)
    hmaps = np.zeros((self.size_maps[0], self.size_maps[1], 194), dtype=np.uint8)
    for i, p in enumerate(lmark):
        hmaps[:, :, i] = self._gaussian_k(p[0], p[1], sigma=3,
                                          width=self.size_maps[0],
                                          height=self.size_maps[1])
    return hmaps
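Assuming lmark is a (194, 2) array of (x, y) coordinates (my naming), the landmark heatmaps could then be stacked channel-wise with the parsing maps to form a combined ground-truth prior:

hmaps = self._get_hmaps(lmark)    # (H, W, 194) uint8, one Gaussian blob per channel
pmaps = self._get_pmaps(index)    # (H, W, 11) parsing maps, as in this repository
gt_prior = np.concatenate([pmaps, hmaps], axis=2)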
Thank you for your previous answer! I also want to ask you a question: the result I got during the training process is not very good; the best PSNR is only 20. I looked at the data, and the tensor obtained by reading the face parsing maps in the CelebA parsing-map data is always 0. There is no problem with reading the image paths, and there are 11 local parsing maps for each face, but the values of the loaded arrays are always 0. May I ask what the reason is?
Please check the directory structure of CELEB_ROOT (note that mask images should be merged for this source code):
CELEB_ROOT/
|
+-- CelebA-HQ-img/
| +-- { *.jpg }
|
+-- CelebAMask-HQ-mask-anno/
| +-- merged/
| +-- { *.png }
|
+-- train_fileList.txt
+-- test_fileList.txt
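The original CelebAMask-HQ release splits the annotation PNGs across numbered subfolders (0, 1, 2, ...); a minimal sketch (my own, not the repository's merge script) that flattens them into merged/ could be:

import glob
import os
import shutil

anno_root = 'CELEB_ROOT/CelebAMask-HQ-mask-anno'
merged = os.path.join(anno_root, 'merged')
os.makedirs(merged, exist_ok=True)
# copy every annotation PNG from the numbered subfolders into merged/
for png in glob.glob(os.path.join(anno_root, '[0-9]*', '*.png')):
    shutil.copy(png, merged)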
celeb.py:

import os
import numpy as np
from PIL import Image

(...)
def _get_pmaps(self, index):
    # one channel per parsing-map component; missing components stay black
    pmaps = np.zeros((self.size_maps[0], self.size_maps[1], len(self.list_pmaps)), dtype=np.uint8)
    face_idx = self.img_info[index]['image']
    for i, tail in enumerate(self.list_pmaps):
        anno_img = os.path.join(self.root, 'CelebAMask-HQ-mask-anno/merged/', '%05d_' % face_idx + tail + '.png')
        if not os.path.exists(anno_img):
            # keep black
            continue
        anno_img = Image.open(anno_img).convert('L')
        anno_img = anno_img.resize(self.size_maps, Image.BICUBIC)
        pmaps[:, :, i] = np.array(anno_img)
    return pmaps
(...)
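If the loaded arrays are always 0, _get_pmaps is most likely hitting the os.path.exists fallback (a wrong path) for every component, so every channel stays black. A quick sanity check (the filename below is only an example; 'skin' is one of the annotation tails in the dataset):

import os
import numpy as np
from PIL import Image

p = os.path.join('CELEB_ROOT', 'CelebAMask-HQ-mask-anno/merged', '00000_skin.png')
if os.path.exists(p):
    # a valid mask should contain nonzero pixels
    print(np.array(Image.open(p).convert('L')).max())
else:
    print('missing:', p)  # path or merge problem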
How long did it take you to run this code? It took me 30 hours with a 1060Ti.
May I ask: is the CelebA training set you use 18,000 images, and the validation set 100?
Do you know how to obtain the landmarks using the authors' network from the paper? Isn't the prior network only able to produce the parsing maps? How are the landmarks obtained?