backtime92 / CRAFT-Reimplementation

CRAFT-Pytorch: Character Region Awareness for Text Detection, reimplementation for PyTorch

Questions on scaling images and GT masks #39

Open ThisIsIsaac opened 5 years ago

ThisIsIsaac commented 5 years ago
  1. In data_loader.py, in the pull_item function:
        region_scores = self.resizeGt(region_scores)
        affinity_scores = self.resizeGt(affinity_scores)
        confidence_mask = self.resizeGt(confidence_mask)

and the function definition of resizeGt is:

    def resizeGt(self, gtmask):
        return cv2.resize(gtmask, (self.target_size // 2, self.target_size // 2))

Why do you resize the score maps and the confidence mask to half the target size?


  2. In the same function, you perform element-wise division on region_scores and affinity_scores:

        region_scores_torch = torch.from_numpy(region_scores / 255).float()
        affinity_scores_torch = torch.from_numpy(affinity_scores / 255).float()

Why?


  3. random_scale uses self.target_size as the minimum dimension and 1280 as the maximum, so the scaled image and character boxes can end up anywhere between self.target_size and 1280. What happens if the image is larger than 768? How do you guarantee that it will be 768? You don't seem to rescale the image after random_scale.

backtime92 commented 5 years ago

@ThisIsIsaac
Q1: The network's output map is downsampled to 1/2 the input size, so the GT masks are resized to match it.
Q2: The region and affinity scores should lie between 0 and 1, hence the division by 255.
Q3: The image is randomly cropped to 768*768.

I think you can read the author's paper for more details.
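
For anyone reading later, the three answers can be sketched together roughly as follows. This is a minimal NumPy sketch, not the code in data_loader.py: the helper names random_crop and halve are hypothetical, the 2x2 block average is a stand-in for the cv2.resize call in resizeGt, the 1000 x 1280 input is made-up data, and the order of the steps is an assumption.

```python
import numpy as np

TARGET_SIZE = 768  # assumed value of self.target_size, taken from the thread


def random_crop(image, masks, size, rng):
    """Cut the same random size x size window out of the image and its masks."""
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError("image must be at least size x size after scaling")
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    window = (slice(top, top + size), slice(left, left + size))
    return image[window], [m[window] for m in masks]


def halve(mask):
    """Downsample by 2 with 2x2 block averaging (stand-in for cv2.resize)."""
    h, w = mask.shape
    return mask.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


rng = np.random.default_rng(0)

# Made-up data: pretend random_scale produced a 1000 x 1280 image
# with a full-resolution region score map stored as 0..255 values.
image = rng.integers(0, 256, size=(1000, 1280, 3), dtype=np.uint8)
region = rng.integers(0, 256, size=(1000, 1280)).astype(np.float32)

# Q3: a random crop fixes the spatial size, whatever the scaled size was.
image_crop, (region_crop,) = random_crop(image, [region], TARGET_SIZE, rng)

# Q1: the GT map is brought to half resolution to match the network output.
region_half = halve(region_crop)

# Q2: 0..255 values are mapped to the 0..1 range used as the regression target.
region_target = region_half / 255.0

print(image_crop.shape, region_target.shape)
```

The point of the sketch is that no rescale is needed after random_scale: as long as the scaled image is at least 768 on each side, the crop alone fixes the input size, and the GT maps follow at half that size.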