bethgelab / siamese-mask-rcnn

Siamese Mask R-CNN model for one-shot instance segmentation

Problem with custom dataset #28

Open TuanNguyenSKKU opened 4 years ago

TuanNguyenSKKU commented 4 years ago

Hi everyone, I am trying to train the Siamese model on a custom dataset (comprising three classes), starting from the pretrained weights file (mask_rcnn_coco.h5). dataset_train and dataset_val are saved in JSON format, as in the Mask R-CNN repository. However, I get the error below about mismatched shapes. How can I reshape the images to fit this model? Thank you!

This is the code of the training part.

```python
import os
import time

import matplotlib.pyplot as plt
import numpy as np

# ROOT_DIR, MODEL_DIR, COCO_WEIGHTS_PATH, shapesDataset, shapesConfig and
# siamese_model are defined or imported elsewhere in the script.

# Training
if __name__ == '__main__':
    dataset_dir = os.path.join(ROOT_DIR, "shapes")

    # Training dataset
    dataset_train = shapesDataset()
    dataset_train.load_shapes(dataset_dir, "train")
    dataset_train.prepare()

    # Validation dataset
    dataset_val = shapesDataset()
    dataset_val.load_shapes(dataset_dir, "val")
    dataset_val.prepare()

    config = shapesConfig()
    config.display()

    # Create model object in training mode.
    model = siamese_model.SiameseMaskRCNN(mode="training", model_dir=MODEL_DIR, config=config)

    # Select weights file to load
    init_with = "coco"
    if init_with == "coco":
        # Skip the head layers whose shapes depend on the number of classes
        model.load_weights(COCO_WEIGHTS_PATH, by_name=True,
                           exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                    "mrcnn_bbox", "mrcnn_mask"])
    elif init_with == "last":
        model.load_weights(model.find_last(), by_name=True)
    elif init_with == "imagenet":
        model.load_weights(model.get_imagenet_weights(), by_name=True)

    start_train = time.time()
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=30,
                layers='heads')

    # Per-epoch metrics recorded by Keras during training
    history = model.keras_model.history.history
    epochs = range(1, len(next(iter(history.values()))) + 1)

    plt.figure()
    plt.plot(epochs, history["loss"], label="Train loss")
    plt.plot(epochs, history["val_loss"], label="Valid loss")
    plt.title('Train loss and Valid loss', fontsize=12, fontweight='bold')
    plt.xlabel('Number of Epoch', fontsize=10)
    plt.ylabel('Loss value', fontsize=10)
    plt.legend(fontsize=10)
    plt.savefig('loss.png')
    plt.show()

    # Epoch with the lowest validation loss (reported 1-indexed)
    best_epoch = np.argmin(history["val_loss"])
    print("Best Epoch:", best_epoch + 1, history["val_loss"][best_epoch])

    end_train = time.time()
    minutes = round((end_train - start_train) / 60, 2)
    print(f'Training took {minutes} minutes')
```

The error:

ValueError: Dimension 2 in both shapes must be equal, but are 384 and 256. Shapes are [3,3,384,512] and [3,3,256,512]. for 'Assign' (op: 'Assign') with input shapes: [3,3,384,512], [3,3,256,512].

michaelisc commented 4 years ago

Which config file did you use? It looks like it could simply be a mismatch between the small and large model.
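One way to narrow down a mismatch like this is to compare the weight shapes the model expects with those stored in the checkpoint. The following is a minimal sketch, assuming both sets of shapes have already been collected into plain dictionaries; the helper `find_shape_mismatches` and the layer names are hypothetical, and the shapes in the example simply echo the error message above.

```python
def find_shape_mismatches(model_shapes, checkpoint_shapes):
    """Return {name: (model_shape, checkpoint_shape)} for every weight that
    exists in both dictionaries but has a different shape in each."""
    mismatches = {}
    for name, model_shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        # Weights missing from the checkpoint are skipped, not reported.
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(model_shape), tuple(ckpt_shape))
    return mismatches


# Illustrative (hypothetical) layer names; shapes taken from the error message.
model_shapes = {
    "conv_a/kernel": (3, 3, 384, 512),
    "conv_b/kernel": (1, 1, 512, 6),
}
checkpoint_shapes = {
    "conv_a/kernel": (3, 3, 256, 512),
    "conv_b/kernel": (1, 1, 512, 6),
}

for name, (m, c) in find_shape_mismatches(model_shapes, checkpoint_shapes).items():
    print(f"{name}: model expects {m}, checkpoint has {c}")
```

If a layer shows up in such a comparison, one option might be to add it to the `exclude` list passed to `load_weights`, as the original post already does for the head layers, or to switch to a config whose layer sizes match the checkpoint.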

TuanNguyenSKKU commented 4 years ago

Thank you for your response. I used the default config.py from the Siamese model and subclassed it in shapesConfig, as shown below.

```python
class shapesConfig(siamese_config.Config):
    NAME = "shapes"  # Override in sub-classes
    EXPERIMENT = 'example'
    # NUMBER OF GPUs to use. For CPU training, use 1
    # GPU_COUNT = 2
    IMAGES_PER_GPU = 1
    STEPS_PER_EPOCH = 100
    NUM_CLASSES = 1 + 3  # For background + my_classes
    DETECTION_MIN_CONFIDENCE = 0.9
    MASK_SHAPE = [56, 56]
    USE_MINI_MASK = False
```

config.zip

michaelisc commented 4 years ago

Have you tried using 2 classes (1 + 1)? Because this model is Siamese and uses an example of the class instead of class labels, there is just one foreground class, which covers the others implicitly.
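To illustrate the idea of a single foreground class, here is a minimal sketch. It assumes that for each training pair one category is chosen as the reference and every annotated instance is relabeled relative to it. The function name and the relabeling scheme are illustrative assumptions, not code taken from the repository.

```python
def binarize_for_reference(class_ids, reference_class):
    """Map multi-class labels to 1 (same category as the reference example)
    or 0 (anything else, treated as background for this pair)."""
    return [1 if c == reference_class else 0 for c in class_ids]


# Hypothetical image with four annotated instances from three custom classes.
class_ids = [1, 3, 2, 3]

# With a reference example of class 3, only the class-3 instances are foreground.
print(binarize_for_reference(class_ids, reference_class=3))  # [0, 1, 0, 1]

# The same image paired with a class-1 reference yields different targets.
print(binarize_for_reference(class_ids, reference_class=1))  # [1, 0, 0, 0]
```

Under this assumption the network only ever sees "matches the reference" versus "does not", which is why a config with `NUM_CLASSES = 1 + 1` would be enough even when the dataset has several categories.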

TuanNguyenSKKU commented 4 years ago

I have not tried that yet. Is there a way to handle multiple class labels? I think others may also want to use this repository with several class labels. Thank you.

michaelisc commented 4 years ago

Yes, that is correct, but it would defeat the purpose of the task and the model. If you want to use multiple class labels, you should probably use a standard object detection model from a toolbox like mmdetection or detectron2.

TuanNguyenSKKU commented 4 years ago

Thank you for your suggestions.

ghost commented 3 years ago

Sir, with the Siamese model, how can we use just two images and a pretrained model to get the detection output? Could you also make a page explaining each part of the code? I am having trouble understanding it, as I am a beginner in this field.

F2Wang commented 3 years ago

I am slightly confused by this thread. I understand that the network can only output binary labels, but should it be trained that way too (only background and instance)? If so, when you provide a reference image of a person, shouldn't it treat every COCO class it was trained on as an instance, given that people, apples, and bicycles were all trained as the same class?