Inconsistent number of samples

We integrated Q2L into our codebase, and it only works with swin_small at 224 resolution. All other backbones (densenet161, resnet50, swin_base) fail at 224 resolution with a variation of the following error. The failure is non-deterministic: it shows up at different points, sometimes at epoch 2, sometimes at 5 or 12.

Data=0.0004 s | (2515/2515) | 100.00% | Loss=0.0253 [########################################]>
pred_scores shape: (90517, 17)
Traceback (most recent call last):
  File "src/main/train.py", line 283, in <module>
    main(None, args)
  File "src/main/train.py", line 154, in main
    cfg, model, train_dataloader, criterion, optimizer, device, phase="train", scaler=scaler)
  File "src/main/helper/epoch.py", line 218, in epoch
    metrics_dict = calculate_metrics(cfg, gt_dict, pred_dict, phase, dataloader)
  File "src/main/helper/postprocess.py", line 79, in calculate_metrics
    gt_label, pred_score, "score", cfg, phase
  File "src/main/helper/postprocess.py", line 40, in evaluate_classif
    return_dict["{}_{}".format(phase, metric)] = callable_metric(gt, preds)
  File "/opt/venv/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 572, in roc_auc_score
    sample_weight=sample_weight,
  File "/opt/venv/lib/python3.7/site-packages/sklearn/metrics/_base.py", line 75, in _average_binary_score
    return binary_metric(y_true, y_score, sample_weight=sample_weight)
  File "/opt/venv/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 342, in _binary_roc_auc_score
    fpr, tpr, _ = roc_curve(y_true, y_score, sample_weight=sample_weight)
  File "/opt/venv/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 963, in roc_curve
    y_true, y_score, pos_label=pos_label, sample_weight=sample_weight
  File "/opt/venv/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 733, in _binary_clf_curve
    check_consistent_length(y_true, y_score, sample_weight)
  File "/opt/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 334, in check_consistent_length
    % [int(l) for l in lengths]
ValueError: Found input variables with inconsistent numbers of samples: [90517, 90185]

The above error occurred with the following configuration:

backbone: swin_base
resolution: 448
dropout: 0
encoder_layer: 0
decoder_layer: 1
hidden_dim: 1024
dim_feedforward: 1024
nheads: 4
loss: asl
neg: 4
pos: 2
num_classes: 17

We set hidden_dim and dim_feedforward (more specifically, the dimension of the query embeddings fed to the decoder) equal to the last-layer dimension of the backbone. For example, in the config above swin_base gives 1024, so the query embeddings have shape (17, 1024). Has anyone come across this kind of behaviour?
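For anyone trying to reproduce this: the two lengths in the ValueError differ by 332 samples (90517 vs 90185), so the ground-truth and prediction arrays have already drifted apart by the time roc_auc_score is called. One way to narrow down where that happens is to check lengths batch by batch instead of only at the end of the epoch. Below is a minimal sketch of that idea. EpochAccumulator, add_batch and finalize are hypothetical names for illustration; they are not taken from our codebase or from query2labels, and the sketch assumes labels and scores are both collected as (batch_size, num_classes) arrays.

```python
import numpy as np


class EpochAccumulator:
    """Hypothetical helper: collects per-batch labels and scores for one epoch
    and checks that their lengths agree as each batch is added."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.labels = []
        self.scores = []

    def add_batch(self, labels, scores):
        labels = np.asarray(labels)
        scores = np.asarray(scores)
        # Fail at the offending batch rather than at the end of the epoch.
        if labels.shape[0] != scores.shape[0]:
            raise ValueError(
                f"batch mismatch: {labels.shape[0]} labels vs "
                f"{scores.shape[0]} scores"
            )
        # Scores are assumed to be 2-D with one column per class.
        if scores.ndim != 2 or scores.shape[1] != self.num_classes:
            raise ValueError(f"unexpected scores shape: {scores.shape}")
        self.labels.append(labels)
        self.scores.append(scores)

    def finalize(self):
        # Concatenate along the sample axis; lengths match by construction.
        gt = np.concatenate(self.labels, axis=0)
        pred = np.concatenate(self.scores, axis=0)
        return gt, pred


# Usage with dummy data (17 classes, two batches of 36 and 5 samples).
acc = EpochAccumulator(num_classes=17)
acc.add_batch(np.zeros((36, 17)), np.random.rand(36, 17))
acc.add_batch(np.zeros((5, 17)), np.random.rand(5, 17))
gt, pred = acc.finalize()
print(gt.shape, pred.shape)  # (41, 17) (41, 17)
```

If the per-batch check never fires but the epoch-level arrays still disagree, the mismatch is more likely introduced where the lists are stored or reset between epochs than in the model's forward pass, though that is only a guess at this point.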

SlongLiu / query2labels

Inconsistent number of samples #36