czlwang / BrainBERT

[ICLR 2023] Code for BrainBERT

Masking code bug #3

Closed Utkarsh4430 closed 1 year ago

Utkarsh4430 commented 1 year ago

In this function: https://github.com/czlwang/BrainBERT/blob/master/util/mask_utils.py#L28 Shouldn't lines 48-49 come after the for loop on line 52?

As written, your masked_labels and the masked input don't align.

czlwang commented 1 year ago

Yes, it is possible that mask_label is 1 in some locations where masked_data is not actually masked. This is intentional and follows the precedent set by BERT:

"... we are creating a mismatch between pre-training and fine-tuning, since the [MASK] token does not appear during fine-tuning. To mitigate this, we do not always replace “masked” words with the actual [MASK] token."

A better name for masked_labels would be locations_where_loss_is_computed.
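
For readers following along, here is a minimal sketch of the BERT-style scheme referred to above. It is not the code in util/mask_utils.py. It uses integer tokens instead of spectrograms for simplicity, and the names bert_style_mask, loss_mask, mask_id, and select_prob are hypothetical. The 80/10/10 split is the one described in the BERT paper. The point is only that loss_mask marks where the loss is computed, while the input at those positions may be a mask value, a random value, or left unchanged.

```python
import numpy as np

def bert_style_mask(tokens, mask_id, vocab_size, rng, select_prob=0.15):
    """Hypothetical BERT-style masking sketch (not the BrainBERT implementation)."""
    tokens = np.asarray(tokens)
    masked = tokens.copy()

    # Positions where the loss is computed ("locations_where_loss_is_computed").
    loss_mask = rng.random(tokens.shape) < select_prob

    # For each selected position, decide how the input is corrupted.
    roll = rng.random(tokens.shape)
    use_mask_token = loss_mask & (roll < 0.8)                  # 80%: mask token
    use_random = loss_mask & (roll >= 0.8) & (roll < 0.9)      # 10%: random token
    # Remaining 10% of selected positions keep their original value.

    masked[use_mask_token] = mask_id
    n_random = int(use_random.sum())
    masked[use_random] = rng.integers(1, vocab_size, size=n_random)
    return masked, loss_mask

rng = np.random.default_rng(0)
tokens = np.arange(10000) % 50 + 1      # toy tokens in 1..50; 0 is reserved as mask_id
masked, loss_mask = bert_style_mask(tokens, mask_id=0, vocab_size=51, rng=rng)

# Some positions contribute to the loss even though the input there is unchanged.
unchanged_but_scored = loss_mask & (masked == tokens)
print(int(loss_mask.sum()), int(unchanged_but_scored.sum()))
```

Under this scheme the label mask and the set of visibly masked inputs are expected to differ, which is the mismatch reported in this issue.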