[Implementation] Sentence order prediction (SOP) label for a single-chunk document in create_pretraining_data.py

Thanks for the great work.

I have a question about a gap between what the paper reports and what the released code does for the sentence order prediction (SOP) task. In fact, the SOP code appears to also produce NSP-style examples.

Section 3.1 of the ALBERT paper states that a model trained with SOP can also solve NSP (next sentence prediction) to a reasonable degree (see Table 5 in Section 4.6). However, while the paper says SOP uses only consecutive segments from the same document, the released code also contains a procedure that selects a random document.
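To make the distinction concrete, here is a minimal sketch of how the two tasks build a segment pair as I understand them from the BERT and ALBERT papers. The function names `nsp_pair` and `sop_pair` are hypothetical and do not come from the repository; label 1 follows the convention where `is_random_next = True` maps to label 1. Note that in SOP both segments always come from the same document.

```python
import random
from typing import List, Tuple

Segment = List[str]


def nsp_pair(doc: List[Segment], other_doc: List[Segment], i: int,
             rng: random.Random) -> Tuple[Segment, Segment, int]:
    """NSP (BERT): label 0 = B really follows A; label 1 = B comes from another document."""
    a = doc[i]
    if rng.random() < 0.5:
        return a, doc[i + 1], 0
    return a, rng.choice(other_doc), 1


def sop_pair(doc: List[Segment], i: int,
             rng: random.Random) -> Tuple[Segment, Segment, int]:
    """SOP (ALBERT paper): two consecutive segments of one document; label 1 = order swapped."""
    a, b = doc[i], doc[i + 1]
    if rng.random() < 0.5:
        return a, b, 0
    return b, a, 1
```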

I think the problem is the sentence_order_label assigned in create_pretraining_data.py to a document with a single chunk. In lines 315-317, when len(current_chunk) == 1, the code randomly selects another document for segment B and sets is_random_next = True (which means sentence_order_label = 1). This label does not describe a truly reversed order of consecutive segments (as in SOP); it is an NSP-style "random next" label.
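Below is a simplified, hypothetical paraphrase of the branching I am describing, written only to illustrate the concern; it is not the actual code from create_pretraining_data.py. The name `build_pair` is made up, and target-length handling and truncation are omitted. With a single chunk there is nothing to swap, so label 1 can only come from a segment drawn from another document.

```python
import random
from typing import List, Tuple

Segment = List[str]


def build_pair(current_chunk: List[Segment], all_documents: List[List[Segment]],
               document_index: int, rng: random.Random) -> Tuple[List[str], List[str], int]:
    """Hypothetical simplified paraphrase; returns (tokens_a, tokens_b, sentence_order_label)."""
    a_end = 1
    if len(current_chunk) >= 2:
        a_end = rng.randint(1, len(current_chunk) - 1)
    tokens_a = [t for seg in current_chunk[:a_end] for t in seg]

    if len(current_chunk) == 1:
        # Single chunk: B is taken from a *different* document, so label 1 here
        # is an NSP-style negative rather than a reversed consecutive pair.
        if len(all_documents) < 2:
            raise ValueError("need at least two documents to draw a random segment")
        other = document_index
        while other == document_index:
            other = rng.randrange(len(all_documents))
        random_doc = all_documents[other]
        start = rng.randrange(len(random_doc))
        tokens_b = [t for seg in random_doc[start:] for t in seg]
        return tokens_a, tokens_b, 1

    tokens_b = [t for seg in current_chunk[a_end:] for t in seg]
    if rng.random() < 0.5:
        # SOP negative: the same consecutive segments with their order swapped.
        return tokens_b, tokens_a, 1
    # SOP positive: consecutive segments in their original order.
    return tokens_a, tokens_b, 0
```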

Am I misunderstanding something? If not, does the released code differ from the version used for the paper?

Or is this the recommended way to handle single-chunk documents?

Thanks.

google-research / albert #234