Hi,
I noticed in the scGPT paper that you set a maximum context length of 1,200 for pre-training. However, in the fine-tuning notebook for annotation, the maximum sequence length is 3,000. It seems the model can still run inference even when the context length of the test set differs from that of the training set, because there is no positional embedding. Is that correct? Also, does performance differ when the maximum sequence length at inference is different from the one used in training, compared to using the same context length for both datasets?
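For reference, here is a minimal sketch of my understanding of why a different max length at inference can still be consumed by the model: each cell's expressed genes are truncated or subsampled to `max_seq_len`, and tokens are embedded by gene ID rather than by position. The helper name `subsample_genes` and the arrays below are just illustrative, not the actual scGPT API.

```python
import numpy as np

def subsample_genes(gene_ids: np.ndarray, values: np.ndarray,
                    max_seq_len: int, rng=None):
    """Keep at most max_seq_len gene tokens for one cell (hypothetical helper)."""
    rng = rng or np.random.default_rng()
    if len(gene_ids) <= max_seq_len:
        return gene_ids, values
    keep = rng.choice(len(gene_ids), size=max_seq_len, replace=False)
    return gene_ids[keep], values[keep]

# Toy example: pre-training-style length vs. annotation fine-tuning-style length.
gene_ids = np.arange(5000)        # toy vocabulary indices for one cell's expressed genes
values = np.random.rand(5000)     # toy expression values

train_ids, _ = subsample_genes(gene_ids, values, max_seq_len=1200)
test_ids, _ = subsample_genes(gene_ids, values, max_seq_len=3000)

# 1200 vs. 3000 tokens, but both index the same gene-ID embedding table,
# so no positional embedding ties the model to a fixed context length.
print(len(train_ids), len(test_ids))
```

Is this roughly the right mental model, or does the longer sequence length at fine-tuning interact with the model in some other way?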