bowang-lab / scGPT

https://scgpt.readthedocs.io/en/latest/

Max sequence length differs in training and test sets #224

Open · tjeng opened this issue 4 months ago

tjeng commented 4 months ago

Hi,

I noticed in the scGPT paper that you set a maximum context length of 1,200 for pre-training, yet the fine-tuning notebook for annotation uses a maximum sequence length of 3,000. Is it correct that the model can run inference when the test set's context length differs from the training set's because there is no positional embedding? And does performance change when the maximum sequence length differs between the training and inference sets, compared with using the same context length for both?
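
For anyone reading along, here is a minimal sketch (not scGPT's actual code; `TinyGeneTransformer`, the layer sizes, and the vocabulary size are all made up for illustration) of the mechanism being asked about: when the gene-token and expression-value embeddings are summed without any positional embedding, a Transformer encoder accepts whatever sequence length it is given, so the 1,200-token pre-training length and the 3,000-token fine-tuning length are both valid inputs to the same weights.

```python
# Illustrative sketch only: a Transformer encoder with no positional embedding
# can consume sequences of different lengths at training and inference time.
import torch
import torch.nn as nn


class TinyGeneTransformer(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 64, nhead: int = 4):
        super().__init__()
        self.gene_emb = nn.Embedding(vocab_size, d_model)  # gene-token embedding
        self.value_proj = nn.Linear(1, d_model)            # expression-value embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Note: no positional embedding is added anywhere.

    def forward(self, gene_ids: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
        # gene_ids: (batch, seq_len) int64, values: (batch, seq_len) float
        x = self.gene_emb(gene_ids) + self.value_proj(values.unsqueeze(-1))
        return self.encoder(x)


model = TinyGeneTransformer(vocab_size=60000)

# A "training" batch padded/truncated to 1,200 genes per cell ...
train_out = model(torch.randint(0, 60000, (1, 1200)), torch.rand(1, 1200))
# ... and an "inference" batch with 3,000 genes per cell goes through the
# same weights unchanged, since attention treats the gene tokens as a set.
test_out = model(torch.randint(0, 60000, (1, 3000)), torch.rand(1, 3000))
print(train_out.shape, test_out.shape)  # (1, 1200, 64) and (1, 3000, 64)
```

Whether accuracy degrades at the longer length (versus keeping the context length identical in both phases) is the empirical part of the question and would still need the authors' input.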