73% of the tokens in adata.var[feature_name] are not in vocab. Please check if using the correct vocab and token_col.

bowang-lab / scGPT

MIT License

Hi! Really impressive work! However, while downloading the pretraining data I ran into the following error. In the third step of "Build Training Cell Corpus from Cellxgene Census" (Build the scb Files), setting VOCAB_PATH to either ../../scgpt/tokenizer/default_gene_vocab.json or ../../scgpt/tokenizer/default_census_vocab.json raises an error like:
73% of the tokens in adata.var[feature_name] are not in vocab. Please check if using the correct vocab and token_col.
Where can I get the correct vocab, or how can I generate it?
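
One way to narrow this down is to measure how much of each candidate column overlaps with the vocab before building the scb files. The snippet below is only a diagnostic sketch, not scGPT's own check: `missing_tokens` and `load_vocab` are hypothetical helpers, the vocab JSON is assumed to be a flat mapping from token to integer id, and the toy column values stand in for real `adata.var` columns. It counts unique tokens, so its percentage may differ from the one in the error message.

```python
import json
from typing import Dict, Iterable, List, Tuple


def load_vocab(path: str) -> Dict[str, int]:
    """Load a vocab file, assuming it is a flat JSON object {token: id}."""
    with open(path) as f:
        return json.load(f)


def missing_tokens(tokens: Iterable[str], vocab: Dict[str, int]) -> Tuple[float, List[str]]:
    """Return (fraction of unique tokens absent from vocab, sorted absent tokens)."""
    unique = sorted(set(tokens))
    if not unique:
        return 0.0, []
    absent = [t for t in unique if t not in vocab]
    return len(absent) / len(unique), absent


# Toy stand-ins (assumptions for illustration, not real scGPT data):
# a tiny vocab keyed by gene symbol, and two candidate token columns.
toy_vocab = {"<pad>": 0, "TP53": 1, "GAPDH": 2, "ACTB": 3}
toy_columns = {
    "feature_name": ["TP53", "GAPDH", "ACTB", "MT-CO1"],
    "feature_id": [
        "ENSG00000141510",
        "ENSG00000111640",
        "ENSG00000075624",
        "ENSG00000198804",
    ],
}

# Compare every candidate column against the vocab.
report = {col: missing_tokens(vals, toy_vocab) for col, vals in toy_columns.items()}
for col, (frac, absent) in report.items():
    print(f"{col}: {frac:.0%} of unique tokens missing, e.g. {absent[:3]}")
```

With real data, the same check could be run by replacing the toy inputs with `load_vocab(VOCAB_PATH)` and `adata.var[col].astype(str).tolist()` for each column of interest. A column whose missing fraction is close to zero is the likelier match for token_col; if every column scores high, the vocab file itself is probably the mismatch.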

Hi, have you solved this problem?

73% of the tokens in adata.var[feature_name] are not in vocab. Please check if using the correct vocab and token_col. #139