bowang-lab / scGPT

https://scgpt.readthedocs.io/en/latest/
MIT License
1.01k stars, 196 forks

73% of the tokens in adata.var[feature_name] are not in vocab. Please check if using the correct vocab and token_col. #139

Closed: xwanaf closed this issue 10 months ago

xwanaf commented 10 months ago

Hi! Really impressive work! However, while downloading the pretraining data, I ran into the error below. In the third step of "Build Training Cell Corpus from Cellxgene Census" (Build the scb Files), setting VOCAB_PATH to either ../../scgpt/tokenizer/default_gene_vocab.json or ../../scgpt/tokenizer/default_census_vocab.json raises this error:

73% of the tokens in adata.var[feature_name] are not in vocab. Please check if using the correct vocab and token_col.
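The message reports the share of gene names in the chosen column that are missing from the vocab. To see which names fail to match, one option is to measure the overlap yourself. The sketch below assumes the vocab JSON is a flat mapping from gene token to integer id. The helper names `load_vocab` and `missing_fraction` are hypothetical and are not part of scGPT's API. In practice you would pass `adata.var["feature_name"]` (or another candidate column) as `gene_names`.

```python
import json
from typing import Dict, Iterable, Mapping


def load_vocab(path: str) -> Dict[str, int]:
    # Assumption: the vocab file is a JSON object mapping token -> integer id.
    with open(path) as f:
        return json.load(f)


def missing_fraction(gene_names: Iterable[str], vocab: Mapping[str, int]) -> float:
    """Fraction of gene names that are not keys of the vocab (hypothetical helper)."""
    names = list(gene_names)
    if not names:
        return 0.0
    missing = sum(1 for g in names if g not in vocab)
    return missing / len(names)


# Toy example: 3 of the 4 names are absent from the vocab.
toy_vocab = {"TP53": 0}
print(missing_fraction(["TP53", "BRCA1", "EGFR", "MYC"], toy_vocab))
```

Printing a few of the unmatched names (for example, whether they are symbols or Ensembl-style IDs) can help show whether the vocab or the `token_col` is the mismatch.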

Where can I get or generate the correct vocab?
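If no matching vocab file is available, one possibility is to build a vocab from the gene names in the data. The sketch below is illustrative only. It assumes the same flat token-to-id JSON layout, and it assumes the special tokens `<pad>`, `<cls>` and `<eoc>` should occupy the first ids. Check both assumptions against the vocab files shipped in `scgpt/tokenizer/`. The helpers `build_vocab` and `save_vocab` are hypothetical. A freshly generated vocab assigns new ids, so it would likely not line up with the gene embeddings of an existing pretrained checkpoint. It is mainly relevant when pretraining from scratch.

```python
import json
from typing import Dict, Iterable, Sequence


def build_vocab(
    gene_names: Iterable[str],
    special_tokens: Sequence[str] = ("<pad>", "<cls>", "<eoc>"),  # assumed tokens
) -> Dict[str, int]:
    """Map special tokens first, then unique gene names (sorted), to consecutive ids."""
    vocab: Dict[str, int] = {}
    for tok in list(special_tokens) + sorted(set(gene_names)):
        if tok not in vocab:  # skip duplicates, e.g. a gene named like a special token
            vocab[tok] = len(vocab)
    return vocab


def save_vocab(vocab: Dict[str, int], path: str) -> None:
    # Write the token -> id mapping as a JSON object.
    with open(path, "w") as f:
        json.dump(vocab, f, indent=2)


print(build_vocab(["TP53", "BRCA1", "TP53"]))
```

Sorting the unique names makes the resulting ids deterministic across runs, which matters if the same vocab must be reused when building several scb files.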

GlancerZ commented 5 months ago

Hi, have you solved this problem?