Closed (xwanaf closed 10 months ago)
Hi! Really impressive work! However, when I downloaded the pretraining data, I encountered the following error. In the third step of "Build Training Cell Corpus from Cellxgene Census" (Build the scb Files), setting VOCAB_PATH to either ../../scgpt/tokenizer/default_gene_vocab.json or ../../scgpt/tokenizer/default_census_vocab.json raises an error like:
73% of the tokens in adata.var[feature_name] are not in vocab. Please check if using the correct vocab and token_col.
Where can I get or generate the correct vocab?
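For anyone debugging the same message, here is a minimal sketch of how one might measure the mismatch and generate a vocab that covers the missing names. It is not part of the scGPT code base: the helper names (load_vocab, missing_fraction, extend_vocab) are hypothetical, and it assumes the vocab JSON is a flat mapping from token string to integer index. The gene names would come from the column named in the error (for example list(adata.var["feature_name"])).

```python
import json


def load_vocab(path):
    """Load a vocab file, ASSUMED to be a flat JSON object {token: index}."""
    with open(path) as f:
        return json.load(f)


def missing_fraction(gene_names, vocab):
    """Return the fraction of gene_names that are absent from vocab."""
    names = list(gene_names)
    if not names:
        return 0.0
    missing = sum(1 for name in names if name not in vocab)
    return missing / len(names)


def extend_vocab(gene_names, vocab):
    """Return a copy of vocab with unseen names appended at new indices.

    Existing token -> index pairs are left unchanged.
    """
    extended = dict(vocab)
    next_index = max(extended.values(), default=-1) + 1
    for name in gene_names:
        if name not in extended:
            extended[name] = next_index
            next_index += 1
    return extended


if __name__ == "__main__":
    # Toy data standing in for the vocab file and adata.var["feature_name"].
    vocab = {"<pad>": 0, "TP53": 1}
    genes = ["TP53", "ENSG00000141510", "BRCA1", "MYC"]
    print(f"{missing_fraction(genes, vocab):.0%} of tokens missing")
    print(json.dumps(extend_vocab(genes, vocab), indent=2))
```

A large missing fraction may mean the column holds a different kind of identifier than the vocab (for example Ensembl IDs versus gene symbols), so it is worth printing a few names from both sides before extending anything. Tokens added this way would have no pretrained embeddings, which matters if the vocab is meant to be used with a released checkpoint.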
Hi, have you solved this problem?