JackieHanLab / TOSICA

Transformer for One-Stop Interpretable Cell-type Annotation
MIT License

What will happen if the number or order of var_names is different? #5

Closed: hxpGit512 closed this issue 1 year ago

hxpGit512 commented 1 year ago

[image]

If this requirement is mandatory, does it mean that the var_names of the reference and query datasets have to be aligned before every prediction on a new dataset?

JackieHanLab commented 1 year ago

Thank you for your interest in TOSICA. In version 1.0, the reference and the query must have identical var_names so that the query fits the masked embedding built during the training step. However, this does not mean that you need to re-align your reference every time you get a new dataset. Instead, use the var_names of your reference to select and order the genes in the query. If some var_names are not present in the query, treat them as dropouts and fill them with 0 in the input. A dropout rate of less than 20% is acceptable; otherwise, realigning and retraining become necessary.
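
For readers who want to see the select-and-arrange step spelled out, here is a minimal sketch using plain NumPy. It is not part of TOSICA's API: the function name `align_to_reference`, its parameters, and the toy gene names are hypothetical. It also assumes that "dropout rate" means the fraction of reference var_names absent from the query, and that the expression matrix is dense.

```python
import numpy as np


def align_to_reference(query_X, query_genes, ref_genes, max_missing=0.2):
    """Reorder query columns to match ref_genes, zero-filling absent genes.

    Hypothetical helper, not a TOSICA function. Raises ValueError when the
    fraction of reference genes missing from the query reaches max_missing.
    """
    query_X = np.asarray(query_X)
    col_of = {gene: j for j, gene in enumerate(query_genes)}

    # Start from zeros so genes absent from the query stay 0 (treated as dropout).
    aligned = np.zeros((query_X.shape[0], len(ref_genes)), dtype=query_X.dtype)
    missing = []
    for j, gene in enumerate(ref_genes):
        if gene in col_of:
            aligned[:, j] = query_X[:, col_of[gene]]
        else:
            missing.append(gene)

    # Assumption: "dropout rate" = share of reference genes not found in the query.
    missing_frac = len(missing) / len(ref_genes)
    if missing_frac >= max_missing:
        raise ValueError(
            f"{missing_frac:.0%} of reference genes are missing from the query; "
            "realign and retrain instead of zero-filling"
        )
    return aligned, missing


# Toy data: the query has a different gene order, an extra gene X, and lacks D.
ref_genes = ["A", "B", "C", "D", "E", "F"]
query_genes = ["C", "A", "X", "B", "E", "F"]
query_X = np.array([[3, 1, 9, 2, 5, 6],
                    [30, 10, 90, 20, 50, 60]])

aligned, missing = align_to_reference(query_X, query_genes, ref_genes)
print(aligned)   # columns now follow ref_genes; column D is all zeros
print(missing)   # ['D']
```

With AnnData objects, `query_X` would correspond to a dense `adata.X` and `query_genes` to `adata.var_names`; sparse matrices would need to be densified or handled separately.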

hxpGit512 commented 1 year ago

Received.
