biomap-research / scFoundation

Apache License 2.0

Question about Gene Identity in SCAD task #10

Closed (KatarinaYuan closed 6 months ago)

KatarinaYuan commented 7 months ago

Hi, I'm trying to work on the SCAD task, but the gene identities seem to be missing from the provided TSV files, such as "Source_exprs_resp_z.Etoposide.tsv". In that file, the column names (for genes) are integers. Are these integers token indices into the scFoundation token vocabulary?

Since the source code of scFoundation has not been released, could you provide your token vocabulary list (with Ensembl gene IDs, for example) so that users can apply their own models to the SCAD task?

If that is not possible, could you provide new TSV files for the SCAD task with interpretable column names as a reference for gene identities?

Thank you for your understanding and help!

WhirlFirst commented 7 months ago

Hi, all of these source files come from the original SCAD repository; you can find the related information here: https://github.com/CompBioT/SCAD/tree/main/data . As for scFoundation, our vocabulary is here: https://github.com/biomap-research/scFoundation/blob/1571ef085006aac63fa04fb592236f3198bd99d1/OS_scRNA_gene_index.19264.tsv
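
For readers who want to test whether integer column labels line up with a vocabulary file, here is a minimal Python sketch. It rests on assumptions this thread does not confirm: that the vocabulary is a tab-separated table with one column of gene labels and one column of integer indices (the names `gene_name` and `index` below are hypothetical and should be checked against the actual file header), and that the integers in the SCAD TSV files refer to that vocabulary at all. The reply above points to the SCAD repository for those files, so the mapping may well come from there instead. The helpers `load_vocab` and `describe_columns` are illustrative names, and the data is a toy stand-in.

```python
import io

import pandas as pd


def load_vocab(source, gene_col, index_col):
    """Read a tab-separated vocabulary and return {integer index: gene label}.

    gene_col and index_col are passed in explicitly because the real
    column names are an assumption here, not taken from the thread.
    """
    vocab = pd.read_csv(source, sep="\t")
    return {int(i): str(g) for i, g in zip(vocab[index_col], vocab[gene_col])}


def describe_columns(columns, index_to_gene):
    """Split column labels into those found in the vocabulary and the rest."""
    found, missing = {}, []
    for col in columns:
        try:
            key = int(col)
        except (TypeError, ValueError):
            # Non-integer labels (e.g. a response column) cannot be looked up.
            missing.append(col)
            continue
        if key in index_to_gene:
            found[col] = index_to_gene[key]
        else:
            missing.append(col)
    return found, missing


# Toy stand-in for a vocabulary file; the header names are hypothetical.
toy_vocab = io.StringIO("gene_name\tindex\nGENE_A\t0\nGENE_B\t1\nGENE_C\t2\n")
index_to_gene = load_vocab(toy_vocab, gene_col="gene_name", index_col="index")

# Toy stand-in for the column labels of an expression TSV.
found, missing = describe_columns(["0", "2", "7", "response"], index_to_gene)
print(found)
print(missing)
```

A large `missing` list on real data would suggest that the integers do not index this vocabulary and that the gene identities need to be recovered from the SCAD repository instead.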