Open renyuan001 opened 7 months ago
Hi @renyuan001,
Your subset step is correct. The error is thrown because, by default, the complete GTEx database is used, and when users add an additional cohort, I implemented a rule that its tissue types cannot be the same as those already present. This is so that the later tumor_specificity calculation using MLE, which considers tissue distribution, can function properly. Since the GTEx database already has pituitary and your subsetted database is also pituitary, the assertion error is thrown.
It is easy to work around by adding a suffix to the tissue name for your subsetted or other control cohort. In your case, I would do:
batch1 = gtex_ctrl_db[:,gtex_ctrl_db.var["tissue"] == "Pituitary"]
batch1.var['tissue'] = [item+'_customized' for item in batch1.var['tissue']]
Then the AssertionError will go away.
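To illustrate why the suffix helps, here is a minimal sketch in plain Python. It is not SNAF's actual check: the helper `shared_tissues` and the tissue lists are hypothetical, made up only to show that a tissue label appearing in both the default database and the added cohort is what collides, and that renaming removes the collision.

```python
def shared_tissues(existing, added):
    """Return the tissue labels present in both collections, sorted."""
    return sorted(set(existing) & set(added))


# Illustrative labels only, not the real contents of the GTEx control database.
gtex_tissues = ["Pituitary", "Liver", "Lung"]
custom_tissues = ["Pituitary", "Pituitary"]

# Before renaming, the added cohort shares a label with the default database.
before = shared_tissues(gtex_tissues, custom_tissues)

# Same renaming idea as above: append a suffix to every tissue label.
renamed = [item + "_customized" for item in custom_tissues]
after = shared_tissues(gtex_tissues, renamed)

print(before)
print(after)
```

Any suffix works as long as the resulting label does not already exist in the database you are appending to.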
If you'd like to completely turn off the gtex database, you can refer to this issue as well (https://github.com/frankligy/SNAF/issues/37).
If there's more customized usage you'd like to achieve, feel free to reach out!
Best, Frank
Thank you for your explanation. This is a good tool indeed. When we analyze tumor RNA-seq FASTQ data, filtering the candidate neoantigens against the whole set of normal tissues (such as gtex_ctrl_db and tcga_ctrl_db) may be better than filtering against our own custom control samples, because the neoantigens may then be safer once the related TCR-T cells are infused back into the human body. Is that right?
That's correct. For DNA mutations, since all tissues share the same set of DNA, we only need to filter against germline WGS to confirm that a candidate will be safe for the patient.
But for gene expression or RNA splicing antigens, we need to make sure the splicing junction is not highly present in normal tissues as RNA is expressed in a more tissue-specific manner.
That's why we compiled a large compendium of normal samples as the normal database to filter out these splicing events, and why we allow users to append as many additional normal cohorts as possible to enhance it.
Best, Frank
For example:
What is the problem?