houruiyan closed this issue 1 year ago
Ah, got it. You want CDS (FASTA), most likely.
Hello Alec, I found that the CDS (FASTA) also contains sequences at the transcript level rather than the gene level. So is it reliable to just average the values of the different transcripts for each gene, using the conversion table between gene_id and transcript_id?
This is the names parameter to the SAMAP class. It looks like:
names = {'mo': mo_mapping, 'hu': hu_mapping, ... }
mo_mapping can be a list of tuples mapping each mo FASTA header to its corresponding gene ID in your mo dataset, and the same goes for all your other species:
[(fasta_id1, gene_id1), (fasta_id2, gene_id2), ...]
That should be exactly what you need.
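As a rough illustration of the structure described above, here is a minimal sketch that builds the names dictionary from two-column tables of (fasta_id, gene_id). The helper load_mapping, the tab-separated layout, and the example IDs are assumptions made for illustration; they are not part of SAMap.

```python
import csv
import io


def load_mapping(handle):
    """Read a tab-separated (fasta_id, gene_id) table into a list of tuples."""
    reader = csv.reader(handle, delimiter="\t")
    # Skip blank or malformed rows that do not have both columns.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


# Hypothetical in-memory tables; in practice these would be opened files.
mo_table = io.StringIO("ENSMUST0001\tENSMUSG0001\nENSMUST0002\tENSMUSG0001\n")
hu_table = io.StringIO("ENST0001\tENSG0001\n")

# One list of (fasta_id, gene_id) tuples per species key.
names = {"mo": load_mapping(mo_table), "hu": load_mapping(hu_table)}

# The dictionary would then be passed as the names argument of SAMAP,
# alongside whatever other arguments your analysis already uses.
```

Note that two transcripts can point to the same gene ID, as in the mo example here.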
Closing for now, please reopen if you still have questions!
@atarashansky Sorry to bother you! I hit the same error! The gene names in my file start with ENSMFAG, but the BLAST results start with ENSMFAT. Where and how can I modify them? Should I change the BLAST file or adata.var? I'd appreciate any help getting past this!
Hope to get your answer. Thank you very much!
Check out my comment above yours! That parameter is what you need.
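If the FASTA came from Ensembl, one way to get those pairs without matching by hand is to read them out of the FASTA headers. The sketch below assumes Ensembl-style CDS headers, where the first token is the transcript ID and a gene:<ID> field appears later on the same line. The function name and the example headers are made up for illustration, so check your own file's header layout before relying on it.

```python
import re

# Matches the "gene:<ID>" field but not "gene_biotype:" or "gene_symbol:".
GENE_FIELD = re.compile(r"\bgene:(\S+)")


def pairs_from_fasta_headers(lines):
    """Return (fasta_id, gene_id) tuples from FASTA header lines."""
    pairs = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        tokens = line[1:].split()
        if not tokens:
            continue  # empty header
        match = GENE_FIELD.search(line)
        if match:
            # The first token is assumed to be the ID that BLAST reports.
            pairs.append((tokens[0], match.group(1)))
    return pairs


# Hypothetical header lines in the assumed layout.
example = [
    ">ENSMFAT00000000001.1 cds primary_assembly:X:1:100:1 "
    "gene:ENSMFAG00000000001.1 gene_biotype:protein_coding",
    "ATGGCC",
    ">ENSMFAT00000000002.1 cds primary_assembly:X:1:100:1 "
    "gene:ENSMFAG00000000001.1 gene_biotype:protein_coding",
    "ATGGTT",
]
pairs = pairs_from_fasta_headers(example)
```

Headers without a gene field are simply skipped, so transcripts lacking a gene annotation do not end up in the list.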
Yes, I understand your answer above, but I would like to ask whether there is a faster way than matching them one by one, because I can't think of a good way to match them up (poor coding ability...). I'm also not sure whether the one-to-one array should contain only the gene names in my dataset, or whether I need all the gene mappings. Also, I noticed that not all transcript names (fasta_id) have corresponding gene names (gene symbols). Sorry to bother you again. I'd appreciate any help getting past this! Thanks!
Hi, thank you for your reply. Actually, I still do not understand the answer to the last question. I have the single-cell h5ad file. The gene var index looks like this. But I do not know which reference genome I should select to run BLAST.
Could you tell me which reference sequence I should select so that I do not need to combine anything? http://asia.ensembl.org/info/data/ftp/index.html/
Thank you! The question in https://github.com/atarashansky/SAMap/issues/97 was not resolved, because even if I drop the .1, .2 suffixes, the IDs still do not match. For example, for human, the gene ID starts with ENSG while the isoform ID starts with ENST.
Hope to get your answer. Thank you very much!
Ruiyan
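On the .1/.2 point: dropping the version suffix alone cannot turn an ENST ID into an ENSG ID, so the (fasta_id, gene_id) pairs are still needed. Removing the suffix from the gene side can still help when the dataset's gene IDs are unversioned. Below is a minimal sketch of that, plus a check of how many dataset genes the mapping covers. Both helper names and all IDs are hypothetical, and the sketch assumes the dataset index holds unversioned gene IDs.

```python
def strip_version(identifier):
    """Drop a trailing '.<digits>' version suffix, if present."""
    head, sep, tail = identifier.rpartition(".")
    return head if head and sep and tail.isdigit() else identifier


def coverage(mapping, dataset_genes):
    """Return (fraction of dataset genes present in mapping, missing genes)."""
    mapped = {gene for _, gene in mapping}
    dataset = set(dataset_genes)
    missing = sorted(dataset - mapped)
    fraction = len(dataset & mapped) / len(dataset) if dataset else 0.0
    return fraction, missing


# Hypothetical pairs as they might come out of a versioned FASTA file.
mapping = [
    ("ENST00000000001.1", "ENSG00000000001.2"),
    ("ENST00000000002.1", "ENSG00000000001.2"),
]
# Keep the FASTA ID untouched; only the gene side is made unversioned.
clean = [(fasta_id, strip_version(gene_id)) for fasta_id, gene_id in mapping]

# Stand-in for the gene IDs in the dataset, e.g. list(adata.var_names).
dataset_genes = ["ENSG00000000001", "ENSG00000000002"]
fraction, missing = coverage(clean, dataset_genes)
```

A low fraction would suggest that the FASTA and the dataset come from different annotations or releases.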