estepi / ASpli

Analysis of alternative splicing using RNAseq

Getting gene names from genome TxDb using T2T genome #27

Open katiebegg opened 1 year ago

katiebegg commented 1 year ago

Hi, thanks for making ASpli!

So far, I've generated an integrated signals output with a list of AS events, which looks great. However, instead of gene names I only get the coordinates and locus IDs (e.g. CHM13_G0011354).

I'm using a T2T assembly and generated the genome file using this code:

    genomeTxDb <- makeTxDbFromGFF(file = "Z:/Genome_files/chm13v2.0_GENCODEv35_CAT_Liftoff.vep.gff3", format = "gff3", organism = "Homo sapiens")

To add gene names, I found this code in the ASpli documentation:

    symbols <- data.frame( row.names = genes( aTxDb ), symbol = paste( 'This is symbol of gene:', genes( aTxDb ) ) )
    features <- binGenome( aTxDb, geneSymbols = symbols )

But when I try something like this I get the error:

    Error in data.frame(row.names = genes(genomeTxDb), symbol = paste("This is symbol of gene:", :
      duplicate row.names: chr1:201631648-201632266:-, chr15:18551285-18551711:-, chr15:80271240-80284203:+

Does anyone know how I can get around this so that I can append the gene symbols?

Thanks! Katie
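
One possible reading of the error, offered as a sketch and not a confirmed diagnosis: the duplicated row names in the message are coordinate strings, which suggests the gene ranges were turned into text and that some genes share identical coordinates. The base R example below reproduces that situation with toy data and shows that keying the table on unique gene IDs avoids the error. All IDs, coordinates, and symbols in it are hypothetical. With a real TxDb, `names(genes(genomeTxDb))` would play the role of `gene_ids`; where the real symbols come from (for example, a gene name attribute in the GFF3) is an assumption that would need checking against the file.

```r
# Toy stand-ins (hypothetical values) for what a TxDb could provide:
# unique gene IDs, plus coordinate strings that repeat when two genes
# occupy exactly the same range.
gene_ids   <- c("CHM13_G0000001", "CHM13_G0000002", "CHM13_G0000003")
coords     <- c("chr1:100-200:-", "chr1:100-200:-", "chr2:300-400:+")
gene_names <- c("GENE_A", "GENE_B", "GENE_C")  # hypothetical symbols

# Using the coordinate strings as row names fails, because two genes
# share the same coordinates. The error message is captured as text.
bad <- tryCatch(
  data.frame(row.names = coords, symbol = gene_names),
  error = function(e) conditionMessage(e)
)

# Using the unique gene IDs as row names works.
symbols <- data.frame(row.names = gene_ids, symbol = gene_names)
```

A table built this way has one row per gene ID and a `symbol` column, which is the shape the documentation snippet above passes to `binGenome( ..., geneSymbols = symbols )`.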