Open Shicheng-Guo opened 3 years ago
The problem is that chr and regions have different lengths. How can I solve this?
# Using GTF files to extract information about genes, transcripts and related features
# http://ceesu.github.io/gtf/

# Shell: download the Ensembl GTF annotations and the VEP caches
curl ftp://ftp.ensembl.org/pub/release-94/gtf/mus_musculus/Mus_musculus.GRCm38.94.gtf.gz -o Mus_musculus.GRCm38.94.gtf.gz
curl ftp://ftp.ensembl.org/pub/release-94/gtf/homo_sapiens/Homo_sapiens.GRCh38.94.gtf.gz -o Homo_sapiens.GRCh38.94.gtf.gz
curl -O ftp://ftp.ensembl.org/pub/release-99/variation/indexed_vep_cache/homo_sapiens_vep_99_GRCh38.tar.gz
tar xzf homo_sapiens_vep_99_GRCh38.tar.gz
curl -O ftp://ftp.ensembl.org/pub/release-99/variation/indexed_vep_cache/homo_sapiens_vep_99_GRCh37.tar.gz
tar xzf homo_sapiens_vep_99_GRCh37.tar.gz

# R: import the GTF and pull out the fields
BiocManager::install("rtracklayer")
gtf <- rtracklayer::import('Homo_sapiens.GRCh38.103.gtf')
input <- gtf[gtf$type == "gene",]
seqnames <- gtf$seqnames
ranges <- gtf$ranges
gene_name <- gtf$gene_name
gene_id <- gtf$gene_id
transcript_id <- gtf$transcript_id
strand <- gtf$strand

# R: compare the lengths of the extracted vectors
length(seqnames)
length(ranges)
length(gene_name)
length(gene_id)
length(transcript_id)
length(strand)

# R: build the symbol / ENSG / ENST table and write it out
input <- data.frame(gene_name, gene_id, transcript_id)
dim(input)
input <- na.omit(input)
dim(input)
write.table(input, file = "human.symbol.ensg.enst.txt", sep = "\t", quote = F, row.names = F, col.names = T)
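One possible explanation, assuming `gtf` is the `GRanges` object that `rtracklayer::import()` returns: `$` on a `GRanges` only reads the metadata columns (`gene_name`, `gene_id`, `transcript_id`, ...). The chromosome, coordinates and strand are core fields, so they would need the accessor functions `seqnames()`, `start()`, `end()` and `strand()` instead. The sketch below is not tested against the real Ensembl file. It uses a hypothetical toy object `gtf_toy` with made-up IDs and coordinates so that it runs without any download.

```r
library(GenomicRanges)  # provides GRanges(), seqnames(), start(), end(), strand()

# Hypothetical toy object standing in for rtracklayer::import(); all values are made up.
gtf_toy <- GRanges(
  seqnames = c("1", "1", "2"),
  ranges = IRanges(start = c(100, 100, 500), end = c(900, 900, 800)),
  strand = c("+", "+", "-"),
  type = c("gene", "transcript", "gene"),
  gene_name = c("GENE_A", "GENE_A", "GENE_B"),
  gene_id = c("ENSG_A", "ENSG_A", "ENSG_B"),
  transcript_id = c(NA, "ENST_A1", NA)
)

# `$` looks only in the metadata columns, and there is no `seqnames` column there,
# so these two lengths differ.
length(gtf_toy$seqnames)
length(gtf_toy$gene_name)

# The accessor functions return one value per feature, so every column lines up.
input <- data.frame(
  chr = as.character(seqnames(gtf_toy)),
  start = start(gtf_toy),
  end = end(gtf_toy),
  strand = as.character(strand(gtf_toy)),
  gene_name = gtf_toy$gene_name,
  gene_id = gtf_toy$gene_id,
  transcript_id = gtf_toy$transcript_id
)

# as.data.frame() builds a comparable table in one call.
input_all <- as.data.frame(gtf_toy)
```

With the real file, the same accessors would be called on `gtf` instead of `gtf_toy`. Note that `na.omit()` drops every row with a missing `transcript_id`, which includes the rows where `type == "gene"`, so filtering to genes and then calling `na.omit()` on a table that contains `transcript_id` would likely leave nothing.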