Closed BitterWood closed 3 weeks ago
Could you please confirm if your BED file is a 6-column file delimited by tabs (\t)?
Thanks for your quick reply. I checked this with:
head -n 6 mm10.erv.bed | awk -F'\t' '{if(NF==6) print "Line", NR, "has 6 columns"; else print "Line", NR, "does not have 6 columns"}'
And I received the following:
Line 1 has 6 columns
Line 2 has 6 columns
Line 3 has 6 columns
Line 4 has 6 columns
Line 5 has 6 columns
Line 6 has 6 columns
Based on this output, I believe my BED file is a 6-column file delimited by tabs (\t), at least for the first six lines.
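The awk check above only looks at the first six lines. A minimal Python sketch of the same check over a whole file is shown here; the function name `find_bad_bed_lines` is hypothetical and not part of scTE, and skipping blank lines and `#`/`track`/`browser` header lines is an assumption about what should be ignored.

```python
from typing import Iterable, List, Tuple


def find_bad_bed_lines(lines: Iterable[str], expected: int = 6) -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for each line that does not have
    exactly `expected` tab-separated fields."""
    bad = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Assumption: blank lines and header lines are not data rows.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        n_fields = len(line.split("\t"))
        if n_fields != expected:
            bad.append((line_no, n_fields))
    return bad


if __name__ == "__main__":
    # Illustrative rows only, not taken from mm10.erv.bed.
    demo = [
        "chr1\t100\t200\tERV1\t0\t+\n",
        "chr1 300 400 ERV2 0 -\n",      # space-delimited, so one field
        "chr2\t500\t600\tERV3\t0\n",    # only five fields
    ]
    for line_no, n_fields in find_bad_bed_lines(demo):
        print(f"Line {line_no} has {n_fields} tab-separated fields")
```

To run it on a real file, pass an open file handle, for example `find_bad_bed_lines(open("mm10.erv.bed"))`; an empty list means every data row has six tab-separated fields.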
Hello @jphe, I re-checked my BED and GTF files. When I run scTE_build with my BED file and the Gene.gtf provided by scTE, the command succeeds, but when I run it with my GTF file and the TE.bed provided by scTE, I get the error. So the problem appears to be the GTF file. I then ran scTE_build with my BED file and GTF files downloaded separately from
ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M21/gencode.vM21.annotation.gtf.gz
or
https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/mm10.ncbiRefSeq.gtf.gz
As a result, the GENCODE GTF file works, while the UCSC one fails. I suppose there are some differences in how information is extracted from these two GTF files, but I have not been able to figure them out. Could you please offer some help?
The first six lines of these two GTF files are as follows:
GENCODE GTF:
description: evidence-based annotation of the mouse genome (GRCm38), version M21 (Ensembl 96)
provider: GENCODE
contact: gencode-help@ebi.ac.uk
format: gtf
date: 2019-03-27
chr1 HAVANA gene 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; gene_type "TEC"; gene_name "4933401J01Rik"; level 2; havana_gene "OTTMUSG00000049935.1";
chr1 HAVANA transcript 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_name "4933401J01Rik"; transcript_type "TEC"; transcript_name "4933401J01Rik-201"; level 2; transcript_support_level "NA"; tag "basic"; havana_gene "OTTMUSG00000049935.1"; havana_transcript "OTTMUST00000127109.1";
chr1 HAVANA exon 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_name "4933401J01Rik"; transcript_type "TEC"; transcript_name "4933401J01Rik-201"; exon_number 1; exon_id "ENSMUSE00001343744.1"; level 2; transcript_support_level "NA"; tag "basic"; havana_gene "OTTMUSG00000049935.1"; havana_transcript "OTTMUST00000127109.1";
chr1 ENSEMBL gene 3102016 3102125 . + . gene_id "ENSMUSG00000064842.1"; gene_type "snRNA"; gene_name "Gm26206"; level 3;
chr1 ENSEMBL transcript 3102016 3102125 . + . gene_id "ENSMUSG00000064842.1"; transcript_id "ENSMUST00000082908.1"; gene_type "snRNA"; gene_name "Gm26206"; transcript_type "snRNA"; transcript_name "Gm26206-201"; level 3; transcript_support_level "NA"; tag "basic";
UCSC GTF:
chrM ncbiRefSeq.2021-04-23 transcript 15356 15422 . - . gene_id "TrnP"; transcript_id "rna-TrnP"; gene_name "TrnP";
chrM ncbiRefSeq.2021-04-23 exon 15356 15422 . - . gene_id "TrnP"; transcript_id "rna-TrnP"; exon_number "1"; exon_id "rna-TrnP.1"; gene_name "TrnP";
chrM ncbiRefSeq.2021-04-23 transcript 15289 15355 . + . gene_id "TrnT"; transcript_id "rna-TrnT"; gene_name "TrnT";
chrM ncbiRefSeq.2021-04-23 exon 15289 15355 . + . gene_id "TrnT"; transcript_id "rna-TrnT"; exon_number "1"; exon_id "rna-TrnT.1"; gene_name "TrnT";
chrM ncbiRefSeq.2021-04-23 transcript 14145 15288 . + . gene_id "CYTB"; transcript_id "NP_904340.1"; gene_name "CYTB";
chrM ncbiRefSeq.2021-04-23 exon 14145 15288 . + . gene_id "CYTB"; transcript_id "NP_904340.1"; exon_number "1"; exon_id "NP_904340.1.1"; gene_name "CYTB";
chrM ncbiRefSeq.2021-04-23 CDS 14145 15288 . + 0 gene_id "CYTB"; transcript_id "NP_904340.1"; exon_number "1"; exon_id "NP_904340.1.1"; gene_name "CYTB";
chrM ncbiRefSeq.2021-04-23 start_codon 14145 14147 . + 0 gene_id "CYTB"; transcript_id "NP_904340.1"; exon_number "1"; exon_id "NP_904340.1.1"; gene_name "CYTB";
chrM ncbiRefSeq.2021-04-23 transcript 14071 14139 . - . gene_id "TrnE"; transcript_id "rna-TrnE"; gene_name "TrnE";
chrM ncbiRefSeq.2021-04-23 exon 14071 14139 . - . gene_id "TrnE"; transcript_id "rna-TrnE"; exon_number "1"; exon_id "rna-TrnE.1"; gene_name "TrnE";
I would really appreciate your help. Many thanks again.
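One difference is visible in the two excerpts themselves: the GENCODE rows include `gene` features and a `gene_type` attribute, while the UCSC rows shown contain only `transcript`, `exon`, `CDS` and `start_codon` features and no `gene_type`. Whether scTE_build depends on either of these is an assumption that this thread does not confirm. The sketch below tallies feature types and attribute keys so two GTF files can be compared; `summarize_gtf` is a hypothetical helper, and the attribute parsing assumes no semicolons inside quoted values.

```python
from collections import Counter


def summarize_gtf(lines):
    """Count feature types (column 3) and attribute keys (column 9)
    over tab-delimited GTF lines. Returns (features, attr_keys)."""
    features = Counter()
    attr_keys = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        features[fields[2]] += 1
        # Attributes look like: key "value"; key value; ...
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                attr_keys[item.split(None, 1)[0]] += 1
    return features, attr_keys


if __name__ == "__main__":
    # One record from each excerpt in this thread, re-typed with tabs.
    gencode = ['chr1\tHAVANA\tgene\t3073253\t3074322\t.\t+\t.\t'
               'gene_id "ENSMUSG00000102693.1"; gene_type "TEC"; '
               'gene_name "4933401J01Rik"; level 2; '
               'havana_gene "OTTMUSG00000049935.1";']
    ucsc = ['chrM\tncbiRefSeq.2021-04-23\ttranscript\t15356\t15422\t.\t-\t.\t'
            'gene_id "TrnP"; transcript_id "rna-TrnP"; gene_name "TrnP";']

    g_feat, g_keys = summarize_gtf(gencode)
    u_feat, u_keys = summarize_gtf(ucsc)
    print("features only in GENCODE:", sorted(set(g_feat) - set(u_feat)))
    print("attribute keys only in GENCODE:", sorted(set(g_keys) - set(u_keys)))
```

Running the same function over the full files (for example with `gzip.open(path, "rt")`) would show whether the differences seen in the first few lines hold throughout.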
I'm not familiar with the UCSC GTF; the simplest way is to convert the UCSC GTF to GENCODE style.
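What "GENCODE style" requires for scTE_build is not spelled out in this thread. As one possible step, and purely as a sketch under that uncertainty, the code below synthesizes a `gene` row for each gene in a UCSC-style GTF by spanning its `transcript` rows. The helper names `attr_value` and `add_gene_rows` are hypothetical. It does not add a `gene_type` attribute, because biotypes are not present in the UCSC file and would have to come from another source. It also groups by chromosome, strand and `gene_id`, so distant copies of a gene on the same strand would be merged into one span.

```python
def attr_value(attributes, name):
    """Return the value of attribute `name` from a GTF attribute column, or None."""
    for item in attributes.split(";"):
        parts = item.strip().split(None, 1)
        if len(parts) == 2 and parts[0] == name:
            return parts[1].strip('"')
    return None


def add_gene_rows(lines):
    """Return the GTF lines with a synthesized `gene` row inserted before
    the first record of each gene that has at least one transcript row."""
    rows = [line.rstrip("\r\n") for line in lines]

    # Pass 1: gene span = min start / max end over its transcript rows.
    spans = {}
    for row in rows:
        f = row.split("\t")
        if row.startswith("#") or len(f) < 9 or f[2] != "transcript":
            continue
        gene_id = attr_value(f[8], "gene_id")
        if gene_id is None:
            continue
        key = (f[0], f[6], gene_id)
        start, end = int(f[3]), int(f[4])
        if key in spans:
            spans[key][1] = min(spans[key][1], start)
            spans[key][2] = max(spans[key][2], end)
        else:
            spans[key] = [f[1], start, end, attr_value(f[8], "gene_name")]

    # Pass 2: copy rows through, emitting each gene row once.
    out, emitted = [], set()
    for row in rows:
        f = row.split("\t")
        if not row.startswith("#") and len(f) >= 9:
            key = (f[0], f[6], attr_value(f[8], "gene_id"))
            if key in spans and key not in emitted:
                source, start, end, name = spans[key]
                attrs = f'gene_id "{key[2]}";'
                if name is not None:
                    attrs += f' gene_name "{name}";'
                out.append("\t".join(
                    [f[0], source, "gene", str(start), str(end), ".", f[6], ".", attrs]))
                emitted.add(key)
        out.append(row)
    return out


if __name__ == "__main__":
    # The CYTB transcript record from the UCSC excerpt, re-typed with tabs.
    demo = ['chrM\tncbiRefSeq.2021-04-23\ttranscript\t14145\t15288\t.\t+\t.\t'
            'gene_id "CYTB"; transcript_id "NP_904340.1"; gene_name "CYTB";']
    print("\n".join(add_gene_rows(demo)))
```

The output should be validated against a file that is known to work, such as the GENCODE GTF, before being passed to scTE_build.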
To be honest, all my tasks so far have been based on the UCSC GTF, and my TE BED is based on the UCSC annotation as well. I've finished this task with the GENCODE GTF, and I'm going to check whether that is OK for further analysis.
Thank you for your time. Good luck in your work!
Thanks for the nice tool; I have successfully finished several tasks with scTE. This time, however, I wanted to build a custom reference with the command
scTE_build -te ../4_anno/mm10.erv.bed -gene ../4_anno/mm10.chr.gtf -o custome
I got the following error:
I find this issue similar to #79, but that one remains open. Since other people have met the same problem, I am opening this issue. Sorry for repeating the same question.
The first six lines of my gtf and bed files are as follows: gtf:
bed:
Many thanks for your help!