Open fgvieira opened 11 years ago
:+1: In particular, the GFF version 3 header ##gff-version 3 must be maintained.
Is the -header option not working for you? The example below is from the bedtools2 repository (please file issues there):
curl -s ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz | gzcat | head -20 > test.gtf
bedtools --version
bedtools v2.19.1
bedtools intersect -header -a test.gtf -b test.gtf | head
##description: evidence-based annotation of the human genome (GRCh37), version 19 (Ensembl 74)
##provider: GENCODE
##contact: gencode@sanger.ac.uk
##format: gtf
##date: 2013-12-05
chr1 HAVANA gene 11869 14412 . + . gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
chr1 HAVANA gene 11869 12227 . + . gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
chr1 HAVANA gene 12613 12721 . + . gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
chr1 HAVANA gene 13221 14409 . + . gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
Ah, I see now that you are referring to the fact that some of the tools don't support this functionality. In bedtools2, we are slowly working through standardizing the API for all of the tools. Once done, the result will be that all of the tools (when relevant) will support the -header option.
bedtools sort -header works perfectly! Thanks, Aaron. I was reading this documentation, which doesn't show the -header option.
I expected -header to be the default behaviour. Perhaps instead a -noheader option?
I see your point, Shaun. The problem, however, is that such a change could break many existing pipelines built on the assumption that headers are not emitted by default. I think once we standardize the API, this would be worth revisiting with users on the mailing list to get feedback about the impact.
Right now header lines have to start with '#' (comment) and are generally discarded from the output (e.g. subtractBed).
It would be nice if headers could be properly handled and printed to the output. Maybe add an option that would not parse the first line and just print it accordingly.
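Until every tool supports -header, one possible workaround is to split off the '#' lines yourself and put them back in front of the processed records. The sketch below is only an illustration under stated assumptions: demo.gff and demo.sorted.gff are made-up file names, and plain sort stands in for whichever tool drops the header lines.

```shell
# Build a tiny GFF3 file (hypothetical demo data) with a version header.
printf '##gff-version 3\nchr1\tsrc\tgene\t200\t300\t.\t+\t.\tID=b\nchr1\tsrc\tgene\t100\t150\t.\t+\t.\tID=a\n' > demo.gff

# Tab character, so sort splits on GFF columns only.
TAB="$(printf '\t')"

{
  # 1. Pass the comment/header lines through untouched.
  grep '^#' demo.gff
  # 2. Send only the records to the tool; `sort` is a stand-in here
  #    for any command that would otherwise discard the header.
  grep -v '^#' demo.gff | sort -t "$TAB" -k1,1 -k4,4n
} > demo.sorted.gff

cat demo.sorted.gff
```

This keeps ##gff-version 3 as the first line of the output, at the cost of reading the input twice, so it does not work directly on a stream from stdin.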