daler / gffutils

GFF and GTF file manipulation and interconversion
http://daler.github.io/gffutils
MIT License

BedTool object from pybedtools_integration.to_bedtool: pybedtools.merge automatically subtracts one from the start position #195

Closed ThuyTien1 closed 2 years ago

ThuyTien1 commented 2 years ago

Hello all, I generated a list of regions from a GFF3 file as follows:

1       havana  exon    939275  939291  .       +       .       Parent=transcript:ENST00000420190;Name=ENSE00001608769;constitutive=0;ensembl_end_phase=0;ensembl_phase=1;exon_id=ENSE00001608769;rank=7;version=1
1       havana  exon    935772  935793  .       +       .       Parent=transcript:ENST00000437963;Name=ENSE00001631320;constitutive=0;ensembl_end_phase=0;ensembl_phase=2;exon_id=ENSE00001631320;rank=5;version=1
1       ensembl three_prime_UTR 943253  943377  .       +       .       Parent=transcript:ENST00000616016
1       ensembl three_prime_UTR 943698  943808  .       +       .       Parent=transcript:ENST00000616016
1       ensembl three_prime_UTR 943908  944581  .       +       .       Parent=transcript:ENST00000616016
1       ensembl three_prime_UTR 943698  943808  .       +       .       Parent=transcript:ENST00000618323
1       ensembl three_prime_UTR 943908  944581  .       +       .       Parent=transcript:ENST00000618323
1       ensembl three_prime_UTR 942856  943058  .       +       .       Parent=transcript:ENST00000618323
1       ensembl three_prime_UTR 943253  943377  .       +       .       Parent=transcript:ENST00000620200
1       ensembl three_prime_UTR 943698  943808  .       +       .       Parent=transcript:ENST00000620200
1       ensembl three_prime_UTR 943908  944581  .       +       .       Parent=transcript:ENST00000620200

Then I applied gffutils.pybedtools_integration.to_bedtool to this list to generate a BedTool object. After that, I ran merge(d=500, s=True) on the BedTool object and got the following result:

1       935771  935793  +
1       939274  939291  +
1       942855  944581  +

May I ask why we get 935771 instead of 935772?

ThuyTien1 commented 2 years ago

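Oops, sorry for bothering you guys. The reason is that BED coordinates are 0-based and half-open, while GFF coordinates are 1-based and inclusive. Converting a GFF feature to BED therefore subtracts one from the start and leaves the end unchanged, which is why 935772 becomes 935771 while 935793 stays the same.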
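For anyone landing here later, the arithmetic can be checked without gffutils or pybedtools. The sketch below is plain Python, not the libraries' actual code: `gff_to_bed` and `merge_bed` are hypothetical helper names, and `merge_bed` is only a simplified stand-in for a same-strand merge with a maximum gap of 500, assuming the gap is measured as next start minus previous end in BED coordinates.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end, strand)
Interval = Tuple[str, int, int, str]


def gff_to_bed(chrom: str, start: int, end: int, strand: str) -> Interval:
    """Convert 1-based inclusive GFF coordinates to 0-based half-open BED."""
    # Only the start shifts; the end value is numerically unchanged.
    return (chrom, start - 1, end, strand)


def merge_bed(intervals: Iterable[Interval], max_gap: int = 0) -> List[Interval]:
    """Simplified same-strand merge of BED intervals (hypothetical helper).

    Intervals on the same chromosome and strand are merged when the gap
    between them is at most max_gap bases.
    """
    # Sort so that features on the same chromosome and strand are adjacent.
    ordered = sorted(intervals, key=lambda iv: (iv[0], iv[3], iv[1], iv[2]))
    merged: List[Interval] = []
    for chrom, start, end, strand in ordered:
        if merged:
            last_chrom, last_start, last_end, last_strand = merged[-1]
            same_group = last_chrom == chrom and last_strand == strand
            if same_group and start - last_end <= max_gap:
                merged[-1] = (chrom, last_start, max(last_end, end), strand)
                continue
        merged.append((chrom, start, end, strand))
    return merged


# Distinct coordinates from the GFF3 lines in the question (1-based, inclusive).
gff_features = [
    ("1", 939275, 939291, "+"),
    ("1", 935772, 935793, "+"),
    ("1", 943253, 943377, "+"),
    ("1", 943698, 943808, "+"),
    ("1", 943908, 944581, "+"),
    ("1", 942856, 943058, "+"),
]

bed_features = [gff_to_bed(*feature) for feature in gff_features]
merged = merge_bed(bed_features, max_gap=500)

for chrom, start, end, strand in merged:
    print(chrom, start, end, strand, sep="\t")
```

The printed starts are one less than the GFF starts and the ends are unchanged, which matches the merged output in the question. To report the merged regions in GFF-style coordinates again, add one back to each start.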