NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

Can agat convert a gff to bed format and retain thickStart and thickEnd of non coding RNAs #301

Closed loukesio closed 2 years ago

loukesio commented 2 years ago

I am currently using AGAT to convert a GFF file to BED with the following command:

agat_convert_sp_gff2bed.pl -gff Arabidopsis_thaliana_TAIR10.gff3  -o test.bed

When I convert the GFF to BED, I find missing values (shown as .) in the thickStart and thickEnd columns of the BED file for the non-coding RNAs. Is there a way to convert GFF to BED and get thickStart and thickEnd values for these elements?

Thank you for your time.
The GFF3 file I am working with is available at this link: https://drive.google.com/drive/folders/1-wmbc9gKtbXFJ95E0n41WgPL-G313SNe?usp=sharing

Juke34 commented 2 years ago

thickStart and thickEnd are usually used to define the coding part (CDS), which does not exist for non-coding genes. What do you think would be a correct value in this case?

loukesio commented 2 years ago

Dear Juke,

Thank you so much for the prompt reply; I highly appreciate it. As in the following picture, I would fill thickStart and thickEnd with chromStart and chromEnd. What do you think?

[Image: Screenshot 2022-11-14 at 13 55 42]

Juke34 commented 2 years ago

Then an awk command should do the trick, e.g.:

awk 'BEGIN{OFS="\t"}{if($7 == "."){$7=$2; $8=$3} print $0 }' file.bed
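
To check the effect before running this on a full annotation, the same idea can be tried on a tiny hand-made example. This is only a sketch under stated assumptions: the file names sample.bed and sample.filled.bed and both records are made up for illustration, and it assumes a tab-separated 12-column BED file in which missing thick values are written as ".", as described above. Unlike the one-liner above, this variant also sets FS to a tab, so a field that happens to contain spaces would not be split.

```shell
# Build a two-line, tab-separated BED sample (made-up records, for illustration only).
printf 'Chr1\t100\t500\tmRNA1\t0\t+\t150\t450\t0\t1\t400\t0\n'  > sample.bed
printf 'Chr1\t600\t900\tncRNA1\t0\t-\t.\t.\t0\t1\t300\t0\n'    >> sample.bed

# Split and join on tabs only. Where thickStart ($7) is ".",
# copy chromStart ($2) and chromEnd ($3) into columns 7 and 8.
awk 'BEGIN{FS=OFS="\t"} $7=="." {$7=$2; $8=$3} {print}' sample.bed > sample.filled.bed

# Show the result; the output goes to a new file so the original is untouched.
cat sample.filled.bed
```

In this sketch the coding record keeps its own thickStart and thickEnd, and only the record with "." in column 7 is changed.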