details of readTranscriptFeatures

smcnew commented 4 years ago

Hi, I'm hoping someone can provide more details about readTranscriptFeatures that I couldn't work out from the documentation or from searching other threads.

1) Am I correct in assuming that this function estimates where promoters are likely to be based on exon coordinates?
2) Are the up.flank and down.flank arguments in kb or b? (default = 1000)
3) Does down.flank extend into the exon? i.e. does the last 1000 b (or kb) of the promoter overlap the first 1000 b (or kb) of the exon?

Most importantly, I'm not sure I understand how this function identifies TSS (and promoters). I'm using two different bed files (for different species). One is for zebra finch; I downloaded the genome annotation from NCBI. The other is for a non-model species whose genome collaborators have assembled and annotated. In both cases I converted the gtf to a bed file for this function.

The 4th column in my bed files (i.e. the feature name) does not include any "TSS", "exon" or "intron". For example here's the start of one:

Can anyone provide more details about how genomation uses the information in the bed file to estimate coordinates?

Thanks in advance -

Sabrina

katwre commented 4 years ago

Hi @smcnew,

1) readTranscriptFeatures defines promoters relative to the TSS, by default from 1 kb upstream to 1 kb downstream of the TSS.
2) up.flank and down.flank are in bases, not kilobases.
3) Yes, the promoter and the exon might overlap.

readTranscriptFeatures reads a file in bed12 format (a bed file with at least 12 columns). It uses the 11th and 12th columns, which contain the exon sizes and exon start positions, to calculate the coordinates of exons and introns (https://genome.ucsc.edu/FAQ/FAQformat.html#format1). For the TSS, it uses the 2nd column if the transcript is on the + strand and the 3rd column if it is on the - strand.
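
The arithmetic described above can be sketched for a single bed12 line as follows. This is a minimal illustration in Python, not code taken from genomation: the function name `parse_bed12_line`, the `TranscriptFeatures` container, and the example record are hypothetical, and the sketch keeps the raw 0-based, half-open bed coordinates without claiming how genomation converts them internally. It also assumes that on the - strand "upstream" means higher coordinates.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in raw bed coordinates


@dataclass
class TranscriptFeatures:
    tss: int
    promoter: Interval
    exons: List[Interval]
    introns: List[Interval]


def parse_bed12_line(line: str, up_flank: int = 1000,
                     down_flank: int = 1000) -> TranscriptFeatures:
    """Derive TSS, promoter, exons and introns from one bed12 record.

    Hypothetical helper: mirrors the description in this thread,
    not genomation's actual implementation.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated columns (bed12)")

    chrom_start = int(fields[1])   # 2nd column
    chrom_end = int(fields[2])     # 3rd column
    strand = fields[5]             # 6th column

    # 11th column: exon sizes; 12th column: exon starts relative to
    # chrom_start. Both are comma-separated, often with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]

    exons = [(chrom_start + s, chrom_start + s + size)
             for s, size in zip(starts, sizes)]
    # Introns are the gaps between consecutive exons.
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]

    if strand == "+":
        tss = chrom_start
        promoter = (tss - up_flank, tss + down_flank)
    else:
        # Assumption: on the - strand, upstream lies at higher coordinates.
        tss = chrom_end
        promoter = (tss - down_flank, tss + up_flank)

    return TranscriptFeatures(tss, promoter, exons, introns)


# Made-up example record with three exons on the + strand.
example = "chr1\t5000\t9000\ttx1\t0\t+\t5000\t9000\t0\t3\t500,300,1000,\t0,1500,3000,"
features = parse_bed12_line(example)
print(features)
```

Note that nothing here depends on the 4th column (the feature name), so names such as "TSS", "exon" or "intron" are not needed; what matters is that the file really has the 12 bed12 columns. With the default flanks, the promoter window in this example extends past the TSS into the first exon, which is the overlap mentioned in answer 3.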

Cheers, Kasia

katwre commented 4 years ago

@smcnew please reopen this issue if you have more questions or something is still not clear

BIMSBbioinfo / genomation

details of readTranscriptFeatures #194