Acribbs / tRNAnalysis

tRNA analysis workflow
MIT License

find fragment sequences from BED in QC_alignment.Rmd #15

Open slobentanzer opened 5 years ago

slobentanzer commented 5 years ago

hi adam,

i finally had time to look at the results, found another bug (will make separate issue), but got it to work in the end. i am in the QC_alignment.Rmd, and i was wondering what the BED file columns were (they are custom, right?).

what i am basically looking for is the easiest way to get the sequence of each fragment using the coordinates from each BED. so is chromStart = V2, chromEnd = V3? count = V7? which reference should i use, and where do i find it in the folder?

thanks!

sebastian

Acribbs commented 5 years ago

Hi Sebastian,

Sorry for the late response.

They are essentially generated using bedtools coverage, with the BED file of regions for each fragment passed as the -a option and each BAM file passed as the -b option.
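To make the read count concrete, here is a minimal plain-Python sketch of what a per-region overlap count means. It is not bedtools itself and does not read BAM files; it assumes the default behaviour of counting any read that shares at least one base with a region, with intervals in BED convention (0-based start, exclusive end). The function name `count_overlaps` and the example intervals are made up for illustration.

```python
def count_overlaps(regions, reads):
    """For each region, count the reads sharing at least one base with it.

    Both inputs are lists of (chrom, start, end) tuples in BED convention:
    0-based start, end exclusive. Strand is ignored in this sketch.
    """
    counts = []
    for r_chrom, r_start, r_end in regions:
        n = sum(
            1
            for chrom, start, end in reads
            if chrom == r_chrom and start < r_end and end > r_start
        )
        counts.append(n)
    return counts


# Hypothetical fragment regions (the -a side) and read alignments (the -b side).
regions = [("cluster1", 0, 30), ("cluster1", 40, 72)]
reads = [("cluster1", 5, 25), ("cluster1", 28, 45), ("cluster2", 0, 20)]

counts = count_overlaps(regions, reads)
print(counts)
```

A read that only touches the boundary of a region (its start equals the region's end) does not count, because the end coordinate is exclusive.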

In the output, V2 is the start, V3 the end, V4 the annotation, V5 the score (this can be ignored), V6 the strand, and V7 the number of reads that overlap each BED region. The output of this analysis is found here: tRNA-mapping.dir/{name_of_file}_fragment_coverage.bed.
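As a rough illustration of that column layout, the sketch below parses one tab-separated line into named fields. It assumes V1 is the chromosome or contig name (standard BED, not stated above) and ignores any columns after V7. The class name, function name, and the example line are hypothetical, not taken from the pipeline.

```python
from typing import NamedTuple


class FragmentCoverage(NamedTuple):
    chrom: str       # V1 (assumption: standard BED chromosome/contig column)
    start: int       # V2
    end: int         # V3
    annotation: str  # V4
    score: str       # V5 (can be ignored)
    strand: str      # V6
    count: int       # V7: number of reads overlapping the region


def parse_coverage_line(line):
    """Split one tab-separated line and keep the first seven columns."""
    f = line.rstrip("\n").split("\t")
    return FragmentCoverage(f[0], int(f[1]), int(f[2]), f[3], f[4], f[5], int(f[6]))


# Made-up example line; real files may carry extra trailing columns.
example = "cluster1\t0\t30\tfragment_5prime\t0\t+\t12\t30\t30\t1.0"
record = parse_coverage_line(example)
print(record.annotation, record.strand, record.count)
```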

BW, Adam

slobentanzer commented 5 years ago

Hi Adam, I got around to checking the sequences. I used hg38_cluster.fa to look up the sequence for each fragment in the BED file. Is that correct? Like so: as.character(subseq({fasta-file}[{bed}$Chr[i]], {bed}$Start[i], {bed}$End[i]))

Is there a way to check if the sequences generated this way for each fragment are actually correct?

And: what does a negative strand mean in this context? Is it the reverse complement? (EDIT: The negative strand fragments do not always have the same count as the positives, it just happens often.)
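One way to sanity-check a lookup like this is sketched below in plain Python. It rests on assumptions that are not confirmed in this thread: that the file follows the standard BED convention (0-based start, exclusive end) and that a "-" strand means the fragment reads as the reverse complement of the reference. The helper names and the tiny FASTA record are made up. Under the same assumption, a 1-based inclusive accessor such as Biostrings `subseq` would need Start + 1 as its start argument.

```python
def parse_fasta(text):
    """Parse FASTA text into {record_name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def fragment_sequence(records, chrom, start, end, strand="+"):
    """Return the sequence for one BED interval.

    A 0-based, end-exclusive BED interval maps directly onto a Python slice.
    Assumption: "-" means reverse complement of the reference strand.
    """
    seq = records[chrom][start:end]
    return reverse_complement(seq) if strand == "-" else seq


# Made-up reference record standing in for an entry of the cluster FASTA.
fasta_text = ">cluster1\nGGGGATTAGC\nTCAAATGGTA\n"
records = parse_fasta(fasta_text)

plus = fragment_sequence(records, "cluster1", 2, 8, "+")
minus = fragment_sequence(records, "cluster1", 2, 8, "-")
print(plus, minus)
```

Two cheap checks follow from this: the extracted length should equal End minus Start for every row, and a fragment's sequence should appear at the start or end of reads that align to that region.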

Kind regards, Sebastian

Acribbs commented 5 years ago

Hi Sebastian,

I will look into this at the same time as your other issue.

Thanks as always, Best wishes, Adam