Closed quarkaffi closed 3 years ago
Hi, you are right, usually issues about bamtolist are due to annotation mismatches. To be sure everything is ok, try to run on one of your BAM files:
sample_dt <- as.data.table(GenomicAlignments::readGAlignments("bam_file.bam"))
and then
length(unique(as.character(sample_dt$transcript)))
length(annotation_dt$transcript)
length(intersect(unique(as.character(sample_dt$transcript)), annotation_dt$transcript))
where annotation_dt is the output of _createannotation. You should be able to check if and how many transcript names in annotation match the transcript names in the BAM file.
Tip: when running a command like
reads_list<- bamtolist(transcript_align = "FALSE", bamfolder = "bamfiles", annotation = "annotation.txt")
logical strings such as FALSE don't need to be put within quotation marks. And be sure bamfiles is the path to the folder where BAM files are stored.
Let me know what is going on.
Best Fabio
Hi Fabio,
thanks for your quick reply.
So i loaded the Bam into sample_dt, looking like: seqnames strand cigar qwidth start end width njunc 1: Y48G9A.3.1 + 28M 28 2351 2378 28 0 2: W06A7.3g.3 + 25M 25 1485 1509 25 0 3: W06A7.3g.1 + 25M 25 1483 1507 25 0 4: W06A7.3a.1 + 25M 25 791 815 25 0 5: W06A7.3f.1 + 25M 25 10011 10035 25 0 6: W06A7.3c.1 + 25M 25 9999 10023 25 0
length(unique(as.character(sample_dt$transcript))) prompts [1] 0
as the transcripts are called "seqnames" instead of "transcripts" (why is that?)
changing it to seqnames:
length(unique(as.character(sample_dt$seqnames))) [1] 33173
The annotation file with length(annotation_dt$transcript)
length(annotation$transcript) [1] 61451
equally:
length(intersect(unique(as.character(sample_dt$transcript)), annotation$transcript)) [1] 0
length(intersect(unique(as.character(sample_dt$seqnames)), annotation$transcript)) [1] 33173
Best, Martin
as the transcripts are called "seqnames" instead of "transcripts" (why is that?)
Sorry for the oversight. The names of the columns are assigned by the function GenomicAlignments::readGAlignments and they decided to use "standard" names, independent from the experimental design and analyses of interest.
BTW since all seqnames are found in the annotation table, it's not an issue of matching names. Additional question to try to understand what's the problem:
reads_list <- bamtolist(transcript_align = "FALSE", bamfolder = "bamfiles", annotation = "annotation.txt")
the one you actually used or are there any options not reported here?Hi Fabio,
bamfiles if the folder where my (4) bamfiles are located, which a folder in the working directory.
annotation.txt is the output saved from create_annotation, that i load in with as.data.table before feeding it into the script looking like
head(annotation.txt) transcript l_tr l_utr5 l_cds l_utr3 1: 2L52.1a.1 1357 19 1284 54 2: 2L52.1b.1 789 126 663 0 3: 2L52.2 151 0 0 0 4: 2RSSE.1a.1 1301 269 1032 0 5: 2RSSE.1b.1 1706 269 1437 0 6: 2RSSE.1c.1 1046 269 777 0
Best, Martin
It worked, you were right. Reading the file in from a txt caused the issues.
Thank you very much.
Glad to read everything is fine.
I'm going to close this issue, but feel free to contact me again at any moment.
Best Fabio
Hi,
I experienced an error during the bamtolist function and I don't understand what is causing it. I saw in other issues before that it can be caused by different annotations in the bam alignment and in the annotation file created with the create_annotation. I am using the ref genome and gtf from Ensemble for C. elegans (WBcel235).
This is how my output of create_annotation looks like transcript l_tr l_utr5 l_cds l_utr3 1: 2L52.1a.1 1357 19 1284 54 2: 2L52.1b.1 789 126 663 0 3: 2L52.2 151 0 0 0 4: 2RSSE.1a.1 1301 269 1032 0 5: 2RSSE.1b.1 1706 269 1437 0 6: 2RSSE.1c.1 1046 269 777 0
and this is my STAR alignment output with --quantMode TranscriptomeSAM
NB500982:360:HW5NVBGXH:1:11110:2371:2070_CTTTGGGA 256 F56F3.5.1 662 3 42M 0 0 TAAGGTCAAGATCATCAAGAGACAAAAGGTTGATCTTGGACG AAAEE/EEE6EEEEEAEEEEEE//EEEE/AE/EE/E/E/EE/ NH:i:2 HI:i:2 NB500982:360:HW5NVBGXH:1:11110:6828:2079_AAACATGA 0 C23H4.6b.1 3095 3 20M 0 0 ACGTTTTCATGGATATGATG AAAEEEEEEEEEEEEEEEEENH:i:2 HI:i:1 NB500982:360:HW5NVBGXH:1:11110:6828:2079_AAACATGA 256 C23H4.6a.1 3245 3 20M 0 0 ACGTTTTCATGGATATGATG AAAEEEEEEEEEEEEEEEEENH:i:2 HI:i:2 NB500982:360:HW5NVBGXH:1:11110:18755:2079_CTTCGCTA 0 F10B5.1.1 619 3 28M 0 0 CTCACCAAGTGGATTGAGAACCCAATCT AAAEEEEEEEEEEEEEEEEEEEEEEEEE NH:i:2 HI:i:1
The command i use reads_list <- bamtolist(transcript_align = "FALSE", bamfolder = "bamfiles", annotation = "annotation.txt")
The error I get Good! Number of indel below indel_threshold for all reads. No reads removed. Error: $ operator is invalid for atomic vectors
Thanks in advance