Closed: jfreimer closed this issue 8 years ago.
Ah yes - I created an issue for this previously - #122 but was unable to replicate it when I came to fixing the problem. Do you have a log file that shows this effect that I can use for testing?
Bowtie logs are generally pretty terrible though - they're incredibly minimal, really difficult to identify reliably, and embedded everywhere. The module already checks for the word 'bisulfite' and skips any log where it's found, because it was doing the same thing with logs from Bismark. I'll try to find something specific to TopHat logs and skip the log if that string is found.
ps. Glad you like the software! :)
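A minimal sketch of the content-based skip described above, assuming a plain substring test. The helper name and the `SKIP_MARKERS` tuple are hypothetical, not MultiQC's actual code:

```python
# Hypothetical marker list; the thread only confirms that 'bisulfite' is checked.
SKIP_MARKERS = ("bisulfite",)


def should_skip_by_content(log_text, markers=SKIP_MARKERS):
    """Return True if the log text contains any of the marker strings."""
    return any(marker in log_text for marker in markers)
```

A TopHat-specific string, if one turned up, would simply be appended to the markers tuple.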
Which log files would you like me to send you? If it's possible, an easy fix might be to ignore the bowtie logs if they are in the same folder as tophat.log. Usually the tophat folder contains:
accepted_hits.bam accepted_hits_refChr.bam align_summary.txt insertions.bed logs unmapped.bam
accepted_hits.bam.bai accepted_hits_refChr.bam.bai deletions.bed junctions.bed prep_reads.info
So the final TopHat log (align_summary.txt) is in a separate folder from the logs folder, which contains all of the intermediate logs.
My logs folder within tophat contains:
bam_merge_um.log bowtie.left_kept_reads.m2g_um.log juncs_db.log prep_reads.log reports.samtools_sort.log1 segment_juncs.log
bowtie_build.log bowtie.left_kept_reads.m2g_um_seg1.log long_spanning_reads.segs.log reports.log reports.samtools_sort.log2 tophat.log
bowtie_inspect_recons.log bowtie.left_kept_reads.m2g_um_seg2.log m2g_left_kept_reads.err reports.merge_bam.log reports.samtools_sort.log3
bowtie.left_kept_reads.log gtf_juncs.log m2g_left_kept_reads.out reports.samtools_sort.log0 run.log
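The folder-based idea suggested above could be sketched as follows. This assumes, as the listing shows, that tophat.log sits in the same logs folder as the intermediate bowtie logs; the helper name is hypothetical:

```python
from pathlib import Path


def has_tophat_log_sibling(log_path):
    """Return True if a file called tophat.log exists next to log_path.

    Hypothetical helper: assumes TopHat writes tophat.log into the same
    logs/ folder as its intermediate bowtie logs.
    """
    return (Path(log_path).parent / "tophat.log").is_file()
```

This touches the filesystem for every candidate log, which is the extra cost of looking at neighbouring files rather than at the path alone.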
Ah, that would explain it - I must have cleaned up my testing data to only contain the tophat log, hence the problem went away. I assumed that both sets of messages were wrapped up in the same log file.
It would be useful to find which file(s) contain the string "# reads processed:", if that's OK. Then maybe I could see the full contents of those files to check whether they have anything else we can use. Otherwise, as you say, we can look at the context of the file rather than its contents. I already have the file path in hand, so I'll probably just check whether it ends in logs/bowtie_xxx.log and ignore it if so (easier and faster than looking around at the other files in the same folder, though that is possible if required).
Phil
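The path-only check described above might look roughly like this. The exact rule (a file starting with "bowtie" and ending in ".log" inside a directory named "logs") is an assumption standing in for the logs/bowtie_xxx.log pattern, and the helper name is hypothetical:

```python
from pathlib import PurePath


def is_tophat_intermediate_bowtie_log(log_path):
    """Hypothetical path-only test: a bowtie*.log file directly inside a 'logs' directory."""
    path = PurePath(log_path)
    return (
        path.parent.name == "logs"
        and path.name.startswith("bowtie")
        and path.suffix == ".log"
    )
```

Because it only looks at the path, this also matches bowtie_build.log and bowtie_inspect_recons.log from the listing. That should be harmless, since the grep output later in this thread shows only the bowtie.left_kept_reads* logs contain a "reads processed" summary.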
[jfreimer@h2 logs]$ grep 'processed:' *
bowtie.left_kept_reads.log:# reads processed: 40245956
bowtie.left_kept_reads.m2g_um.log:# reads processed: 8785725
bowtie.left_kept_reads.m2g_um_seg1.log:# reads processed: 2176857
bowtie.left_kept_reads.m2g_um_seg2.log:# reads processed: 915985
The logs all look like this:
# reads processed: 40245956
# reads with at least one reported alignment: 31460231 (78.17%)
# reads that failed to align: 8386873 (20.84%)
# reads with alignments suppressed due to -m: 398852 (0.99%)
Reported 128873269 alignments to 1 output stream(s)
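To show what a parser sees in these files, here is a minimal sketch that pulls the counts out of a summary like the one above. The dictionary keys and helper name are hypothetical, and the patterns deliberately ignore the leading "# " so they match with or without it:

```python
import re

# Hypothetical field names mapped to regexes for the bowtie 1 summary lines.
SUMMARY_PATTERNS = {
    "reads_processed": r"reads processed: (\d+)",
    "reads_aligned": r"reads with at least one reported alignment: (\d+)",
    "reads_failed": r"reads that failed to align: (\d+)",
    "reads_suppressed": r"reads with alignments suppressed due to -m: (\d+)",
    "alignments_reported": r"Reported (\d+) alignments",
}


def parse_bowtie_summary(text):
    """Extract integer counts from a bowtie summary; fields that are absent are skipped."""
    counts = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            counts[key] = int(match.group(1))
    return counts
```

For the log above, aligned + failed + suppressed (31460231 + 8386873 + 398852) adds up exactly to the 40245956 reads processed. That is also why each of the four intermediate logs looks like a complete, valid bowtie run on its own.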
Fantastic, four different files with four different numbers - that's sure to confuse some people :)
These filenames seem pretty specific - I think I'll just check for them and skip them. The only other thing: is this single-end data? Will paired-end data also have a bowtie.right_kept_reads.log or similar? Apologies for not looking into this more myself - the pipeline I usually use removes all of these files, so I don't have any lying around.
Phil
Ok, just pushed an update - let me know if that fixes it for you.
All of mine is single-end data, but I believe paired-end data will have the right_kept_reads logs as well.
Ok, added. Unlikely to do any harm anyway.
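The filename-specific check discussed here could be sketched as a single regular expression built from the four names in the grep output. The "right" variants are an assumption based on the expectation above for paired-end runs, and the pattern and helper names are hypothetical:

```python
import re
from pathlib import PurePath

# Hypothetical pattern: matches bowtie.left_kept_reads.log,
# bowtie.left_kept_reads.m2g_um.log, bowtie.left_kept_reads.m2g_um_seg1.log, etc.,
# plus the assumed "right" equivalents for paired-end data.
TOPHAT_BOWTIE_LOG = re.compile(
    r"^bowtie\.(left|right)_kept_reads(\.m2g_um(_seg\d+)?)?\.log$"
)


def is_tophat_bowtie_log(log_path):
    """Return True if the file name matches one of TopHat's intermediate bowtie logs."""
    return bool(TOPHAT_BOWTIE_LOG.match(PurePath(log_path).name))
```

Matching on exact names is stricter than the logs/bowtie* path rule, so an ordinary standalone bowtie log is very unlikely to be skipped by mistake.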
Works. Thanks.
Hi, great software! I have one feature suggestion. Right now MultiQC includes all of the bowtie logs within a TopHat run folder, whereas I think most people only care about the results in the final TopHat log. However, I don't want to exclude the bowtie module entirely, as I use it in other parts of the project. Would it be possible to add an option for MultiQC to ignore these logs?