Open jgoldst7 opened 3 years ago
Hi Jeremy,
I have run a test case with 2.7.7a and your parameters, and could not reproduce this problem. A couple of things to try:
Cheers Alex
Hi Alex,
I reran with the -outTmpKeep All option on an x1e.xlarge EC2 instance with 300 Gb of disk and it worked!
It used 130Gb of the disk space and 33% of the memory so I probably did not need to go as big, but the problem before must have been not enough disk space and the job was still completing "successfully" even though it was not actually successful.
The output bam files from the failed run also had "EOF marker is absent" errors, another sign that it must have run out of space before it actually finished. Below is the log file of the successful run to see side by side. Interestingly, the successful run (with the exact same input files) has more multi-mapped reads and chimeric reads, and less uniquely mapped reads.
Started job on | Mar 31 01:34:47
Started mapping on | Mar 31 04:30:22
Finished on | Mar 31 11:14:33
Mapping speed, Million of reads per hour | 23.99
Number of input reads | 161591557
Average input read length | 279
UNIQUE READS:
Uniquely mapped reads number | 146975608
Uniquely mapped reads % | 90.96%
Average mapped length | 279.04
Number of splices: Total | 164914867
Number of splices: Annotated (sjdb) | 164643795
Number of splices: GT/AG | 163069848
Number of splices: GC/AG | 1376287
Number of splices: AT/AC | 145863
Number of splices: Non-canonical | 322869
Mismatch rate per base, % | 0.24%
Deletion rate per base | 0.01%
Deletion average length | 1.92
Insertion rate per base | 0.01%
Insertion average length | 1.64
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 11543699
% of reads mapped to multiple loci | 7.14%
Number of reads mapped to too many loci | 101016
% of reads mapped to too many loci | 0.06%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 2810593
% of reads unmapped: too short | 1.74%
Number of reads unmapped: other | 160641
% of reads unmapped: other | 0.10%
CHIMERIC READS:
Number of chimeric reads | 1242113
% of chimeric reads | 0.77%
Thanks for your help. Best, Jeremy
Hi Jeremy,
the problem is STAR is not checking success of writing for some of the temporary files, so if disk is full, it will silently continue and produce wrong results. I will have to fix this.
Thanks! Alex
Hi Alex,
Thanks for the information, would definitely be a good update!
Best, Jeremy
Hello, I am running into an issue trying to produce the Chimeric.out.Junction file for large files. I have a pipeline using STAR 2.7.7a with the command
This has worked in the past, with the Chimeric.out.Junction file having valid chimeric reads that were used with STAR Fusion 1.9.0 to find fusions. However, when I try to run the exact commands with very large fastq files (~10Gb for each of gzipped 2 paired end files instead of ~2 GB like I usually use, I am having issues. STAR completes fine. This is the log file
There is a Chimeric.out.Junction file produced. However, what it actually contains is this instead of chimeric reads.
Any ideas why this might be happening, possibly related to the fastq files being large? When I took a subsample of these files of just 100,000 reads (using seqtk) everything worked fine, and the Chimeric.out.Junction file was normal.