Hello, I recently encountered an issue where running cufflinks on a set of sam files that were aligned with hisat2 (command below). cufflinks would run on the files when run without a gtf file as reference, but failed with the following error when I provided the reference.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File SM_01_WT_-N_night_rA_S1.wheader.docsorted.sam doesn't appear to be a valid BAM file, trying SAM...
[14:26:57] Loading reference annotation.
[14:26:57] Inspecting reads and determining fragment length distribution.
> Processing Locus ctg010490:2-223890:221501-2 [************ ] 49%
Error: this SAM file doesn't appear to be correctly sorted!
current hit is at ctg000040:2-211194:1601, last one was at ctg000000:2-55635:50523
Cufflinks requires that if your file has SQ records in
the SAM header that they appear in the same order as the chromosomes names
in the alignments.
If there are no SQ records in the header, or if the header is missing,
the alignments must be sorted lexicographically by chromsome
name and by position.
I am working with a recently assembled draft genome, and so the names of the contigs in the assembly happened to have some less typical characters in them, including a colon. I eventually fixed this issue by removing the colon in all the contig names in the reference genome and the .gtf annotation file, and re-aligning with hisat2.
Any ideas why this failed only when the .gtf reference was provided?
Hello, I recently encountered an issue where running cufflinks on a set of sam files that were aligned with hisat2 (command below). cufflinks would run on the files when run without a gtf file as reference, but failed with the following error when I provided the reference.
hisat2 command
hisat2 -p 20 -S sample.sam -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz --summary-file samplesummary.txt -x genomeindex --rna-strandness RF --dta-cufflinks --no-unal
with reference command
cufflinks -p 16 --library-type fr-firststrand --max-mle-iterations 10000 -g my.gtf -o ./mysample mysample.cuffdocssorted.sam
without reference command
cufflinks -p 16 --library-type fr-firststrand --max-mle-iterations 10000 -o ./mysample mysample.cuffdocssorted.sam
error encountered when running with reference
I am working with a recently assembled draft genome, and so the names of the contigs in the assembly happened to have some less typical characters in them, including a colon. I eventually fixed this issue by removing the colon in all the contig names in the reference genome and the .gtf annotation file, and re-aligning with hisat2.
Any ideas why this failed only when the .gtf reference was provided?