Closed by dejonggr 5 years ago
Hi Grant,
100k contigs should not be that slow, though it's close to the boundary. What parameters did you use to generate the genome? Could you send me the Log.out file from the genome generation?
Also, please try to map a very small number of reads, say 10k, with --readMapNumber 10000 and send me the Log.final.out file. The problem may be with mappability.
Cheers, Alex
I actually was running STAR on a reduced file that had only 25000 reads per fastq pair.
I would send the full Log.out file but it's 165M.
I received a number of warnings regarding gene_id, but I removed most of them to keep the file size small. I'm not sure why this is happening, given that I included the following options:
--sjdbGTFfeatureExon exon --sjdbGTFtagExonParentTranscript Parent --sjdbGTFtagExonParentGene Parent
Hi Grant,
The command line for genome generation contains an = sign, which is not allowed: --genomeChrBinNbits = 16. This actually sets the parameter to 0, which might have caused problems in the mapping step.
Also, I would recommend converting the GFF3 to GTF before genome generation.
Cheers Alex
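The check Alex describes can be automated before submitting a long job. The following is a minimal sketch, not part of STAR: the helper name `options_with_equals` is hypothetical, and it assumes that any `=` in a token following a `--option` is a mistake (a file path that legitimately contains `=` would be flagged as well).

```python
import shlex


def options_with_equals(command):
    """Return the --options whose name or value tokens contain an '=' sign.

    Hypothetical helper for illustration; it only inspects the text of the
    command line and does not call STAR.
    """
    flagged = []
    current_option = None
    for token in shlex.split(command):
        if token.startswith("--"):
            # Remember the option name (text before any '=' glued to it).
            current_option = token.split("=", 1)[0]
        if "=" in token and current_option is not None:
            flagged.append(current_option)
    return flagged


# The problematic fragment quoted in this thread.
bad = "STAR --runMode genomeGenerate --genomeChrBinNbits = 16"
good = "STAR --runMode genomeGenerate --genomeChrBinNbits 16"

print(options_with_equals(bad))
print(options_with_equals(good))
```

For the first command the helper reports `--genomeChrBinNbits`; for the corrected, space-separated form it reports nothing.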
--genomeChrBinNbits = 16
This was exactly the problem. Not sure how I missed that! Everything seems to be running fine now. I'll close the issue when the job is complete. Thanks for your help!
Great, thanks for letting me know you resolved it! Cheers Alex
I'm using a model organism pan-genome, which results in ~100000 smaller contigs, and it seems like STAR stalls shortly after it starts running (the output file only reaches ~7.5Mb):
The Log.progress.out:
Here is the tail of Log.out:
Is this a memory issue due to the large number of contigs? If so, is there a more nuanced solution apart from concatenating the contigs into a pseudo-chromosome?
Cheers, Grant
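On the question of many contigs: for genomes with a large number of references, the STAR manual suggests scaling --genomeChrBinNbits as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). A minimal sketch of that calculation follows. The function name is hypothetical, rounding to the nearest integer is an assumption, and the numbers are the manual's illustrative 3 Gb genome with 100,000 scaffolds, not the pan-genome discussed in this thread (its total length is not given here).

```python
import math


def suggested_chr_bin_nbits(genome_length, n_references, read_length):
    """Scale --genomeChrBinNbits for a genome with many references.

    Implements min(18, log2(max(GenomeLength / NumberOfReferences, ReadLength))).
    Rounding to the nearest integer is an assumption made for this sketch.
    """
    per_reference = max(genome_length / n_references, read_length)
    return min(18, round(math.log2(per_reference)))


# Illustrative values only: 3 Gb genome, 100,000 scaffolds, 100 bp reads.
print(suggested_chr_bin_nbits(3_000_000_000, 100_000, 100))
```

The resulting value is passed space-separated, with no = sign, as in `--genomeChrBinNbits 15`.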