Closed methylnick closed 6 years ago
With the first error, `Exception: Epb4 gene Id is already in the dictionary, duplicated gene name`, I'd grep for `Epb4` in your annotation file. It appears to be a duplicated gene name, which isn't uncommon.
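If grepping one name at a time gets tedious, a small script can list every gene name that shows up in more than one place. This is only a rough sketch, not RNAsik's own check: it assumes a standard 9-column GTF with `gene_id` (and optionally `gene_name`) attributes, and it assumes "duplicated" means the same name appearing with more than one gene id, chromosome, or strand. The function name `find_duplicated_gene_names` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def find_duplicated_gene_names(gtf_lines):
    """Map each gene name to its (gene_id, chrom, strand) records,
    keeping only names that occur with more than one distinct record."""
    seen = defaultdict(set)
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        # Fall back to gene_id when the file carries no gene_name attribute.
        name = attrs.get("gene_name", gene_id)
        seen[name].add((gene_id, fields[0], fields[6]))
    return {name: sorted(recs) for name, recs in seen.items() if len(recs) > 1}
```

To run it on a real file, something like `with open("genes.gtf") as fh: print(find_duplicated_gene_names(fh))` should do, where `genes.gtf` stands in for your annotation file.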
As for the second error, the fastqc one, I think it has to do with the option `-extn _R1_001.fastq.gz`. `-extn` is meant only for file extensions, and you are giving it too much. It should be `-extn fastq.gz`, which is the default anyway.
Related to your previous issue: if your fastq files have a different extension, e.g. `.fastq`, `.fq.gz` or `.txt.gz`, then you can use `-extn` to specify the extension type. I'm not too sure why you had `_R1` there.
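To show why the extra `_R1_001` could matter, here is a hypothetical sketch. It is not RNAsik's actual code; it assumes the pipeline strips the `-extn` value from each file name and then looks for a FastQC report named after what is left. The file name `sampleA_R1_001.fastq.gz` and both helper functions are invented for the example. FastQC itself names its report after the input file with the usual fastq endings removed, which the second helper approximates.

```python
def strip_extn(file_name, extn):
    """Remove extn from the end of file_name, with or without a joining dot."""
    for suffix in ("." + extn.lstrip("."), extn):
        if file_name.endswith(suffix):
            return file_name[: -len(suffix)]
    return file_name


def fastqc_zip_name(file_name):
    """Approximate the report name FastQC writes for a given input file."""
    stem = file_name
    # Peel off compression and fastq-style endings, in that order.
    for ending in (".gz", ".fastq", ".fq", ".txt"):
        if stem.endswith(ending):
            stem = stem[: -len(ending)]
    return stem + "_fastqc.zip"


fastq = "sampleA_R1_001.fastq.gz"  # hypothetical single-end Illumina file
actual = fastqc_zip_name(fastq)

# Report name a pipeline would expect under each -extn value (assumed behaviour).
expected_default = strip_extn(fastq, "fastq.gz") + "_fastqc.zip"
expected_too_much = strip_extn(fastq, "_R1_001.fastq.gz") + "_fastqc.zip"

print(actual, expected_default, expected_too_much)
```

Under these assumptions the default `fastq.gz` gives an expected name that matches what FastQC wrote, while `_R1_001.fastq.gz` gives a truncated name that does not exist on disk.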
Let me know how it goes
Cheers
Thanks Kirill. As for the duplicated gene, this is the first time I've come across it. I'm using 1.5.0 for the first time, and neither 1.4.7 nor 1.4.8 threw this error for the files I used (iGenomes).
As for the suffix issue, the `_R1` is still there, even with single-end reads. It's an Illumina thing. I'll keep playing with it and see.
Thanks for responding!
The suffix doesn't matter, just don't include it in `-extn`, that's all.
Let me know
Cheers
@methylnick closing this issue now. I think it's been resolved? Re-open it if you need more clarifications
Cheers
Have a look at the RNAsik output:
http://bioinformatics.erc.monash.edu/home/nick-wong/projects/evelyn.tsantikos/RNAsik.bds.20180501_215237_833.report.html
It seems the reverse strand feature count stage errored out because of duplicate gene names in the gene table. I'm using the iGenomes mm10 UCSC gene annotations.
The second problem is the prefix extraction, which errors out at fastqc generation. The fastqc reports were generated, but with the original fastq file names rather than the "truncated" file names you get when you set the `-extn` flag to more than just `fastq.gz`, in this case `_R1_001.fastq.gz`. This is a single-end reads experiment.