Closed: csittz closed this issue 7 months ago.
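Hi, I think this is a seqlevels style error (a chromosome naming mismatch). Did you use a genome FASTA that does not match the reference annotation? Could you run the following and send me the seqinfo output:
seqinfo(loadTxdb(df1))       # sequences in the annotation (txdb)
seqinfo(FaFile(df1@fafile))  # sequences in the genome FASTA
Do they match?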
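To make that comparison explicit, here is a minimal base-R sketch. `check_seqinfo` is a hypothetical helper (not part of ORFik) that takes two named vectors of sequence lengths, for example `seqlengths(seqinfo(loadTxdb(df1)))` and `seqlengths(seqinfo(FaFile(df1@fafile)))`. It reports annotation chromosomes missing from the FASTA, extra FASTA sequences, and annotation chromosomes with an unknown (NA) length. The toy values are shaped like the output in this thread. NA lengths are assumed to be what triggers the `covRleFromGR` error from the original report.

```r
# Hypothetical helper, not part of ORFik: compare annotation vs genome seqinfo.
# anno_lengths / genome_lengths: named integer vectors
# (names = seqnames, values = sequence lengths, NA when unknown).
check_seqinfo <- function(anno_lengths, genome_lengths) {
  list(
    # Annotation chromosomes that the genome FASTA does not contain
    missing_in_genome = setdiff(names(anno_lengths), names(genome_lengths)),
    # Genome sequences (e.g. scaffolds, HLA contigs) absent from the annotation
    extra_in_genome = setdiff(names(genome_lengths), names(anno_lengths)),
    # Annotation chromosomes whose length is unknown
    na_lengths = names(anno_lengths)[is.na(anno_lengths)]
  )
}

# Toy values shaped like the seqinfo output in this thread
anno <- c(chr1 = NA_integer_, chr2 = NA_integer_, chrM = NA_integer_)
genome <- c(chr1 = 248956422L, chr2 = 242193529L, chrM = 16569L,
            "HLA-DRB1*15:02:01" = 10313L)
res <- check_seqinfo(anno, genome)
print(res)

# In a real session (assuming ORFik and Rsamtools are loaded), something like:
# check_seqinfo(seqlengths(seqinfo(loadTxdb(df1))),
#               seqlengths(seqinfo(FaFile(df1@fafile))))
```

Extra genome sequences are harmless for this check. Missing chromosomes or NA lengths in the annotation are the cases worth fixing.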
Hi Roleren,
Thanks. I used STAR to map to hg38.fa and to output transcriptome BAM files based on the annotation gencode.v44.annotation.gtf.
The FASTA has more sequences than the annotation:
txdb <- loadTxdb('gencode.v44.annotation.gtf')
seqinfo(txdb)
Seqinfo object with 25 sequences (1 circular) from an unspecified genome; no seqlengths:
seqnames  seqlengths  isCircular  genome
chr1      <NA>        <NA>        <NA>
chr2      <NA>        <NA>        <NA>
chr3      <NA>        <NA>        <NA>
chr4      <NA>        <NA>        <NA>
chr5      <NA>        <NA>        <NA>
...       ...         ...         ...
chr21     <NA>        <NA>        <NA>
chr22     <NA>        <NA>        <NA>
chrX      <NA>        <NA>        <NA>
chrY      <NA>        <NA>        <NA>
chrM      <NA>        TRUE        <NA>
seqinfo(FaFile(df1@fafile))
Seqinfo object with 3366 sequences from an unspecified genome:
seqnames              seqlengths  isCircular  genome
chr1                  248956422   <NA>        <NA>
chr2                  242193529   <NA>        <NA>
chr3                  198295559   <NA>        <NA>
chr4                  190214555   <NA>        <NA>
chr5                  181538259   <NA>        <NA>
...                   ...         ...         ...
HLA-DRB1*15:01:01:04  11056       <NA>        <NA>
HLA-DRB1*15:02:01     10313       <NA>        <NA>
HLA-DRB1*15:03:01:01  11567       <NA>        <NA>
HLA-DRB1*15:03:01:02  11569       <NA>        <NA>
HLA-DRB1*16:02:01     11005       <NA>        <NA>
Is there a parameter to ignore chromosomes that are not found? I tried removing the extra sequences from the FASTA file. The seqinfo now matches, but I still get the same error when running QC:
seqinfo(FaFile(df1@fafile))
Seqinfo object with 25 sequences from an unspecified genome:
seqnames  seqlengths  isCircular  genome
chr1      248956422   <NA>        <NA>
chr2      242193529   <NA>        <NA>
chr3      198295559   <NA>        <NA>
chr4      190214555   <NA>        <NA>
chr5      181538259   <NA>        <NA>
...       ...         ...         ...
chr21     46709983    <NA>        <NA>
chr22     50818468    <NA>        <NA>
chrX      156040895   <NA>        <NA>
chrY      57227415    <NA>        <NA>
chrM      16569       <NA>        <NA>
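Trimming the FASTA makes the sequence names match, but the txdb printed earlier still reports "no seqlengths". Assuming the NA values in the error come from the annotation side, the error would remain until the annotation gets real lengths from the genome. The base-R sketch below shows that idea on plain named vectors. `fill_seqlengths` is a hypothetical helper, not an ORFik function. On Bioconductor objects the analogous step would be roughly `seqlengths(gr) <- seqlengths(fa)[seqlevels(gr)]`.

```r
# Hypothetical helper, not part of ORFik: fill unknown annotation lengths
# from genome FASTA lengths (named integer vectors, as returned by seqlengths()).
fill_seqlengths <- function(anno_lengths, genome_lengths) {
  # Refuse to guess lengths for chromosomes the genome does not contain
  missing <- setdiff(names(anno_lengths), names(genome_lengths))
  if (length(missing) > 0) {
    stop("Not in genome FASTA: ", paste(missing, collapse = ", "))
  }
  # Replace only the NA entries, looked up by sequence name
  unknown <- is.na(anno_lengths)
  anno_lengths[unknown] <- genome_lengths[names(anno_lengths)[unknown]]
  anno_lengths
}

anno <- c(chr1 = NA_integer_, chr2 = NA_integer_, chrM = NA_integer_)
genome <- c(chr1 = 248956422L, chr2 = 242193529L, chrM = 16569L,
            "HLA-DRB1*15:02:01" = 10313L)
filled <- fill_seqlengths(anno, genome)
print(filled)
```

Extra genome sequences are simply ignored here, so the untrimmed FASTA works just as well as the trimmed one.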
Hi, first of all, you need the original genome FASTA that you aligned to.
Otherwise downstream analysis will fail, because some Ribo-seq reads will align to scaffolds you removed.
Secondly, let's use the ORFik function that builds a corrected txdb, taking the sequence lengths from the genome FASTA:
# Make and save a txdb named gencode.v44.annotation.gtf.db
ORFik::makeTxdbFromGenome("gencode.v44.annotation.gtf", genome = 'Ref/gencode.v44.transcripts.fa', organism = "Homo sapiens", optimize = TRUE)
# Now remake the experiment with corrected txdb
create.experiment(dir='transcript.bam/',exper='myProject',txdb='gencode.v44.annotation.gtf.db', ...
Then delete the entire QC_STATS/ folder inside your 'dir' (here transcript.bam/), so that QC runs without any cached results. Rerun ORFikQC and it should work.
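The cache-clearing step can be scripted in base R. This is a minimal sketch that assumes, as described above, that cached QC results live in a QC_STATS/ folder directly under the experiment's 'dir'. `clear_qc_cache` is a hypothetical name, not an ORFik function.

```r
# Hypothetical helper, not part of ORFik: remove cached QC results so that
# the next ORFikQC run recomputes everything.
clear_qc_cache <- function(dir) {
  qc <- file.path(dir, "QC_STATS")
  if (dir.exists(qc)) {
    unlink(qc, recursive = TRUE)  # deletes the folder and all files in it
  }
  invisible(!dir.exists(qc))      # TRUE once no cache folder remains
}

# Usage, assuming the experiment from this thread:
# clear_qc_cache("transcript.bam/")
# df1 <- read.experiment("myProject")
# ORFikQC(df1)
```

Only the QC_STATS/ subfolder is removed. The BAM files and the rest of 'dir' are left in place.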
Thanks. I realized I should be using BAM files mapped to the genome rather than those converted to transcriptome coordinates. With the genome-mapped BAM files, ORFikQC runs without error.
Thank you, I'll close this thread.
Hi, I was trying to use ORFik for my Ribo-seq analyses and encountered this error when running ORFikQC.
Using R 4.2.2, ORFik 1.18.2, Windows 10.
Output from R [edited]:
Running ORFikQC with only df1[1,] gives:
Error in covRleFromGR(x, weight = weight, ignore.strand = ignore.strand): Seqlengths of x contains NA values!
I can't run shiftFootPrint either.