Closed alexg9010 closed 5 years ago
> h3k9me3_overlaps <- queryGff(queryRegions = h3k9me3_peaks, gffData = annotations$gtf)
Warnmeldung:
In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
> seqinfo(annotations$gtf)
Seqinfo object with 7 sequences from an unspecified genome; no seqlengths:
seqnames seqlengths isCircular genome
I NA NA <NA>
II NA NA <NA>
III NA NA <NA>
IV NA NA <NA>
V NA NA <NA>
X NA NA <NA>
MtDNA NA NA <NA>
> seqinfo(h3k27me3_peaks)
Seqinfo object with 7 sequences from an unspecified genome; no seqlengths:
seqnames seqlengths isCircular genome
chrI NA NA <NA>
chrII NA NA <NA>
chrIII NA NA <NA>
chrIV NA NA <NA>
chrV NA NA <NA>
chrX NA NA <NA>
chrM NA NA <NA>
worked after setting same seqlevels
> seqlevels(annotations$gtf) <- seqlevels(h3k27me3_peaks)
> h3k27me3_overlaps <- queryGff(queryRegions = h3k27me3_peaks, gffData = annotations$gtf)
> h3k27me3_overlaps
GRanges object with 56566 ranges and 13 metadata columns:
Check #14 - would be great if you could do it!!
From the BS pipeline, I have the following rule:
rule tabulate_seqlengths:
input:
rules.bismark_genome_preparation.output
output:
seqlengths = DIR_mapped+"Refgen_"+ASSEMBLY+"_chromlengths.csv",
params:
chromlines = " | grep Sequence ",
chromcols = " | cut -f2,3 ",
seqnames = " | sed \"s/_CT_converted//g\" "
message: fmt("Tabulating chromosome lengths in genome: {ASSEMBLY} for later reference.")
shell:
nice('bowtie2-inspect', ['-s ' + GENOMEPATH + "Bisulfite_Genome/CT_conversion/BS_CT", '{params.chromlines}', '{params.chromcols}', '{params.seqnames}', ' > {output}'])
It creates a 2-column text file with seqlevels and chromosome lengths. Once complication here is that this uses the CT
converted version of the genome (but that's only a trick to make sure that the conversion is performed, there's no reason we can't use the original genome instead)
e.g. for the mm10 genome the output looks like this:
$ more Refgen_mm10_canon_chromlengths.csv
chr1 195471971
chr2 182113224
chr3 160039680
chr4 156508116
...
chrX 171031299
chrY 91744698
and then I pass this file as chrom_seqlengths
into the Rmd rule like this:
seqdat_temp = read.csv(paste0(out_dir,"/",chrom_seqlengths),
sep="\t", header=FALSE, stringsAsFactors=FALSE)
chr.len = seqdat_temp[,2]
names(chr.len) = seqdat_temp[,1]
myseqinfo = Seqinfo(names(chr.len), seqlengths=chr.len, genome=assembly)
myseqinfo.st = keepStandardChromosomes(myseqinfo)
I'm struggling a bit to figure out exactly where in the code this should go, but the basic idea should work If we just add the above file as an input to whatever is using the seqlengths. Does that work?
should be solved with c5225023258e99776ef34b60665df8c633e447a0
https://github.com/BIMSBbioinfo/pigx_chipseq/blob/407ac9b8ef4e69e6c7483eea57f2bfb6103c7a4e/scripts/Functions_Helper.R#L123