GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License
388 stars 140 forks source link

Manual seqlevels not working #1616

Closed Goultard59 closed 2 years ago

Goultard59 commented 2 years ago

Hello I notice that the two filter options in the createGenomeAnnotation are correlated

filterChr doesn't work if filter is not set to true

https://github.com/GreenleafLab/ArchR/blob/79953a93ae376a500e681aa3b5ba669af6bb58e1/R/AnnotationGenome.R#L37-L40

It's not interesting in my studies as I want to keep my contigs (AEMK02000130.1''AEMK02000131.1''AEMK02000133.1''AEMK02000137.1') but I want to remove other chromosomes (MT) because this one isn't contained in my cellranger output.

Found Gene Seqnames not in GenomeAnnotation chromSizes, Removing : MT

Found Exon Seqnames not in GenomeAnnotation chromSizes, Removing : MT

Found TSS Seqnames not in GenomeAnnotation chromSizes, Removing : MT

Interestingly this will throw an error much later in the processing

2022-09-09 16:45:03 : ERROR Found in .fastFeatureCounts for (J9-2 : 1 of 2) 
LogFile = ArchRLogs/ArchR-createArrows-d3b3301d0164-Date-2022-09-09_Time-14-34-21.log

<simpleError in .getFragsFromArrow(ArrowFile = ArrowFile, chr = names(featureList)[x],     out = "IRanges", cellNames = cellNames): Chromosome NA not in ArrowFile! Available Chromosomes are : 1,10,11,12,13,14,15,16,17,18,2,3,4,5,6,7,8,9,AEMK02000125.1,AEMK02000128.1,AEMK02000130.1,AEMK02000131.1,AEMK02000133.1

This error didn't happen when I filter the chromosome as default as the chromosome order has non standard chromosome at the end, this error seems to only appear when the chromosome which isn't in the arrow files is in the middle of the featureList https://github.com/GreenleafLab/ArchR/blob/79953a93ae376a500e681aa3b5ba669af6bb58e1/R/CreateArrow.R#L1046

rcorces commented 2 years ago

Hi @Goultard59! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information: 1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved? 2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example. 3. Did you post your log file? If not, add it now. 4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

rcorces commented 2 years ago

I think I've fixed the problem you are mentioning via https://github.com/GreenleafLab/ArchR/commit/7221e2100176e053f55c89bfc2b1444404a8ed29 on the dev_filterChr branch.

I've added a new parameter standard which can be set to FALSE, allowing non-standard chr to be kept while removing the ones indicated by filterChr.

Can you confirm? To install:

devtools::install_github("GreenleafLab/ArchR", ref="dev_filterChr", repos = BiocManager::repositories())
#to unload a package and reload
detach("package:ArchR", unload=TRUE)
library(ArchR)
Goultard59 commented 2 years ago

Unfortunately, i still get an error

2022-09-13 12:17:28 : ERROR Found in .fastFeatureCounts for (J9-1 : 2 of 2) 
LogFile = ArchRLogs/ArchR-createArrows-e4de6f7de87a-Date-2022-09-13_Time-09-48-07.log

<simpleError in .getFragsFromArrow(ArrowFile = ArrowFile, chr = names(featureList)[x],     out = "IRanges", cellNames = cellNames): Chromosome NA not in ArrowFile! Available Chromosomes are : 1,10,11,12,13,14,15,16,17,18,2,3,4,5,6,7,8,9,AEMK02000125.1,AEMK02000128.1,AEMK02000130.1,AEMK02000131.1,AEMK02000133.1,AEMK02000137.1,AEMK02000139.1,AEMK02000140.1,AEMK02000147.1,AEMK02000150.1,AEMK02000153.1,AEMK02000155.1,AEMK02000156.1,AEMK02000159.1,AEMK02000161.1,AEMK02000162.1,AEMK02000164.1,AEMK02000170.1,AEMK02000171.1,AEMK02000172.1,AEMK02000173.1,AEMK02000174.1,AEMK02000175.1,AEMK02000182.1,AEMK02000183.1,AEMK02000188.1,AEMK02000189.1,AEMK02000191.1,AEMK02000194.1,AEMK02000197.1,AEMK02000199.1,AEMK02000200.1,AEMK02000201.1,AEMK02000203.1,AEMK02000207.1,AEMK02000208.1,AEMK02000213.1,AEMK02000214.1,AEMK02000216.1,AEMK02000222.1,AEMK02000223.1,AEMK02000224.1,AEMK02000229.1,AEMK02000234.1,AEMK02000237.1,AEMK02000238.1,AEMK02000240.1,AEMK02000244.1,AEMK02000245.1,AEMK02000247.1,AEMK02000252.1,AEMK02000253.1,AEMK02000254.1,AEMK02000258.1,AEMK02000259.1,AEMK02000260.1,AEMK02000261.1,AEMK02000263.1,AEMK02000264.1,AEMK02000265.1,AEMK02000273.1,AEMK02000276.1,AEMK02000277.1,AEMK02000278.1,AEMK02000279.1,AEMK02000280.1,AEMK02000291.1,AEMK02000295.1,AEMK02000297.1,AEMK02000299.1,AEMK02000310.1,AEMK02000311.1,AEMK02000312.1,AEMK02000313.1,AEMK02000316.1,AEMK02000319.1,AEMK02000320.1,AEMK02000324.1,AEMK02000326.1,AEMK02000328.1,AEMK02000329.1,AEMK02000333.1,AEMK02000336.1,AEMK02000340.1,AEMK02000341.1,AEMK02000346.1,AEMK02000348.1,AEMK02000354.1,AEMK02000355.1,AEMK02000361.1,AEMK02000363.1,AEMK02000364.1,AEMK02000368.1,AEMK02000369.1,AEMK02000370.1,AEMK02000374.1,AEMK02000378.1,AEMK02000379.1,AEMK02000380.1,AEMK02000388.1,AEMK02000389.1,AEMK02000390.1,AEMK02000391.1,AEMK02000393.1,AEMK02000398.1,AEMK02000399.1,AEMK02000400.1,AEMK02000402.1,AEMK02000403.1,AEMK02000408.1,AEMK02000410.1,AEMK02000412.1,AEMK02000414.1,AEMK02000415.1,AEMK02000418.1,AEMK02000421.1,AEMK02000422.1,AEMK02000423.1,AEMK02000427.1,AEMK02000429.1,AEMK02000434.1,AEMK02000435.1,AEMK02000437.1,AEMK02000439.1,AEMK02000440.1,AEMK02000442.1,AEMK02000446.1,AEMK02000449.1,AEMK02000451.1,AEMK02000452.1,AEMK02000453.1,AEMK02000457.1,AEMK02000460.1,AEMK02000465.1,AEMK02000467.1,AEMK02000468.1,AEMK02000472.1,AEMK02000473.1,AEMK02000474.1,AEMK02000476.1,AEMK02000478.1,AEMK02000479.1,AEMK02000484.1,AEMK02000485.1,AEMK02000489.1,AEMK02000490.1,AEMK02000491.1,AEMK02000493.1,AEMK02000495.1,AEMK02000496.1,AEMK02000497.1,AEMK02000501.1,AEMK02000503.1,AEMK02000506.1,AEMK02000507.1,AEMK02000509.1,AEMK02000510.1,AEMK02000511.1,AEMK02000512.1,AEMK02000514.1,AEMK02000515.1,AEMK02000517.1,AEMK02000518.1,AEMK02000520.1,AEMK02000522.1,AEMK02000525.1,AEMK02000526.1,AEMK02000527.1,AEMK02000528.1,AEMK02000529.1,AEMK02000531.1,AEMK02000532.1,AEMK02000534.1,AEMK02000537.1,AEMK02000541.1,AEMK02000546.1,AEMK02000551.1,AEMK02000554.1,AEMK02000555.1,AEMK02000556.1,AEMK02000558.1,AEMK02000559.1,AEMK02000560.1,AEMK02000561.1,AEMK02000563.1,AEMK02000566.1,AEMK02000567.1,AEMK02000569.1,AEMK02000570.1,AEMK02000574.1,AEMK02000575.1,AEMK02000576.1,AEMK02000577.1,AEMK02000579.1,AEMK02000584.1,AEMK02000588.1,AEMK02000589.1,AEMK02000591.1,AEMK02000592.1,AEMK02000593.1,AEMK02000596.1,AEMK02000597.1,AEMK02000598.1,AEMK02000599.1,AEMK02000600.1,AEMK02000601.1,AEMK02000602.1,AEMK02000606.1,AEMK02000610.1,AEMK02000611.1,AEMK02000612.1,AEMK02000616.1,AEMK02000617.1,AEMK02000618.1,AEMK02000620.1,AEMK02000621.1,AEMK02000622.1,AEMK02000623.1,AEMK02000624.1,AEMK02000628.1,AEMK02000629.1,AEMK02000630.1,AEMK02000631.1,AEMK02000632.1,AEMK02000634.1,AEMK02000635.1,AEMK02000637.1,AEMK02000640.1,AEMK02000641.1,AEMK02000643.1,AEMK02000649.1,AEMK02000652.1,AEMK02000658.1,AEMK02000659.1,AEMK02000661.1,AEMK02000664.1,AEMK02000667.1,AEMK02000668.1,AEMK02000672.1,AEMK02000673.1,AEMK02000676.1,AEMK02000677.1,AEMK02000679.1,AEMK02000680.1,AEMK02000681.1,AEMK02000682.1,AEMK02000683.1,AEMK02000687.1,AEMK02000690.1,AEMK02000692.1,AEMK02000694.1,AEMK02000695.1,AEMK02000696.1,AEMK02000697.1,AEMK02000698.1,AEMK02000699.1,AEMK02000702.1,AEMK02000703.1,AEMK02000704.1,X,Y>

2022-09-13 12:17:28 : errorList, Class = list

errorList$x: length = 1
[1] 21

errorList$chr: length = 1
[1] NA

errorList$fragments: length = 1
[1] "Error with Fragments!"

errorList$features: length = 1
[1] "Error with Features!"

I didn't understand from were the error comewhen I execute :

featureList <- list()
  featureList$Promoter <-  extendGR(
      gr = GenomicRanges::resize(geneAnnotation$genes, 1, "start"), 
      upstream = 2000, 
      downstream = 100
  )

  if(!is.null(genomeAnnotation$blacklist)){
    if(length(genomeAnnotation$blacklist) > 0){
      featureList$Blacklist <- genomeAnnotation$blacklist
    }
  }

  for(x in seq_along(featureList)){
    featurex <- featureList[[x]]
    mcols(featurex) <- NULL
    mcols(featurex)$FeatureName <- names(featureList)[x]
    if(x == 1){
      feature <- featurex
    }else{
      feature <- c(feature, featurex)
    }

  }

featureList <- split(feature, seqnames(feature))

I get an object that seems good where all seqnames are included in genomeannotation as unique(seqnames(geneAnnotation$genes)) %in% unique(seqnames(genomeAnnotation$chromSizes)) is TRUE unique(seqnames(genomeAnnotation$chromSizes)) %in% unique(seqnames(geneAnnotation$genes)) is partially TRUE

rcorces commented 2 years ago

Sorry - I'm not sure I can provide additional help without a reproducible example.

Goultard59 commented 2 years ago

I have proceeded with my data through cellranger with a custom pigs annotation.

I have built a custom BSgenome annotation

my_file <- read.dcf("/home/adufour/work/genome_and_annotation/orf/src_seqdir/seed_file.txt", fields = NULL, all = FALSE, keep.white = NULL) 

write.dcf(my_file , file = "/home/adufour/work/genome_and_annotation/orf/src_seqdir/seed.dcf", append = FALSE, useBytes = FALSE, indent = 0.1 * getOption("width"), width = 0.9 * getOption("width"), keep.white = NULL)

library(BSgenome)

unlink(c("SuscrofaTxdb.11.102.fixed"), recursive = TRUE, force = TRUE)

forgeBSgenomeDataPkg("/home/adufour/work/genome_and_annotation/orf/src_seqdir/seed.dcf")

I also built a geneannotation file like this :

library(org.Ss.eg.db)
library(SuscrofaTxdb.11.102.fixed)
library(GenomicRanges)
library(gprofiler2)
library(GenomicFeatures)

orgdb <- org.Ss.eg.db

txdb_double <- makeTxDbFromGFF("/home/adufour/work/genome/Sus_scrofa.11.1.102.quant3p_extended.updated.sorted.gtf", format="gtf")

filter <- list(tx_chrom = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "X", "Y", "AEMK02000452.1", "AEMK02000698.1", "AEMK02000361.1", "AEMK02000598.1", "AEMK02000682.1", "AEMK02000449.1", "AEMK02000677.1", "AEMK02000408.1", "AEMK02000569.1", "AEMK02000393.1", "AEMK02000510.1", "AEMK02000328.1", "AEMK02000465.1", "AEMK02000133.1", "AEMK02000661.1", "AEMK02000261.1", "AEMK02000324.1", "AEMK02000258.1", "AEMK02000537.1", "AEMK02000522.1", "AEMK02000694.1", "AEMK02000451.1", "AEMK02000478.1", "AEMK02000496.1", "AEMK02000390.1", "AEMK02000175.1", "AEMK02000602.1", "AEMK02000370.1", "AEMK02000213.1", "AEMK02000630.1", "AEMK02000264.1", "AEMK02000600.1", "AEMK02000410.1", "AEMK02000525.1", "AEMK02000159.1", "AEMK02000423.1", "AEMK02000566.1", "AEMK02000278.1", "AEMK02000577.1", "AEMK02000697.1", "AEMK02000532.1", "AEMK02000189.1", "AEMK02000400.1", "AEMK02000253.1", "AEMK02000649.1", "AEMK02000234.1", "AEMK02000368.1", "AEMK02000640.1", "AEMK02000703.1", "AEMK02000341.1", "AEMK02000495.1", "AEMK02000497.1", "AEMK02000310.1", "AEMK02000297.1", "AEMK02000171.1", "AEMK02000517.1", "AEMK02000591.1", "AEMK02000485.1", "AEMK02000658.1", "AEMK02000435.1", "AEMK02000589.1", "AEMK02000612.1", "AEMK02000493.1", "AEMK02000579.1", "AEMK02000476.1", "AEMK02000434.1", "AEMK02000156.1", "AEMK02000526.1", "AEMK02000468.1", "AEMK02000336.1", "AEMK02000240.1", "AEMK02000320.1", "AEMK02000162.1", "AEMK02000333.1", "AEMK02000680.1", "AEMK02000254.1", "AEMK02000173.1", "AEMK02000453.1", "AEMK02000617.1", "AEMK02000576.1", "AEMK02000620.1", "AEMK02000378.1", "AEMK02000276.1", "AEMK02000137.1", "AEMK02000514.1", "AEMK02000679.1", "AEMK02000277.1", "AEMK02000238.1", "AEMK02000588.1", "AEMK02000399.1", "AEMK02000200.1", "AEMK02000182.1", "AEMK02000558.1", "AEMK02000692.1", "AEMK02000295.1", "AEMK02000584.1", "AEMK02000131.1", "AEMK02000364.1", "AEMK02000696.1", "AEMK02000534.1", "AEMK02000299.1", "AEMK02000621.1", "AEMK02000440.1", "AEMK02000528.1", "AEMK02000623.1", "AEMK02000369.1", "AEMK02000245.1", "AEMK02000554.1", "AEMK02000541.1", "AEMK02000593.1", "AEMK02000574.1", "AEMK02000632.1", "AEMK02000503.1", "AEMK02000637.1", "AEMK02000223.1", "AEMK02000460.1", "AEMK02000672.1", "AEMK02000472.1", "AEMK02000208.1", "AEMK02000346.1", "AEMK02000164.1", "AEMK02000153.1", "AEMK02000170.1", "AEMK02000191.1", "AEMK02000437.1", "AEMK02000291.1", "AEMK02000563.1", "AEMK02000507.1", "AEMK02000415.1", "AEMK02000515.1", "AEMK02000422.1", "AEMK02000474.1", "AEMK02000683.1", "AEMK02000512.1", "AEMK02000389.1", "AEMK02000559.1", "AEMK02000427.1", "AEMK02000616.1", "AEMK02000699.1", "AEMK02000704.1", "AEMK02000188.1", "AEMK02000506.1", "AEMK02000529.1", "AEMK02000414.1", "AEMK02000172.1", "AEMK02000207.1", "AEMK02000421.1", "AEMK02000391.1", "AEMK02000628.1", "AEMK02000668.1", "AEMK02000380.1", "AEMK02000259.1", "AEMK02000203.1", "AEMK02000501.1", "AEMK02000641.1", "AEMK02000398.1", "AEMK02000263.1", "AEMK02000260.1", "AEMK02000273.1", "AEMK02000311.1", "AEMK02000412.1", "AEMK02000229.1", "AEMK02000676.1", "AEMK02000247.1", "AEMK02000561.1", "AEMK02000659.1", "AEMK02000280.1", "AEMK02000592.1", "AEMK02000631.1", "AEMK02000363.1", "AEMK02000509.1", "AEMK02000439.1", "AEMK02000403.1", "AEMK02000183.1", "AEMK02000130.1", "AEMK02000673.1", "AEMK02000611.1", "AEMK02000174.1", "AEMK02000606.1", "AEMK02000340.1", "AEMK02000518.1", "AEMK02000489.1", "AEMK02000313.1", "AEMK02000125.1", "AEMK02000702.1", "AEMK02000197.1", "AEMK02000457.1", "AEMK02000147.1", "AEMK02000575.1", "AEMK02000511.1", "AEMK02000161.1", "AEMK02000429.1", "AEMK02000596.1", "AEMK02000622.1", "AEMK02000279.1", "AEMK02000555.1", "AEMK02000546.1", "AEMK02000687.1", "AEMK02000479.1", "AEMK02000556.1", "AEMK02000402.1", "AEMK02000681.1", "AEMK02000520.1", "AEMK02000140.1", "AEMK02000244.1", "AEMK02000374.1", "AEMK02000222.1", "AEMK02000139.1", "AEMK02000624.1", "AEMK02000442.1", "AEMK02000252.1", "AEMK02000150.1", "AEMK02000467.1", "AEMK02000570.1", "AEMK02000664.1", "AEMK02000201.1", "AEMK02000618.1", "AEMK02000355.1", "AEMK02000597.1", "AEMK02000216.1", "AEMK02000379.1", "AEMK02000635.1", "AEMK02000316.1", "AEMK02000348.1", "AEMK02000643.1", "AEMK02000199.1", "AEMK02000490.1", "AEMK02000265.1", "AEMK02000634.1", "AEMK02000473.1", "AEMK02000237.1", "AEMK02000319.1", "AEMK02000326.1", "AEMK02000418.1", "AEMK02000610.1", "AEMK02000560.1", "AEMK02000601.1", "AEMK02000484.1", "AEMK02000329.1", "AEMK02000388.1", "AEMK02000224.1", "AEMK02000629.1", "AEMK02000599.1", "AEMK02000446.1", "AEMK02000527.1", "AEMK02000194.1", "AEMK02000567.1", "AEMK02000128.1", "AEMK02000312.1", "AEMK02000667.1", "AEMK02000354.1", "AEMK02000652.1", "AEMK02000155.1", "AEMK02000214.1", "AEMK02000690.1", "AEMK02000551.1", "AEMK02000491.1", "AEMK02000531.1", "AEMK02000695.1"))

genes <- GenomicFeatures::genes(txdb_double, filter = filter)

dataframe <- gconvert(
genes$gene_id,
organism = "sscrofa",
target = "ENTREZGENE_ACC",
numeric_ns = "ENTREZGENE_ACC",
mthreshold = 1,
filter_na = FALSE
)

genes$gene_id <- dataframe$target

genes$symbol <- dataframe$name

names(genes) <- NULL

genes <- genes[!genes$gene_id %in% NA]

seqlevels(genes) <- seqlevelsInUse(genes)

exons <- exons(txdb_double, columns="gene_id", filter = filter)

dataframe <- gconvert(
exons$gene_id,
organism = "sscrofa",
target = "ENTREZGENE_ACC",
numeric_ns = "ENTREZGENE_ACC",
mthreshold = 1,
filter_na = FALSE
)

exons$gene_id <- dataframe$target

exons$symbol <- dataframe$name

exons <- exons[!exons$gene_id %in% NA]

seqlevels(exons) <- seqlevelsInUse(exons)

transcripts_list <- transcripts(txdb_double, columns=c("tx_id", "tx_name"), filter = filter)

seqlevels(transcripts_list) <- seqlevelsInUse(transcripts_list)

tss <- resize(transcripts_list, width=1, fix='start')

geneAnnotation <- createGeneAnnotation(genes = genes, exons = exons, TSS = tss, OrgDb = orgdb)

saveRDS(geneAnnotation, file = "/home/adufour/work/rds_storage/omics/geneannotation.rds")

And here is the start of my Archr notebook


library(ArchR)
library(org.Ss.eg.db)
library(SuscrofaTxdb.11.102.fixed)

options(repr.plot.width = 18, repr.plot.height = 17, repr.plot.pointsize = 24)

geneAnnotation <- readRDS("/home/adufour/work/rds_storage/omics/geneannotation.rds")

genomeAnnotation <- createGenomeAnnotation(SuscrofaTxdb.11.102.fixed, filterChr = "MT", standard = FALSE, filter = TRUE)

work_dir <- paste0("/home/adufour/work/archr_j9_extended")
data_dir <- "/home/adufour/work/run_cellranger_count/ensembl/arc_j9_extended"
dir.create(work_dir, recursive = TRUE)
setwd(work_dir)

addArchRThreads(1)

# Create Arrow file
createArrowFiles(inputFiles  = c("/home/adufour/work/fragencode/workspace/ckurylo/plus4pigs/cellranger_arc/count/J9o_1.Sus_scrofa.11.1.102.quant3p_extended.updated.sorted.2021_12_09/J9o_1/outs/atac_fragments.tsv.gz",
                                 "/home/adufour/work/fragencode/workspace/ckurylo/plus4pigs/cellranger_arc/count/J9o_2.Sus_scrofa.11.1.102.quant3p_extended.updated.sorted.2021_12_09/J9o_2/outs/atac_fragments.tsv.gz"), 
                 sampleNames = c("J9-1","J9-2"), 
                 QCDir       = "QualityControl",
                 logFile     = createLogFile(name = "createArrows", 
                                             logDir = "ArchRLogs"),
                 minTSS = 1.5,
                 minFrags = 300,
                 maxFrags = 1e+10,
                 force = TRUE,
                 excludeChr = "MT",
                 geneAnnotation = geneAnnotation,
                 genomeAnnotation = genomeAnnotation)

Everything works fine until I try to unfilter contigs in genome annotation

rcorces commented 2 years ago

I'm not really able to help much with this because I dont have a reproducible example. In this post, you mentioned an error but didnt upload a log file - https://github.com/GreenleafLab/ArchR/issues/1616#issuecomment-1245479225

When you are aligning your data and subsequently forging your genome objects, why not just exclude the contigs that you dont want at that point? Remove them from your genome build prior to alignment and remove them from your gene and genome annotations.

Goultard59 commented 2 years ago

Hi,

I finally found the origins and solution to my problems, it seems that all genome annotation needs to be in gene annotation :

unique(seqnames(genomeAnnotation$chromSizes)) %in% unique(seqnames(geneAnnotation$genes))

So I filter my geneannotation as follows:

library(ArchR)
library(org.Ss.eg.db)
library(SuscrofaTxdb.11.102.fixed)
library(GenomicRanges)
library(gprofiler2)
library(GenomicFeatures)

orgdb <- org.Ss.eg.db

txdb_double <- makeTxDbFromGFF("/home/adufour/work/fragencode/workspace/ckurylo/plus4pigs/genome/Sus_scrofa.11.1.102.quant3p_extended.updated.sorted.gtf", format="gtf")

genes <- GenomicFeatures::genes(txdb_double)

dataframe <- gconvert(
genes$gene_id,
organism = "sscrofa",
target = "ENTREZGENE_ACC",
numeric_ns = "ENTREZGENE_ACC",
mthreshold = 1,
filter_na = FALSE
)

genes$gene_id <- dataframe$target

genes$symbol <- dataframe$name

names(genes) <- NULL

genes <- genes[!genes$gene_id %in% NA]

filter <- seqlevels(genes)[!seqlevels(genes) %in% seqlevelsInUse(genes)]

seqlevels(genes) <- seqlevelsInUse(genes)

exons <- exons(txdb_double, columns="gene_id")

dataframe <- gconvert(
exons$gene_id,
organism = "sscrofa",
target = "ENTREZGENE_ACC",
numeric_ns = "ENTREZGENE_ACC",
mthreshold = 1,
filter_na = FALSE
)

exons$gene_id <- dataframe$target

exons$symbol <- dataframe$name

exons <- exons[!exons$gene_id %in% NA]

seqlevels(exons) <- seqlevelsInUse(exons)

transcripts_list <- transcripts(txdb_double, columns=c("tx_id", "tx_name"))

transcripts_list <- transcripts_list[seqnames(transcripts_list) %in% seqlevels(transcripts_list)[seqlevels(transcripts_list) %in% seqlevels(genes)]]

seqlevels(transcripts_list) <- seqlevelsInUse(transcripts_list)

tss <- resize(transcripts_list, width=1, fix='start')

geneAnnotation <- createGeneAnnotation(genes = genes, exons = exons, TSS = tss, OrgDb = orgdb)

saveRDS(geneAnnotation, file = "/home/adufour/work/rds_storage/omics/geneannotation.rds")

And I use the seqnames as my seqlist to create my BSgenome package

I also noticed that those lines haven't worked in all of my pipeline :

https://github.com/GreenleafLab/ArchR/blob/7221e2100176e053f55c89bfc2b1444404a8ed29/R/GroupCoverages.R#L122-L139

Maybe it's because I have multiple ArrowsFiles (multiple samples) and the empty chromosomes/contigs were in the second ArrowFiles

After filtering empty contigs the pipeline goes successfully.

rcorces commented 2 years ago

Thank you for posting your solution. I will attempt to update the documentation to more accurately reflect this.