ge11232002 / TFBSTools

Software Package for Transcription Factor Binding Site (TFBS) Analysis

all_versions = FALSE in JASPAR omits results #33

Open nellykan opened 2 years ago

nellykan commented 2 years ago

Hi!

I am trying this tool for the first time and I encountered the following issue.

When I access the mouse PWMs (Tax ID: 10090) with the option "all_versions=TRUE", I get results for various factors, including Klf4 and Sox2.

> opts <- list()
> opts[["species"]] <- 10090
> opts[["collection"]] <- "CORE"
> opts[["all_versions"]] <- TRUE
> opts[["matrixtype"]] <- "PFM"
> PFMatrixList <- getMatrixSet(JASPAR2020, opts)
> PFMatrixList
PFMatrixList of length 196
names(196): MA0004.1 MA0006.1 MA0009.1 MA0014.1 MA0027.1 MA0029.1 ... MA1627.1 MA1628.1 MA1629.1 MA1630.1 MA0122.3 MA1684.1
> 
> TFs <- unlist((lapply(PFMatrixList@listData, slot, name = "name")))
> TFs[TFs == "Sox2"]
MA0143.1 MA0143.2 MA0143.3 
  "Sox2"   "Sox2"   "Sox2" 
> TFs[TFs == "Klf4"]
MA0039.1 MA0039.2 
  "Klf4"   "Klf4" 

However, with the default option "all_versions=FALSE", I get no results for these (and other) factors, even though the expected behavior would be to return only the latest version of each.

> opts <- list()
> opts[["species"]] <- 10090
> opts[["collection"]] <- "CORE"
> opts[["all_versions"]] <- FALSE
> opts[["matrixtype"]] <- "PFM"
> PFMatrixList <- getMatrixSet(JASPAR2020, opts)
> PFMatrixList
PFMatrixList of length 107
names(107): MA0004.1 MA0006.1 MA0029.1 MA0067.1 MA0078.1 MA0087.1 ... MA1627.1 MA1628.1 MA1629.1 MA1630.1 MA0122.3 MA1684.1
> 
> TFs <- unlist((lapply(PFMatrixList@listData, slot, name = "name")))
> TFs[TFs == "Sox2"]
named character(0)
> TFs[TFs == "Klf4"]
named character(0)

I would appreciate any help or tips. Thanks a lot!

Nelly

Session Info:
R version 4.1.1 (2021-08-10)
TFBSTools_1.32.0
JASPAR2020_0.99.10

ge11232002 commented 2 years ago

Hi Nelly,

The species annotation of certain motifs changed between versions, as did the name. Please check the version information for Sox2 (https://jaspar.genereg.net/matrix/MA0143.3/). I would omit the species option to fetch all.

library(TFBSTools)
library(JASPAR2020)
opts <- list()
opts[["species"]] <- NULL  # no species filter: fetch matrices for all species
opts[["collection"]] <- "CORE"
opts[["all_versions"]] <- FALSE
opts[["matrixtype"]] <- "PFM"
PFMatrixList <- getMatrixSet(JASPAR2020, opts)
PFMatrixList
# The name can differ in case between versions, so check both spellings
which(name(PFMatrixList) == "Klf4")
which(name(PFMatrixList) == "KLF4")
which(name(PFMatrixList) == "Sox2")
which(name(PFMatrixList) == "SOX2")

nellykan commented 2 years ago

Hi Ge Tan,

I am not sure how that solves the problem. I do not want the motif for any species, but for Mus musculus specifically, and only the latest Mus musculus version. In other words, I think the version filtering should happen after selecting a species.

ge11232002 commented 2 years ago

If you only want the Mus musculus versions, you will have to pick them from the "all_versions=TRUE" results, as you did initially, because the latest version of a matrix can be annotated as human.
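
A minimal sketch of that "pick from all_versions" step, in case it is useful. It assumes the matrix IDs always have the form base.version (e.g. MA0143.3) and that the names of the PFMatrixList are those IDs, as the printed output above suggests. The helper latest_version_ids is hypothetical and not part of TFBSTools.

```r
# Hypothetical helper (not part of TFBSTools): given JASPAR matrix IDs of the
# form "MA0143.3", keep only the highest-versioned ID for each base ID.
latest_version_ids <- function(ids) {
  base    <- sub("\\.[0-9]+$", "", ids)           # "MA0143.3" -> "MA0143"
  version <- as.integer(sub("^.*\\.", "", ids))   # "MA0143.3" -> 3
  # ave() returns, for every ID, the maximum version within its base group
  keep <- version == ave(version, base, FUN = max)
  ids[keep]
}

# Small self-contained example using IDs from the output above
ids <- c("MA0143.1", "MA0143.2", "MA0143.3", "MA0039.1", "MA0039.2", "MA0004.1")
latest_version_ids(ids)

# Assumed usage with the species-filtered, all-versions result from the
# first post (opts[["species"]] <- 10090, opts[["all_versions"]] <- TRUE):
# mouse_all    <- getMatrixSet(JASPAR2020, opts)
# mouse_latest <- mouse_all[latest_version_ids(names(mouse_all))]
```

This applies the version filter after the species filter, so the result is the latest version annotated as mouse, which may be older than the latest version of the same matrix overall.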