Open rojinsafavi opened 2 years ago
You've probably solved this by now but for anyone searching for this, this appears this could be at least partially due to an order of operations bug. I haven't had the chance to verify this, but looking at the code, the default behavior when all_versions
in opts
is either not provided or FALSE
is to remove all TFs whose version does not equal the latest version available in the JASPAR database. However, some TFs may have a latest version that doesn't match the "opts" supplied to the getMatrixSet
which leads to them being removed. E.g. at the time of writing in JASPAR2024, for SPI1/PU.1 the latest version of the motif is MA0080.7 for Mus musculus, but if species = "Homo sapiens
, then SPI1/PU.1 won't appear because none of the Homo sapiens motifs are the latest version.
I'm not sure if this project is still maintained, so I'm not sure if I should just put in a PR or fork the repo. I think the simplest solution is to propagate opts
to the latest version filter.
In the meantime this is a rough way to do this in base R.
motifs <- TFBSTools::getMatrixSet(
RSQLite::dbConnect(RSQLite::SQLite(), "/path/to/JASPAR2024.sqlite"),
list(
species = "Homo sapiens",
collection = "CORE",
all_versions = TRUE
)
)
latest_motifs <- Reduce(
function(a, b) {
# Split into base_id and version
id <- strsplit(b, "\\.")[[1]]
# Make version numeric
id[2] <- as.numeric(id[2])
if (! id[1] %in% names(a)) {
a[id[1]] = id[2]
} else {
if (id[2] > a[id[1]]) {
a[id[1]] = id[2]
}
}
return(a)
},
names(motifs),
init = list()
)
latest_motifs <- paste(names(latest_motifs), latest_motifs, sep = ".")
motifs_filtered <- subset(motifs, names(motifs) %in% latest_motifs)
This gives the identical result to the 755 motifs (up from 720 without all_versions = T
in opts
) obtained using the following in sqlite3
:
SELECT COUNT(DISTINCT M.BASE_ID) FROM MATRIX M
JOIN MATRIX_SPECIES MS ON M.ID = MS.ID
WHERE M.COLLECTION = 'CORE' AND MS.TAX_ID = 9606;
Hello,
I am using a package called ArchR to do ATACseq analysis. ArchR uses TFBSTools to extract motifs. I noticed there is only 633 TF extracted from Jaspar, but I think the number is much higher (~1200). I was wondering if there is a reason for such a low number? I talked to ArchR developers, and they said they don't do any sort of processing internally, and that I should probably talk to you regarding this matter.
Thank you