Open stanaka6 opened 3 years ago
I believe that using the updated pySCENIC >=0.11.3 will fix this.
Hi @cflerin ,
I truly appreciate your answer. I may have misunderstood, but the latest version is 0.11.2, right? Did you mean that this issue cannot be resolved until a newer version is released? I have tried updating pyscenic, but it's still 0.11.2.
Thank you for your help.
Sorry @stanaka6! 0.11.2 is indeed the current version, I must have gotten mixed up with another issue.
Here it seems that all of your modules are dropped due to low overlap with the database. I notice that your adj.tsv has lower case gene names vs upper case in the database (at least the few that I can see). So you should check to make sure you're using the same gene symbols across both, otherwise there's no way that an overlap can be found.
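One way to run the symbol check suggested above is to compare the two gene sets directly, once exactly and once ignoring case. The sketch below is only an illustration: it assumes the adjacency table is tab-separated with `TF` and `target` header columns, and that the gene names of the ranking database have already been loaded into a Python set by whatever reader you use. The function names `read_adjacency_genes` and `overlap_report` are hypothetical and not part of pySCENIC.

```python
import csv
import io


def read_adjacency_genes(handle):
    """Collect every TF and target symbol from a tab-separated adjacency table.

    Assumes header columns named "TF" and "target" (an assumption about adj.tsv).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    genes = set()
    for row in reader:
        genes.add(row["TF"])
        genes.add(row["target"])
    return genes


def overlap_report(adj_genes, db_genes):
    """Split adjacency symbols into exact matches, case-only matches, and misses."""
    exact = adj_genes & db_genes
    db_lowercase = {g.lower() for g in db_genes}
    # Symbols that fail the exact comparison but match once case is ignored.
    case_only = {g for g in adj_genes - exact if g.lower() in db_lowercase}
    missing = adj_genes - exact - case_only
    return {"exact": exact, "case_only": case_only, "missing": missing}


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the files in this thread.
    toy_adj = io.StringIO("TF\ttarget\timportance\ntbx3a\tdio2\t1.5\n")
    toy_db = {"tbx3a", "DIO2"}
    print(overlap_report(read_adjacency_genes(toy_adj), toy_db))
```

A large `case_only` set would point to a capitalization mismatch, while a large `missing` set would suggest that the two files use different identifiers altogether.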
Hi @cflerin,
Thank you very much for your quick response.
Zebrafish gene names (symbols) consist of both upper- and lower-case letters. For example, ABR (ENSDARG00000059587; Chr5:62374092-62545913), abr (ENSDARG00000095180; Chr5:62611400-62642718) and MDFIC (ENSDARG00000074223; Chromosome 4: 6,339,572-6,416,414), etc. Therefore, my input data contains both upper- and lower-case gene symbols. I am not sure if this causes the issue.
I manually checked overlaps and could see them (for example, TF "tbx3a" in both adj.tsv and Motif2TF db; tbx3a's target gene "dio2" in .feather cisTarget ranking database).
Also, I found there are only 71 warning messages when running the pyscenic ctx function (described in my first post), but I do not understand whether this is relevant to my problem.
Here is the link for my input files (except for our original expression matrix loom file): https://umbc.box.com/s/0dez16fxn4nwvvhojr3y9hz43dq5442n I would really appreciate it if you have a chance to look at them if possible.
One of my concerns is whether the zebrafish Motif2TF database is good. As described above, I identified zebrafish orthologs of mouse genes with OrthoFinder, and replaced the mouse genes in motifs-v9nr_clust-nr.mgi-m0.001-o0.0.tbl with their zebrafish orthologs. A mouse gene often matches several zebrafish genes, and vice versa (see the picture below; this is OrthoFinder's output).
Therefore, I made a list of mouse and zebrafish genes from the above data, and split the comma-separated values into separate rows. Also, I removed genes from the Motif2TF database if they had no matching zebrafish genes. I am not sure whether my approach was correct.
Below is part of the code that I used for creating the zebrafish Motif2TF database.
## import motifs-v9nr_clust-nr.mgi-m0.001-o0.0.tbl as csv ----
tbl <- read.csv("motifs-v9nr_clust-nr.mgi-m0.001-o0.0.csv", header = TRUE)
## Make zebrafish-mouse gene substitute list ----
# Extract gene symbol from "motifs-v9nr_clust-nr.mgi-m0.001-o0.0."
mm.genes <- tbl$gene_name
# Annotate Ensembl ID
library(AnnotationDbi)
library(org.Mm.eg.db)
library(dplyr) # provides the %>% pipe used below
df <- data.frame(gene = mm.genes)
df$EnsembleID <- mapIds(org.Mm.eg.db, keys = df$gene, column = "ENSEMBL", keytype = "SYMBOL")
# Leave unique values
unique.df <- df[!duplicated(df$gene), ]
colnames(unique.df) <- c("gene","mouse") # change colnames
## Load Orthofinder output (above picture) ----
orthofinder <- read.csv("Mus_musculus.GRCm39.pep.all__v__Danio_rerio.GRCz11.pep.all.csv")
colnames(orthofinder) <- c("orthogroup", "mouse", "zebrafish") # change col names
## Separate comma-separated values (mouse gene)
separate.df <- orthofinder %>%
  tidyr::separate_rows(zebrafish, sep = ",") %>%
  tidyr::separate_rows(mouse, sep = ",") %>%
  dplyr::group_by(mouse) %>%
  dplyr::summarise(zebrafish = paste0(sort(unique(na.omit(zebrafish))), collapse = ','))
# remove version numbers in mouse Ensembl ID
separate.df$mouse <- sub("\\..*", "", separate.df$mouse)
# Merge with the gene list from "motifs-v9nr_clust-nr.mgi-m0.001-o0.0"
dfA <- unique.df %>%
  dplyr::left_join(separate.df, by = "mouse")
## Add zebrafish ID to Motif2TF table ---
dfA <- dfA[, -which(names(dfA) %in% "mouse")] # Remove mouse Ensembl ID
colnames(dfA) <- c("gene_name", "zebrafish") # Change col names
merge <- merge(dfA, tbl, by = "gene_name", all = TRUE)
## separate comma-separated rows in zebrafish ID ----
merge2 <- tidyr::separate_rows(merge, zebrafish, sep = ",")
# remove version numbers in zebrafish Ensembl ID
merge2$zebrafish <- sub("\\..*", "", merge2$zebrafish)
## Remove rows containing NA in the given columns
completeFun <- function(data, desiredCols) {
  completeVec <- complete.cases(data[, desiredCols])
  return(data[completeVec, ])
}
merge2 <- completeFun(merge2, "zebrafish")
## Then, get the zebrafish gene name (symbols) and add them as a column...
Any suggestions would be truly appreciated.
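For readers who want to see the many-to-many substitution described above in isolation, here is a minimal Python sketch of the same idea: split the comma-separated orthogroup fields, build a mouse-to-zebrafish mapping, and emit one Motif2TF row per zebrafish ortholog while dropping rows with no ortholog. This is not the R code above and not a pySCENIC API. The function names and toy identifiers are hypothetical, and the sketch assumes the motif table's `gene_name` values use the same identifiers as the mouse side of the mapping.

```python
def strip_version(identifier):
    """Drop a trailing version suffix such as ".4" from an identifier."""
    return identifier.strip().split(".", 1)[0]


def expand_orthogroups(orthogroups):
    """Turn (mouse_field, zebrafish_field) pairs of comma-separated IDs into a
    dict mapping each mouse ID to a sorted list of zebrafish IDs."""
    mapping = {}
    for mouse_field, fish_field in orthogroups:
        mouse_ids = [strip_version(m) for m in mouse_field.split(",") if m.strip()]
        fish_ids = [strip_version(z) for z in fish_field.split(",") if z.strip()]
        for mouse_id in mouse_ids:
            mapping.setdefault(mouse_id, set()).update(fish_ids)
    return {mouse_id: sorted(fish) for mouse_id, fish in mapping.items()}


def substitute_genes(motif_rows, mouse_to_fish, gene_column="gene_name"):
    """Replace the gene in each motif row by its zebrafish orthologs.

    A row with several orthologs is duplicated once per ortholog;
    a row with no ortholog is dropped.
    """
    substituted = []
    for row in motif_rows:
        for fish_id in mouse_to_fish.get(row[gene_column], []):
            new_row = dict(row)
            new_row[gene_column] = fish_id
            substituted.append(new_row)
    return substituted


if __name__ == "__main__":
    # Toy identifiers for illustration only.
    mapping = expand_orthogroups([("MmGeneA", "DrGene1, DrGene2"),
                                  ("MmGeneB, MmGeneC", "DrGene3")])
    rows = [{"#motif_id": "m1", "gene_name": "MmGeneA"},
            {"#motif_id": "m2", "gene_name": "MmGeneZ"}]
    print(substitute_genes(rows, mapping))
```

One consequence worth keeping in mind: because every ortholog inherits the motif annotation of its mouse counterpart, a single motif can end up annotated to many zebrafish genes.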
I am having a similar issue. I would appreciate it if a solution is given to this. Thank you!
Hi, I am facing the same problem. Do you have a solution for this? I would really appreciate any comments. Thanks in advance,
Hi Frucelee, I have solved my own issue. I found that the problem was that I had incorrectly formatted my initial loom file. In your loom file, you can check to ensure that the Var and Obs are correctly set. You can also check that you're using the right organism and database. Best wishes.
Hi everyone, did anyone manage to solve this issue? @Arinze-BioX it would be great if you could share how to check that the loom file is correctly formatted. Thanks
Hi @MubasherMohammed , I think what I did was basic wrangling to check that I have the right number of genes and cells, and that I haven't wrongly transformed my expression matrix. That is, if I remember correctly, the Var are the genes and the Obs are the cells. Then I checked that I used the right species, genome build (e.g. mm9 vs mm10) and databases based on the explanations here: http://htmlpreview.github.io/?https://github.com/aertslab/SCENIC/blob/master/inst/doc/SCENIC_Setup.html
You can choose the right database by browsing this list of databases: https://resources.aertslab.org/cistarget/
I have moved on from this project, so I only vaguely remember the details, but I hope my brief explanation here helps.
If you keep having issues after trying this, please share details of your issue and I can dig deeper and try to help.
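The orientation check mentioned above (genes on one axis, cells on the other) can be approximated by counting how many labels on each axis look like known gene symbols. The sketch below is an illustration under assumptions: the row and column labels are taken to have been read from the expression file already, the reference gene set is taken to come from the database you intend to use, and `guess_gene_axis` is a hypothetical helper, not part of pySCENIC or loompy.

```python
def fraction_known(labels, known_genes):
    """Fraction of labels that appear in the reference gene set."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label in known_genes) / len(labels)


def guess_gene_axis(row_labels, col_labels, known_genes):
    """Return "rows", "columns", or "unknown" depending on which axis
    contains the larger fraction of recognizable gene symbols."""
    known = set(known_genes)
    row_fraction = fraction_known(row_labels, known)
    col_fraction = fraction_known(col_labels, known)
    if row_fraction == col_fraction:
        return "unknown"
    return "rows" if row_fraction > col_fraction else "columns"


if __name__ == "__main__":
    # Toy labels for illustration only.
    reference = {"tbx3a", "dio2", "abr"}
    print(guess_gene_axis(["tbx3a", "dio2"], ["cell_1", "cell_2"], reference))
```

If the result is "unknown", neither axis matches the reference set, which would again point to a gene-identifier or organism mismatch rather than a transposed matrix.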
Thank you @Arinze-BioX for your explanation. I also went through the checks for my input files for the pyscenic ctx regulon enrichment step, and everything seems fine. I have therefore opened issue #389 and hope to get some insights on fixing it. Thanks again
@Arinze-BioX @stanaka6 @MubasherMohammed Would you please explain how you solved the empty output issue at the pyscenic ctx step? I have raised an issue here: https://github.com/aertslab/pySCENIC/issues/407#issue-1301714888
Hi pySCENIC team,
Thank you for providing a great tool. I am running pyscenic on my own zebrafish data. However, I got an empty output file (reg.csv) from pyscenic ctx, which is very similar to the situation in #177. After running the pyscenic ctx function, a lot of warnings were shown, like:
or
There is no error message, and the empty output reg.csv means this:
Could you please provide me a solution to fix this issue?
More details are described below.
I ran this code:
The zebrafish cisTarget databases were created by following this repo, using the genomic regions 10 kb upstream and downstream of zebrafish 5' UTRs and motif information from JASPAR. The feather file I used for pyscenic ctx is zf1.genes_vs_motifs.rankings.feather.
The head of the adjacencies matrix (adj.tsv):
According to this comment, I created a zebrafish Motif2TF database with the following steps: 1. find zebrafish orthologs of mouse genes with OrthoFinder; 2. replace the mouse genes in motifs-v9nr_clust-nr.mgi-m0.001-o0.0.tbl with their zebrafish orthologs. The resulting database looks like this:
pyscenic version 0.11.2 running in conda environment
Any suggestions or comments would be really appreciated.