aertslab / create_cisTarget_databases

Create cisTarget databases

feather format is corrupted? #2

Closed by tbrunetti 3 years ago

tbrunetti commented 3 years ago

I am trying to create feather databases for use in SCENIC from some custom ATAC-seq peaks. I generated the feather files following your instructions, but they seem to be corrupted: R aborts whenever SCENIC tries to use them, and calling read_feather() on them directly in R also crashes the session. No error is reported while the feather databases are being created. Am I doing something wrong? Here is the last step and its printed output:

python3.8 ~/Downloads/software/create_cisTarget_databases/convert_motifs_or_tracks_vs_regions_or_genes_scores_to_rankings_cistarget_dbs.py -i /home/tonya/projects/Laurent_ATACseq/scenic_databases/test/test_NKT_ATACseq_Peaks.motifs_vs_regions.scores.feather

Reading cisTarget motifs vs regions scores db: "/home/tonya/projects/Laurent_ATACseq/scenic_databases/test/test_NKT_ATACseq_Peaks.motifs_vs_regions.scores.feather"
Reading cisTarget motifs vs regions scores db took: 0.015125 seconds

Create rankings from "/home/tonya/projects/Laurent_ATACseq/scenic_databases/test/test_NKT_ATACseq_Peaks.motifs_vs_regions.scores.feather" with random seed set to 10673978588653407811.
Creating cisTarget rankings db from cisTarget scores db took: 0.079009 seconds

Writing cisTarget regions vs motifs rankings db: "/home/tonya/projects/Laurent_ATACseq/scenic_databases/test/test_NKT_ATACseq_Peaks.motifs_vs_regions.rankings.feather"
Writing cisTarget regions vs motifs rankings db took: 0.012566 seconds

Convert motifs vs regions cisTarget rankings db to regions vs motifs cisTarget rankings db.
Writing cisTarget motifs vs regions rankings db: "/home/tonya/projects/Laurent_ATACseq/scenic_databases/test/test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings.feather"
Writing cisTarget motifs vs regions rankings db took: 2.188603 seconds

tropfenameimer commented 3 years ago

hi @tbrunetti, there was an issue with RcisTarget (which is used by SCENIC). can you please install the latest version of RcisTarget with devtools::install_github("aertslab/RcisTarget") and try loading your database again?

tbrunetti commented 3 years ago

hi @tropfenameimer,

I just installed the latest RcisTarget and it still doesn't work. Even without SCENIC, I can't read the feather database with library(feather) followed by read_feather("my_new_database.feather"). The call simply aborts the R session.

tbrunetti commented 3 years ago

@tropfenameimer,

After doing a little digging, it seems that the databases need to be read using arrow::read_feather(), not feather::read_feather(). The database now loads, but SCENIC is still not working with it. Do I also need to upgrade to a dev version of SCENIC?

tbrunetti commented 3 years ago

OK, sorry for so many updates! The reason feather was not working is that it requires the development version of feather, installed with devtools::install_github("wesm/feather/R").

Now there is a new problem. SCENIC can load the feather databases but crashes because the column "features" does not exist in the generated databases. The column is named "regions" instead. Do I need to edit the feather databases and rename the column "regions" to "features"?

> initializeScenic(org = "mgi", dbDir = "/home/tonya/projects/Laurent_ATACseq/scenic_databases/test/", dbs = c("test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings.feather",
+                                                   "test_NKT_ATACseq_Peaks.regions_vs_motifs.scores.feather","test_NKT_ATACseq_Peaks.motifs_vs_regions.rankings.feather",
+                                                   "test_NKT_ATACseq_Peaks.motifs_vs_regions.scores.feather"), nCores = 4)
Motif databases selected: 
  test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings.feather 
  test_NKT_ATACseq_Peaks.regions_vs_motifs.scores.feather 
  test_NKT_ATACseq_Peaks.motifs_vs_regions.rankings.feather 
  test_NKT_ATACseq_Peaks.motifs_vs_regions.scores.feather
[1] "invalid first argument"
[1] "invalid first argument"
[1] "invalid first argument"
[1] "invalid first argument"
Error: Can't subset columns that don't exist.
x Column `features` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In initializeScenic(org = "mgi", dbDir = "/home/tonya/projects/Laurent_ATACseq/scenic_databases/test/",  :
  It was not possible to load the following databses; check whether they are downloaded correctly: 
test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings.feather
test_NKT_ATACseq_Peaks.regions_vs_motifs.scores.feather
test_NKT_ATACseq_Peaks.motifs_vs_regions.rankings.feather
test_NKT_ATACseq_Peaks.motifs_vs_regions.scores.feather

For example, these are the column names in the test feather databases I built (rankings only, not scores):

a<-arrow::read_feather("/home/tonya/projects/Laurent_ATACseq/scenic_databases/test/test_NKT_ATACseq_Peaks.motifs_vs_regions.rankings.feather")
colnames(a)
 [1] "MA0001.2.jaspar" "MA0002.2.jaspar" "MA0003.4.jaspar" "MA0004.1.jaspar" "MA0005.2.jaspar" "MA0006.1.jaspar" "MA0007.3.jaspar" "MA0008.2.jaspar" "MA0009.2.jaspar"
[10] "MA0010.1.jaspar" "regions"    

I think I can either change the Python code to do this, or load the feather file, rename the column, and save it as a feather file again?

tropfenameimer commented 3 years ago

there is indeed still an issue in SCENIC when the index column that contains the regions or motifs is not named 'features'. the easiest solution is, as you suggested, to change the name and save the database again.

library(RcisTarget)

dbfile <- "test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings.feather"
db <- importRankings(dbfile, indexCol = "motifs")
# (setting indexCol = "motifs" makes sure the 'motifs' column is the first column in the tibble)

db@rankings[1:5,1:5]
# A tibble: 5 x 5
#  motifs   NM_000014 NM_000015 NM_000016 NM_000017
#  <chr>        <int>     <int>     <int>     <int>
#1 MA0002.2     17566      9252      7167     33712
#2 MA0003.4     53007       188     12117     37106
#3 MA0004.1     14866     21506      2904     34609
#4 MA0006.1     50362     42697     27442     27185
#5 MA0007.3     31781     25661      4722     43563

names(db@rankings)[1] <- "features"
# write the renamed tibble (db@rankings), not the whole rankings object
arrow::write_feather(db@rankings, "test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings2.feather")

by the way, SCENIC doesn't need the '*scores.feather' or the 'motifs_vs_regions' databases, only the rankings db test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings.feather.
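
For anyone who would rather do the rename outside R, the same workaround can be sketched in Python with pandas. This is only an illustration, not part of create_cisTarget_databases: the function names rename_index_column and rename_feather_column are made up for this example, and moving the renamed column to the front is an assumption that simply mirrors what indexCol does in the R snippet above. The file I/O part needs pyarrow installed and has not been checked against SCENIC.

```python
import pandas as pd


def rename_index_column(df, old="regions", new="features"):
    """Return a copy of df with column `old` renamed to `new` and placed first.

    Hypothetical helper for illustration; it is not part of any cisTarget tool.
    """
    if old not in df.columns:
        raise KeyError(f"column {old!r} not found; columns are {list(df.columns)}")
    renamed = df.rename(columns={old: new})
    # Put the renamed index column first, followed by all other columns
    # in their original order (an assumption, mirroring indexCol in R).
    ordered = [new] + [c for c in renamed.columns if c != new]
    return renamed[ordered]


def rename_feather_column(src, dst, old="regions", new="features"):
    """Read a feather file, rename its index column, and write a new file.

    pandas.read_feather and DataFrame.to_feather both require pyarrow.
    """
    df = pd.read_feather(src)
    rename_index_column(df, old, new).to_feather(dst)


# Small in-memory example with made-up values.
example = pd.DataFrame(
    {"MA0001.2.jaspar": [2, 1], "MA0002.2.jaspar": [1, 2], "regions": ["r1", "r2"]}
)
fixed = rename_index_column(example)
print(list(fixed.columns))
```

Writing to a new file name, as in the rankings2 example above, keeps the original database untouched in case the rename turns out not to be needed with a newer SCENIC.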

tbrunetti commented 3 years ago

Thank you so much for your help! I ran the following. Should I be concerned about the warning it throws?

dbfile <- "/home/tonya/projects/Laurent_ATACseq/scenic_databases/test/test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings.feather"
db <- importRankings(dbfile, indexCol = "motifs")       
names(db@rankings)[1] <- "features"
db@org <- "mgi"
db@genome <- "mm10"
write_feather(db@rankings, "/home/tonya/projects/Laurent_ATACseq/scenic_databases/test/databases/test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings2.feather")
dbfile <- "/home/tonya/projects/Laurent_ATACseq/scenic_databases/test/databases/test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings2.feather"
db <- importRankings(dbfile)   

setwd("/home/tonya/projects/Laurent_ATACseq/scenic_databases/test/")
scenicOptions <- initializeScenic(org = "mgi", dbDir = "databases", datasetTitle = "testing",  dbs = "test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings2.feather", nCores = 4)
Motif databases selected: 
  test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings2.feather
[1] "invalid first argument"
Missing annotations for: 
     test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings2.feather
Warning message:
In initializeScenic(org = "mgi", dbDir = "databases", datasetTitle = "testing",  :
  It was not possible to load the following databses; check whether they are downloaded correctly: 
test_NKT_ATACseq_Peaks.regions_vs_motifs.rankings2.feather

tropfenameimer commented 3 years ago

hi @tbrunetti, the feather issues in the R-version of SCENIC should be fixed now. please install the latest version from github.

tbrunetti commented 3 years ago

Great, thank you! I am working on testing it today and I will report back. Thank you so much for your quick response to this, it is an excellent piece of software!

tbrunetti commented 3 years ago

Quick update, it seems to be working! I just need to annotate the peaks, which I will do now. Thank you again for all your help. I will close this issue :)