aertslab / create_cisTarget_databases

Create cisTarget databases

Try to create the chicken cisdatabase #50

Open LJZYaaa opened 1 week ago

LJZYaaa commented 1 week ago

Hello, I'm trying to create a chicken cisTarget database for my single-cell analysis. I have already created GRCg6a.regions_vs_motifs.rankings.feather from EPD's BED file and the v10_clust motifs. But when I run pyscenic ctx with that feather file and motifs-v10-nr.chicken-m0.00001-o0.0.tbl, it reports an error: [error screenshots not preserved]. Can you give me some advice on what is going wrong?

ghuls commented 6 days ago

Never seen that error myself, but I assume the error comes from a mismatch between gene/region names in the database and the ones you are requesting in pySCENIC (which seems to be None according to the error message). For pySCENIC you normally make a gene cisTarget database and not a region cisTarget database.
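A quick way to test the name-mismatch hypothesis is to compare the two sets of names directly. The sketch below is only an illustration: `compare_gene_names` is a hypothetical helper, and the toy names stand in for the real inputs. In practice the database names would presumably come from the column names of the rankings feather file and the expression names from the matrix passed to pySCENIC; loading those is left out here.

```python
def compare_gene_names(db_genes, expr_genes):
    """Report how many expression-matrix genes are present in the ranking database."""
    db = set(db_genes)
    expr = set(expr_genes)
    shared = db & expr
    return {
        "n_shared": len(shared),
        # Fraction of the requested genes that the database can actually score.
        "fraction_of_expr_in_db": len(shared) / len(expr) if expr else 0.0,
        # Names requested by the analysis but absent from the database.
        "missing_from_db": sorted(expr - db),
    }


# Toy names for illustration only (not taken from any real database).
report = compare_gene_names(
    db_genes=["SOX2", "PAX6", "GENE_ONLY_IN_DB"],
    expr_genes=["SOX2", "PAX6", "ENSGALG00010029927"],
)
print(report)
```

If the shared fraction is close to zero, or if the database names turn out to be region coordinates rather than gene names, that would be consistent with the explanation above.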

LJZYaaa commented 4 days ago

So glad to receive your reply. I have succeeded in making the genes vs motifs file by adding -g '|ENSGALG[0-9]+|ENSGALT[0-9.]+$'. But I still have a question about this option. As you know, many chicken genes do not have a proper gene symbol and are only named by their Ensembl ID, e.g. ENSGALG00010029927. What should I do if I want these genes to be included in the genes vs motifs feather file as well? The picture below shows the gene names in my TSS.fa file: [image not preserved]. Anyway, thanks for your help, hope everything goes well with you! @ghuls
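For readers following along, here is a minimal sketch of how a gene-level score could be derived from region-level scores by removing the non-gene part of each region ID with a regex and keeping the maximum per gene. It rests on assumptions: the `gene_id#number` naming of regions, the `#[0-9]+$` pattern, and both helper functions are illustrative and may not match the TSS.fa file in this thread or the tool's internals.

```python
import re


def gene_id_from_region_id(region_id, remove_regex=r"#[0-9]+$"):
    """Strip the non-gene part of a region ID so only the gene ID remains."""
    return re.sub(remove_regex, "", region_id)


def max_score_per_gene(region_scores, remove_regex=r"#[0-9]+$"):
    """Collapse region-level scores to one score per gene by taking the maximum."""
    best = {}
    for region_id, score in region_scores.items():
        gene = gene_id_from_region_id(region_id, remove_regex)
        if gene not in best or score > best[gene]:
            best[gene] = score
    return best


# Hypothetical region IDs: two regions for SOX2, one for a gene without a symbol.
scores = {"SOX2#1": 0.2, "SOX2#2": 0.7, "ENSGALG00010029927#1": 0.4}
per_gene = max_score_per_gene(scores)
print(per_gene)
```

Under these assumptions a gene that only has an Ensembl ID is handled like any other: as long as the regex removes just the suffix, the ID itself survives as the gene name.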

heckern commented 2 days ago

ENSEMBL switched from GRCg6a (galGal6) to GRCg7b (bGalGal1) in recent releases. ENSGALG00010029927, for example, is an ENSEMBL ID for the GRCg7b assembly. So, the coordinates would not match if you are using GRCg6a.

They still provide gene annotations for GRCg6a on their FTP server (at least for ENSEMBL 108), but many gene names are missing. We updated the protein-coding gene names for the GRCg6a version (ENSEMBL 108) for one of our recent data sets, in case that is helpful:

https://ftp.ncbi.nlm.nih.gov/geo/series/GSE262nnn/GSE262321/suppl/GSE262321%5Fgex%5Fgenes%2Etsv%2Egz
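One rough way to spot the assembly mix-up described in this comment is to count which ID style dominates in a gene list. The sketch below is a heuristic under stated assumptions: the `ENSGALG0001...` pattern is generalized from the single GRCg7b example above, the `ENSGALG00000...` pattern for GRCg6a is my assumption, and `classify_gene_id` is a hypothetical helper. The authoritative check is the annotation file for the assembly actually used.

```python
import re
from collections import Counter

# Assumed ID shapes: "ENSGALG" followed by 11 digits.
GRCG7B_LIKE = re.compile(r"^ENSGALG0001\d{7}$")   # e.g. ENSGALG00010029927
GRCG6A_LIKE = re.compile(r"^ENSGALG00000\d{6}$")  # assumption, not from the thread


def classify_gene_id(gene_id):
    """Label a gene name by the Ensembl ID style it resembles."""
    base = gene_id.split(".")[0]  # drop a version suffix such as ".1" if present
    if GRCG7B_LIKE.match(base):
        return "GRCg7b-like"
    if GRCG6A_LIKE.match(base):
        return "GRCg6a-like"
    return "other"  # gene symbols and anything unrecognized


# Toy list; the second ID is made up to illustrate the assumed GRCg6a style.
names = ["ENSGALG00010029927", "ENSGALG00000012345", "SOX2"]
counts = Counter(classify_gene_id(name) for name in names)
print(counts)
```

If most IDs in a TSS file look GRCg7b-like while the coordinates come from GRCg6a, the regions would not correspond to the intended genes.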

LJZYaaa commented 1 day ago

I have made two feather files, one from 500 bp around the TSS and the other from 10 kb around the TSS, and found that neither is ideal: only 6 to 7 TFs are detected in the results. What's more, the detected TFs are totally different between the two! I feel sad about these results. Since your resources include a chicken .tbl file, I wonder whether your team has ever tried to make a chicken cisTarget database? Anyway, thanks for your reply, hope you have a good day ^-^