aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
163 stars 27 forks source link

No hg19 scores/dem database #333

Closed mwetzel7 closed 2 months ago

mwetzel7 commented 3 months ago

Hello,

My data were aligned with the hg19 reference genome and I'm encountering an issue running the "run_pycistarget" function during pre-processing where there doesn't seem to be a 'scores' database on the https://resources.aertslab.org/cistarget/ site, only one for 'rankings'.

Is there a way to run without this or are there any pre-computed dem/score databases available for hg19?

Thanks for your help!

SeppeDeWinter commented 3 months ago

Hi @mwetzel7

These files are still using the h19 reference genome: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/region_based/.

However, they are outdated and not using the new motif collection. So I would not recommend to use them.

You can either generate your own database, see this tutorial: (https://scenicplus.readthedocs.io/en/development/human_cerebellum_ctx_db.html#Creating-custom-cistarget-database)

Or remap your data to hg38 (I would strongly recommend this).

All the best,

Seppe

w1973145618 commented 2 months ago

Hi @mwetzel7

These files are still using the h19 reference genome: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/region_based/.

However, they are outdated and not using the new motif collection. So I would not recommend to use them.

You can either generate your own database, see this tutorial: (https://scenicplus.readthedocs.io/en/development/human_cerebellum_ctx_db.html#Creating-custom-cistarget-database)

Or remap your data to hg38 (I would strongly recommend this).

All the best,

Seppe

Dear Seppe: I encountered difficulties when trying to establish my own database.

Describe the bug When following your tutorial, I encountered a problem at step Prepare fasta from consensus regions

To Reproduce REGION_BED="/home/taxue/mywork/scenicplus/pbmc_tutorial/scATAC/consensus_peak_calling/consensus_regions.bed" GENOME_FASTA="/home/taxue/mywork/pycistarget/geneo_file/hg38.fa" CHROMSIZES="/home/taxue/mywork/pycistarget/geneo_file/hg38.chrom.sizes" DATABASE_PREFIX="10x_brain_1kb_bg_with_mask" SCRIPT_DIR="/home/taxue/mywork/pycistarget/create_cisTarget_databases"

${SCRIPT_DIR}/create_fasta_with_padded_bg_from_bed.sh \ ${GENOME_FASTA} \ ${CHROMSIZES} \ ${REGION_BED} \ hg38.10x_brain.with_1kb_bg_padding.fa \ 1000 \ yes Error output WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping.

SeppeDeWinter commented 2 months ago

Hi @w1973145618

Does the script still produce an output fasta file? Also, your genome GENOME_FASTA variable is set to the hg38 genome, I thought you were using hg19 peaks? Same for the CHROMSIZES variable.

Could you show the head of your regions bed file?


cat /home/taxue/mywork/scenicplus/pbmc_tutorial/scATAC/consensus_peak_calling/consensus_regions.bed | head

All the best,

Seppe

w1973145618 commented 2 months ago

Hi @w1973145618

Does the script still produce an output fasta file? Also, your genome GENOME_FASTA variable is set to the hg38 genome, I thought you were using hg19 peaks? Same for the CHROMSIZES variable.

Could you show the head of your regions bed file?

cat /home/taxue/mywork/scenicplus/pbmc_tutorial/scATAC/consensus_peak_calling/consensus_regions.bed | head

All the best,

Seppe

Dear Seppe: The script still produce an output fasta file, but it is empty. myfa

my regions bed file:

(create_cistarget_databases310) taxue@myUbuntu:~/mywork/pycistarget$ cat /home/taxue/mywork/scenicplus/pbmc_tutorial/scATAC/consensus_peak_calling/consensus_regions.bed | head GL000194.1 101176 101676 NK_cells_peak_1 31.91296638220936 . GL000194.1 101909 102409 FCGR3A+_Monocytes_peak_1 2.662220315037169 . GL000194.1 114708 115208 CD14+_Monocytes_peak_1,CD4_T_cells_peak_1 4.617705353871189 . GL000195.1 22453 22953 B_cells_peak_1 1.736475439334798 . GL000195.1 32233 32733 B_cells_peak_3a,B_cells_peak_3b,CD4_T_cells_peak_3a,CD4_T_cells_peak_3b,CD4_T_cells_peak_3c,CD4_T_cells_peak_3d,FCGR3A+_Monocytes_peak_3a,FCGR3A+_Monocytes_peak_3b,CD14+_Monocytes_peak_3a,CD14+_Monocytes_peak_3b,CD14+_Monocytes_peak_3c,NK_cells_peak_2,CD8_T_cells_peak_2,Dendritic_cells_peak_2 47.26958178417477 . GL000195.1 30601 31101 Dendritic_cells_peak_1,B_cells_peak_2,CD14+_Monocytes_peak_2,CD4_T_cells_peak_2a,CD4_T_cells_peak_2b,FCGR3A+_Monocytes_peak_2,CD8_T_cells_peak_1 70.56189358038134 . GL000205.2 64559 65059 B_cells_peak_5 2.7783607029356765 . GL000205.2 69067 69567 B_cells_peak_6b 13.544508426811424 . GL000205.2 88787 89287 CD4_T_cells_peak_5 27.476430976399165 . GL000205.2 67941 68441 B_cells_peak_6a,Dendritic_cells_peak_4,CD14+_Monocytes_peak_5 9.203319828474429 .

mybed

my all code to create fasta from bed: ` conda activate create_cistarget_databases310

REGION_BED="/home/taxue/mywork/scenicplus/pbmc_tutorial/scATAC/consensus_peak_calling/consensus_regions.bed" GENOME_FASTA="/home/taxue/mywork/pycistarget/geneo_file/hg38.fa" CHROMSIZES="/home/taxue/mywork/pycistarget/geneo_file/hg38.chrom.sizes" DATABASE_PREFIX="10x_brain_1kb_bg_with_mask" SCRIPT_DIR="/home/taxue/anaconda3/Git/create_cisTarget_databases"

${SCRIPT_DIR}/create_fasta_with_padded_bg_from_bed.sh \ ${GENOME_FASTA} \ ${CHROMSIZES} \ ${REGION_BED} \ hg38.10x_brain.with_1kb_bg_padding.fa \ 1000 \ yes`

SeppeDeWinter commented 2 months ago

Hi @mwetzel7

Does the command produce any other output besides this warning message? WARNING. chromosome () was not found in the FASTA file. Skipping.

Are you able to get DNA sequences using bedtools


bedtools getfasta -fi ${GENOME_FASTA} -bed ${REGION_BED}

All the best,

Seppe

w1973145618 commented 2 months ago

Hi @mwetzel7

Does the command produce any other output besides this warning message? WARNING. chromosome () was not found in the FASTA file. Skipping.

Are you able to get DNA sequences using bedtools

bedtools getfasta -fi ${GENOME_FASTA} -bed ${REGION_BED}

All the best,

Seppe

Dear Seppe: The command produce a file besides this warning message: my other file

The content is as follows. my other neirong

I can get DNA sequences using bedtools my use bedtools

ghuls commented 2 months ago

Can it be that you have regions without chromosome name?

cut -f 1 ${REGION_BED} | sort | uniq -c
w1973145618 commented 2 months ago

Hi @mwetzel7

Does the command produce any other output besides this warning message? WARNING. chromosome () was not found in the FASTA file. Skipping.

Are you able to get DNA sequences using bedtools

bedtools getfasta -fi ${GENOME_FASTA} -bed ${REGION_BED}

All the best,

Seppe

Dear Seppe: Today, I followed your new tutorial by using scenicplus-development, and I successfully ran the code.

conda activate scenicplus311

REGION_BED="//home/taxue/mywork/scenicplus/outs/consensus_peak_calling/consensus_regions.bed"
GENOME_FASTA="/home/taxue/mywork/scenicplus/pycistarget/geneo_file/hg38.fa"
CHROMSIZES="/home/taxue/mywork/scenicplus/pycistarget/geneo_file/hg38.chrom.sizes"
DATABASE_PREFIX="10x_brain_1kb_bg_with_mask"
SCRIPT_DIR="/home/taxue/anaconda3/Git/create_cisTarget_databases"

${SCRIPT_DIR}/create_fasta_with_padded_bg_from_bed.sh \
        ${GENOME_FASTA} \
        ${CHROMSIZES} \
        ${REGION_BED} \
        hg38.10x_brain.with_1kb_bg_padding.fa \
        1000 \
        yes

my out file is fellow: 83f936fa769e35c6a01ee95fa8a37829

Thank you very much. LOVE

SeppeDeWinter commented 2 months ago

Hi @mwetzel7

Glad to hear that it works using the new tutorial.

Good luck with the analyses.

Seppe