Closed mwetzel7 closed 2 months ago
Hi @mwetzel7
These files are still using the h19 reference genome: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/region_based/.
However, they are outdated and not using the new motif collection. So I would not recommend to use them.
You can either generate your own database, see this tutorial: (https://scenicplus.readthedocs.io/en/development/human_cerebellum_ctx_db.html#Creating-custom-cistarget-database)
Or remap your data to hg38 (I would strongly recommend this).
All the best,
Seppe
Hi @mwetzel7
These files are still using the h19 reference genome: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/region_based/.
However, they are outdated and not using the new motif collection. So I would not recommend to use them.
You can either generate your own database, see this tutorial: (https://scenicplus.readthedocs.io/en/development/human_cerebellum_ctx_db.html#Creating-custom-cistarget-database)
Or remap your data to hg38 (I would strongly recommend this).
All the best,
Seppe
Dear Seppe: I encountered difficulties when trying to establish my own database.
Describe the bug When following your tutorial, I encountered a problem at step Prepare fasta from consensus regions
To Reproduce REGION_BED="/home/taxue/mywork/scenicplus/pbmc_tutorial/scATAC/consensus_peak_calling/consensus_regions.bed" GENOME_FASTA="/home/taxue/mywork/pycistarget/geneo_file/hg38.fa" CHROMSIZES="/home/taxue/mywork/pycistarget/geneo_file/hg38.chrom.sizes" DATABASE_PREFIX="10x_brain_1kb_bg_with_mask" SCRIPT_DIR="/home/taxue/mywork/pycistarget/create_cisTarget_databases"
${SCRIPT_DIR}/create_fasta_with_padded_bg_from_bed.sh \ ${GENOME_FASTA} \ ${CHROMSIZES} \ ${REGION_BED} \ hg38.10x_brain.with_1kb_bg_padding.fa \ 1000 \ yes Error output WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping. WARNING. chromosome () was not found in the FASTA file. Skipping.
Hi @w1973145618
Does the script still produce an output fasta file? Also, your genome GENOME_FASTA variable is set to the hg38 genome, I thought you were using hg19 peaks? Same for the CHROMSIZES variable.
Could you show the head of your regions bed file?
cat /home/taxue/mywork/scenicplus/pbmc_tutorial/scATAC/consensus_peak_calling/consensus_regions.bed | head
All the best,
Seppe
Hi @w1973145618
Does the script still produce an output fasta file? Also, your genome GENOME_FASTA variable is set to the hg38 genome, I thought you were using hg19 peaks? Same for the CHROMSIZES variable.
Could you show the head of your regions bed file?
cat /home/taxue/mywork/scenicplus/pbmc_tutorial/scATAC/consensus_peak_calling/consensus_regions.bed | head
All the best,
Seppe
Dear Seppe:
The script still produce an output fasta file, but it is empty.
my regions bed file:
(create_cistarget_databases310) taxue@myUbuntu:~/mywork/pycistarget$ cat /home/taxue/mywork/scenicplus/pbmc_tutorial/scATAC/consensus_peak_calling/consensus_regions.bed | head GL000194.1 101176 101676 NK_cells_peak_1 31.91296638220936 . GL000194.1 101909 102409 FCGR3A+_Monocytes_peak_1 2.662220315037169 . GL000194.1 114708 115208 CD14+_Monocytes_peak_1,CD4_T_cells_peak_1 4.617705353871189 . GL000195.1 22453 22953 B_cells_peak_1 1.736475439334798 . GL000195.1 32233 32733 B_cells_peak_3a,B_cells_peak_3b,CD4_T_cells_peak_3a,CD4_T_cells_peak_3b,CD4_T_cells_peak_3c,CD4_T_cells_peak_3d,FCGR3A+_Monocytes_peak_3a,FCGR3A+_Monocytes_peak_3b,CD14+_Monocytes_peak_3a,CD14+_Monocytes_peak_3b,CD14+_Monocytes_peak_3c,NK_cells_peak_2,CD8_T_cells_peak_2,Dendritic_cells_peak_2 47.26958178417477 . GL000195.1 30601 31101 Dendritic_cells_peak_1,B_cells_peak_2,CD14+_Monocytes_peak_2,CD4_T_cells_peak_2a,CD4_T_cells_peak_2b,FCGR3A+_Monocytes_peak_2,CD8_T_cells_peak_1 70.56189358038134 . GL000205.2 64559 65059 B_cells_peak_5 2.7783607029356765 . GL000205.2 69067 69567 B_cells_peak_6b 13.544508426811424 . GL000205.2 88787 89287 CD4_T_cells_peak_5 27.476430976399165 . GL000205.2 67941 68441 B_cells_peak_6a,Dendritic_cells_peak_4,CD14+_Monocytes_peak_5 9.203319828474429 .
my all code to create fasta from bed: ` conda activate create_cistarget_databases310
REGION_BED="/home/taxue/mywork/scenicplus/pbmc_tutorial/scATAC/consensus_peak_calling/consensus_regions.bed" GENOME_FASTA="/home/taxue/mywork/pycistarget/geneo_file/hg38.fa" CHROMSIZES="/home/taxue/mywork/pycistarget/geneo_file/hg38.chrom.sizes" DATABASE_PREFIX="10x_brain_1kb_bg_with_mask" SCRIPT_DIR="/home/taxue/anaconda3/Git/create_cisTarget_databases"
${SCRIPT_DIR}/create_fasta_with_padded_bg_from_bed.sh \ ${GENOME_FASTA} \ ${CHROMSIZES} \ ${REGION_BED} \ hg38.10x_brain.with_1kb_bg_padding.fa \ 1000 \ yes`
Hi @mwetzel7
Does the command produce any other output besides this warning message?
WARNING. chromosome () was not found in the FASTA file. Skipping.
Are you able to get DNA sequences using bedtools
bedtools getfasta -fi ${GENOME_FASTA} -bed ${REGION_BED}
All the best,
Seppe
Hi @mwetzel7
Does the command produce any other output besides this warning message?
WARNING. chromosome () was not found in the FASTA file. Skipping.
Are you able to get DNA sequences using bedtools
bedtools getfasta -fi ${GENOME_FASTA} -bed ${REGION_BED}
All the best,
Seppe
Dear Seppe:
The command produce a file besides this warning message:
The content is as follows.
I can get DNA sequences using bedtools
Can it be that you have regions without chromosome name?
cut -f 1 ${REGION_BED} | sort | uniq -c
Hi @mwetzel7
Does the command produce any other output besides this warning message?
WARNING. chromosome () was not found in the FASTA file. Skipping.
Are you able to get DNA sequences using bedtools
bedtools getfasta -fi ${GENOME_FASTA} -bed ${REGION_BED}
All the best,
Seppe
Dear Seppe: Today, I followed your new tutorial by using scenicplus-development, and I successfully ran the code.
conda activate scenicplus311
REGION_BED="//home/taxue/mywork/scenicplus/outs/consensus_peak_calling/consensus_regions.bed"
GENOME_FASTA="/home/taxue/mywork/scenicplus/pycistarget/geneo_file/hg38.fa"
CHROMSIZES="/home/taxue/mywork/scenicplus/pycistarget/geneo_file/hg38.chrom.sizes"
DATABASE_PREFIX="10x_brain_1kb_bg_with_mask"
SCRIPT_DIR="/home/taxue/anaconda3/Git/create_cisTarget_databases"
${SCRIPT_DIR}/create_fasta_with_padded_bg_from_bed.sh \
${GENOME_FASTA} \
${CHROMSIZES} \
${REGION_BED} \
hg38.10x_brain.with_1kb_bg_padding.fa \
1000 \
yes
my out file is fellow:
Thank you very much. LOVE
Hi @mwetzel7
Glad to hear that it works using the new tutorial.
Good luck with the analyses.
Seppe
Hello,
My data were aligned with the hg19 reference genome and I'm encountering an issue running the "run_pycistarget" function during pre-processing where there doesn't seem to be a 'scores' database on the https://resources.aertslab.org/cistarget/ site, only one for 'rankings'.
Is there a way to run without this or are there any pre-computed dem/score databases available for hg19?
Thanks for your help!