aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Where is genome_annotation.tsv ? #429

Closed Zhangruiqi111 closed 4 months ago

Zhangruiqi111 commented 4 months ago

Hello, when I run this command, I don't know where genome_annotation.tsv comes from. Can I download it manually?

scenicplus grn_inference motif_enrichment_dem \
    --region_set_folder 'outs/region_sets' \
    --dem_db_fname '10x_brain_1kb_bg_with_mask.regions_vs_motifs.scores.feather' \
    --output_fname_dem_result "dem_results.hdf5" \
    --temp_dir "" \
    --species "hsapiens" \
    --fraction_overlap_w_dem_database 0.4 \
    --max_bg_regions 500 \
    --balance_number_of_promoters \
    --genome_annotation "genome_annotation.tsv" \
    --promoter_space 1_000 \
    --adjpval_thr 0.05 \
    --log2fc_thr 1.0 \
    --mean_fg_thr 0.0 \
    --motif_hit_thr 3.0 \
    --path_to_motif_annotations 'aertslab_motif_colleciton/v10nr_clust_public/snapshots/motifs-v10-nr.mgi-m0.00001-o0.0.tbl' \
    --annotation_version 'v10nr_clust' \
    --motif_similarity_fdr 0.001 \
    --orthologous_identity_threshold 0.0 \
    --annotations_to_use "Direct_annot Orthology_annot" \
    --write_html \
    --output_fname_dem_html "dem_results.html" \
    --seed 666

SeppeDeWinter commented 4 months ago

Hi @Zhangruiqi111

You can download it using the following command:


scenicplus prepare_data download_genome_annotations
usage: scenicplus prepare_data download_genome_annotations [-h] --species SPECIES --genome_annotation_out_fname
                                                           GENOME_ANNOTATION_OUT_FNAME --chromsizes_out_fname CHROMSIZES_OUT_FNAME
                                                           [--biomart_host BIOMART_HOST] [--do_not_use_ucsc_chromosome_style]
scenicplus prepare_data download_genome_annotations: error: the following arguments are required: --species, --genome_annotation_out_fname, --chromsizes_out_fname

Best,

Seppe

Zhangruiqi111 commented 4 months ago

Thank you for your reply! Which values does the "--species" parameter accept?

SeppeDeWinter commented 4 months ago

For the motif enrichment step it should be "homo_sapiens", and for download_genome_annotations it should be "hsapiens". Sorry for the inconsistency.

Best,

Seppe
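
Because the two steps expect different spellings of the same species, it can help to keep both values in one place. The following is only a minimal sketch, assuming the motif enrichment value is spelled `homo_sapiens`; the dictionary and the helper `species_for` are hypothetical names for illustration and are not part of SCENIC+.

```python
# Species identifiers per step, taken from the reply above.
# SPECIES_BY_STEP and species_for are hypothetical helpers, not SCENIC+ API.
SPECIES_BY_STEP = {
    "motif_enrichment_dem": "homo_sapiens",
    "download_genome_annotations": "hsapiens",
}


def species_for(step: str) -> str:
    """Return the --species value to pass to the given step."""
    try:
        return SPECIES_BY_STEP[step]
    except KeyError:
        known = ", ".join(sorted(SPECIES_BY_STEP))
        raise ValueError(f"Unknown step {step!r}; known steps: {known}") from None


if __name__ == "__main__":
    for step in SPECIES_BY_STEP:
        print(f"{step}: --species {species_for(step)}")
```

Only human is covered here, since that is the only species discussed in this thread.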

Zhangruiqi111 commented 4 months ago

Ok, thank you very much!

Best,

Ruiqi Zhang

yojetsharma commented 1 month ago

(scenicplus) [yojetsharma@pakeeza outs]$ scenicplus prepare_data download_genome_annotations \
> --species "hsapiens" \
> --genome_annotation_out_fname "/home/praghu/yojetsharma/pycistopic_final/outs/genome_annotation.tsv" \
> --chromsizes_out_fname "/home/praghu/yojetsharma/pycistopic_final/outs/chromsizes.tsv"
2024-10-17 11:29:33,134 Download gene annotation INFO     Using genome: GRCh38.p14
Could not find IdList on https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=genome&term=GRCh38.p14
Returning gene annotation without subestting for assembled chromosomesand converting to UCSC style. Please make sure that the chromosome namesin the returned object match with the chromosome names in the scplus_obj.Chromosome sizes will not be returned
2024-10-17 11:29:34,268 SCENIC+      INFO     Chrosomome sizes was not found, please provide this information manually.
2024-10-17 11:29:34,269 SCENIC+      INFO     Saving genome annotation to: /home/praghu/yojetsharma/pycistopic_final/outs/genome_annotation.tsv

I generated the genome annotation manually with the command above, but it reports that the chromosome sizes were not found. I then downloaded the hg38.chrom.sizes file, as done in the pycistopic tutorial, saved it to the outs/ folder, and ran snakemake again, but the pipeline stops:

(scenicplus) [yojetsharma@pakeeza Snakemake]$ snakemake --cores 20
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job                            count
---------------------------  -------
AUCell_direct                      1
AUCell_extended                    1
all                                1
download_genome_annotations        1
eGRN_direct                        1
eGRN_extended                      1
get_search_space                   1
motif_enrichment_dem               1
prepare_menr                       1
region_to_gene                     1
scplus_mudata                      1
tf_to_gene                         1
total                             12

Select jobs to execute...
Execute 1 jobs...

[Thu Oct 17 11:45:29 2024]
localrule download_genome_annotations:
    output: /home/praghu/yojetsharma/pycistopic_final/outs/genome_annotation.tsv, /home/praghu/yojetsharma/pycistopic_final/outs/chromsizes
    jobid: 8
    reason: Missing output files: /home/praghu/yojetsharma/pycistopic_final/outs/chromsizes
    resources: tmpdir=/tmp

2024-10-17 11:47:11,805 Download gene annotation INFO     Using genome: GRCh38.p14
Could not find IdList on https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=genome&term=GRCh38.p14
Returning gene annotation without subestting for assembled chromosomesand converting to UCSC style. Please make sure that the chromosome namesin the returned object match with the chromosome names in the scplus_obj.Chromosome sizes will not be returned
2024-10-17 11:47:11,816 SCENIC+      INFO     Chrosomome sizes was not found, please provide this information manually.
2024-10-17 11:47:11,816 SCENIC+      INFO     Saving genome annotation to: /home/praghu/yojetsharma/pycistopic_final/outs/genome_annotation.tsv
Waiting at most 5 seconds for missing files.
MissingOutputException in rule download_genome_annotations in file /ncbs_gs/nlsas_data/usershares/praghu/yojetsharma/pycistopic_final/scplus_pipeline/Snakemake/workflow/Snakefile, line 221:
Job 8  completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
/home/praghu/yojetsharma/pycistopic_final/outs/chromsizes
Removing output files of failed job download_genome_annotations since they might be corrupted:
/home/praghu/yojetsharma/pycistopic_final/outs/genome_annotation.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-17T114523.626937.snakemake.log
WorkflowError:
At least one job did not complete successfully.
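
Reading the log above, Snakemake reruns download_genome_annotations because the output path it expects, /home/praghu/yojetsharma/pycistopic_final/outs/chromsizes, does not exist. A downloaded hg38.chrom.sizes file in the same folder has a different name, so it does not satisfy the rule. One possible workaround is to convert the UCSC file into a table and write it to exactly the path Snakemake lists. The sketch below uses only the standard library. The three-column layout (Chromosome, Start, End) is an assumption about what the pipeline reads and should be checked against your SCENIC+ version; the function names are hypothetical.

```python
import csv
import io


def chrom_sizes_to_rows(text):
    """Parse UCSC chrom.sizes text ("name<TAB>size" per line) into (chrom, start, end) rows."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, size = line.split()[:2]
        rows.append((name, 0, int(size)))
    return rows


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Assumed header; verify it matches what your SCENIC+ version expects.
    writer.writerow(["Chromosome", "Start", "End"])
    writer.writerows(rows)
    return buf.getvalue()


def convert_file(chrom_sizes_path, out_path):
    """Read a UCSC chrom.sizes file and write the converted table to out_path."""
    with open(chrom_sizes_path) as handle:
        rows = chrom_sizes_to_rows(handle.read())
    with open(out_path, "w") as handle:
        handle.write(rows_to_tsv(rows))


if __name__ == "__main__":
    example = "chr1\t248956422\nchr2\t242193529\n"
    print(rows_to_tsv(chrom_sizes_to_rows(example)), end="")
```

To use it, call `convert_file` with the downloaded hg38.chrom.sizes as input and the exact chromsizes path from the log as output. Note that Snakemake removed genome_annotation.tsv after the failed job, so that file has to be regenerated as well before rerunning. If both outputs exist, Snakemake should skip the rule, although this has not been verified here.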