Open jesswhitts opened 2 months ago
I found a workaround for this. The problem was the FTP download stalling: for some reason it got stuck without timing out or failing.
I edited the code in 'data_wrangling/gene_search_space.py' to use HTTP instead, by adding this after line 169:
    ncbi_assembly_report_url = ncbi_assembly_report_url.replace('ftp://', 'http://')
There's probably a more sensible way to implement this, but it runs fine for me now.
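For anyone wanting something slightly more defensive than a bare `.replace`, here is a minimal sketch of the same idea. The helper names `ftp_to_http` and `fetch_text` are hypothetical and not part of SCENIC+; the sketch only assumes that the server offers the same path over HTTP, as it did in my case. It rewrites only a leading `ftp://` scheme and passes an explicit timeout, so a stalled connection should raise an error rather than hang.

```python
import urllib.request


def ftp_to_http(url: str) -> str:
    """Rewrite a leading ftp:// scheme to http://; other URLs are returned unchanged."""
    prefix = "ftp://"
    if url.startswith(prefix):
        return "http://" + url[len(prefix):]
    return url


def fetch_text(url: str, timeout: float = 60.0) -> str:
    """Download a text resource over HTTP.

    The timeout (an arbitrary default, in seconds) applies to each blocking
    socket operation, so a stalled transfer raises an exception instead of
    waiting indefinitely.
    """
    with urllib.request.urlopen(ftp_to_http(url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_text` on the assembly report URL from the log would then fetch it over HTTP; whether this fits cleanly into 'gene_search_space.py' depends on how that module performs the download, which I have not checked beyond the one line I changed.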
Hello,
I'm using the development version of SCENIC+.
When running the snakemake pipeline, it seems to get stuck on 'download_genome_annotations'...
Contents of my Snakemake folder:
    -rw-r-----. 1 jwhittle stemcell 19698468969 Apr 18 12:38 ACC_GEX.h5mu
    drwxr-x---. 2 jwhittle stemcell        4096 Apr 15 14:18 config
    -rw-r-----. 1 jwhittle stemcell  6736337146 Apr 18 12:18 ctx_results.hdf5
    -rw-r-----. 1 jwhittle stemcell    14855562 Apr 18 12:18 ctx_results.html
    -rw-r-----. 1 jwhittle stemcell         349 Apr 15 14:28 run_pipeline.sh
    -rw-r-----. 1 jwhittle stemcell        2450 Apr 18 12:38 scplus.3295014.err
    -rw-r-----. 1 jwhittle stemcell       84872 Apr 18 12:28 scplus.3295014.out
    drwxr-x---. 2 jwhittle stemcell        4096 Apr 15 14:18 workflow
Output file:
    2024-04-18 14:52:08,976 Download gene annotation INFO Using genome: GRCh38.p12
    2024-04-18 14:52:08,987 Download gene annotation INFO Found corresponding genome Id 51 on NCBI
    2024-04-18 14:52:09,493 Download gene annotation INFO Found corresponding assembly Id 11968211 on NCBI
    2024-04-18 14:52:09,997 Download gene annotation INFO Downloading assembly information from: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_assembly_report.txt
Error file:
    Assuming unrestricted shared filesystem usage for local execution.
    Building DAG of jobs...
    Using shell: /usr/bin/bash
    Provided cores: 48
    Rules claiming more threads will be scaled down.
    Job stats:
    job                            count
    AUCell_direct                      1
    AUCell_extended                    1
    all                                1
    download_genome_annotations        1
    eGRN_direct                        1
    eGRN_extended                      1
    get_search_space                   1
    motif_enrichment_dem               1
    prepare_menr                       1
    region_to_gene                     1
    scplus_mudata                      1
    tf_to_gene                         1
    total                             12

    Select jobs to execute...
    Execute 1 jobs...

    [Thu Apr 18 14:51:02 2024]
    localrule download_genome_annotations:
        output: genome_annotation.tsv, chromsizes.tsv
        jobid: 8
        reason: Missing output files: genome_annotation.tsv, chromsizes.tsv
        resources: tmpdir=/tmp
Any thoughts on the cause?
Many thanks, Jess