Open adonatiucsd opened 2 years ago
Hi Antoine,
The ResolvePackageNotFound
is a known issue if you're trying to create the environment on Mac using an environment.yml
file created on a different OS, in this case Linux. I suggest running this pipeline either in Linux or try running without creating the scTAR_cellranger environment.
It looks like the error you're getting is related to your /Users/perrylabmac/Musca_ref_genome_Cellranger
folder. Can you make sure this folder contains genes/genes.gtf
?
Best, Michael
Hi Michael,
It turns out the genes.gtf was zipped; after unzipping and rerunning the snakemake I got this error:
Error in rule convertToRefFlat: jobid: 9 output: /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat shell:
/Users/perrylabmac/miniconda3/pkgs/ucsc-gtftogenepred-377-h516baf0_3/bin/gtfToGenePred -genePredExt -geneNameAsName2 /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.gtf refFlat.tmp
paste <(cut -f 12 refFlat.tmp) <(cut -f 1-10 refFlat.tmp) > /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat
rm refFlat.tmp
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
thanks a lot for your help,
Antoine
Can you share the error message you get when you run the following commands on the command line? The snakemake log file does not provide the error message info.
/Users/perrylabmac/miniconda3/pkgs/ucsc-gtftogenepred-377-h516baf0_3/bin/gtfToGenePred -genePredExt -geneNameAsName2 /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.gtf refFlat.tmp
paste <(cut -f 12 refFlat.tmp) <(cut -f 1-10 refFlat.tmp) > /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat
rm refFlat.tmp
Thanks, Michael
I get this (snakemake) Perrys-MacBook-Pro:log perrylabmac$ /Users/perrylabmac/miniconda3/pkgs/ucsc-gtftogenepred-377-h516baf0_3/bin/gtfToGenePred -genePredExt -geneNameAsName2 /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.gtf refFlat.tmp dyld: Library not loaded: @rpath/libssl.1.1.dylib Referenced from: /Users/perrylabmac/miniconda3/pkgs/ucsc-gtftogenepred-377-h516baf0_3/bin/gtfToGenePred Reason: image not found Abort trap: 6 (snakemake) Perrys-MacBook-Pro:log perrylabmac$ paste <(cut -f 12 refFlat.tmp) <(cut -f 1-10 refFlat.tmp) > /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat cut: cut: refFlat.tmp: No such file or directory refFlat.tmp: No such file or directory (snakemake) Perrys-MacBook-Pro:log perrylabmac$ rm refFlat.tmp rm: refFlat.tmp: No such file or directory
Hmm, seems like an issue with the libssl library. Try conda installing with conda install libssh2
and re-runing.
Michael
So I installed libssh2 and rerun, here is the log file: Select jobs to execute...
[Fri Oct 8 10:07:32 2021] rule calcHMMbed: input: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/outs/possorted_genome_bam.bam output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz jobid: 8 wildcards: DATADIR=/Users/perrylabmac/M1_Musca, sample=Lib_10X_Musca_larva_2019_02_25_M1_1sttry threads: 3 resources: tmpdir=/var/folders/kb/1djtgry52557wtjq8j15n_pm0000gn/T
[Fri Oct 8 12:18:55 2021] Finished job 8. 1 of 11 steps (9%) done Select jobs to execute...
[Fri Oct 8 12:18:55 2021] rule calcHMMrefFlat: input: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz, /Users/perrylabmac/Musca_ref_genome_Ce$ output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat jobid: 7 wildcards: DATADIR=/Users/perrylabmac/M1_Musca, sample=Lib_10X_Musca_larva_2019_02_25_M1_1sttry resources: tmpdir=/var/folders/kb/1djtgry52557wtjq8j15n_pm0000gn/T
[Fri Oct 8 12:18:55 2021] Error in rule calcHMMrefFlat: jobid: 7 output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat shell:
Rscript scripts/generate_refFlat_script_both.R /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat /Users/per$
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: /Users/perrylabmac/TAR-scRNA-seq/from_cellranger/.snakemake/log/2021-10-08T100732.520897.snakemake.log
and what I get during execution:
(snakemake) Perrys-MacBook-Pro:from_cellranger perrylabmac$ snakemake -j --rerun-incomplete The flag 'directory' used in rule getMatsSteps is only valid for outputs, not inputs. Building DAG of jobs... Using shell: /usr/local/bin/bash Provided cores: 8 Rules claiming more threads will be scaled down. Job stats: job count min threads max threads
HMM_refFlat_to_gtf_WITHDIR 1 1 1 all 1 1 1 calcHMMbed 1 3 3 calcHMMrefFlat 1 1 1 extract_HMM_expression_withDir 1 1 1 getDiffFeatures 1 1 1 getDiffRegionsToBlast 1 1 1 getDiffSeqsToBlastFa 1 1 1 labelDiffuTARs 1 1 1 ruleBlast 1 3 3 stage3_withDir 1 1 1 total 11 1 3
Select jobs to execute...
[Fri Oct 8 10:07:32 2021] rule calcHMMbed: input: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/outs/possorted_genome_bam.bam output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz jobid: 8 wildcards: DATADIR=/Users/perrylabmac/M1_Musca, sample=Lib_10X_Musca_larva_2019_02_25_M1_1sttry threads: 3 resources: tmpdir=/var/folders/kb/1djtgry52557wtjq8j15n_pm0000gn/T
Number of aligned reads is 594024319 mkdir: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features: File exists tee: SingleCellHMMRun/Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features.log: No such file or directory Path to SingleCellHMM.R /Users/perrylabmac/TAR-scRNA-seq/from_cellranger/scripts INPUT_BAM /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/outs/possorted_genome_bam.bam cellranger count folder /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR tmp folder /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features number Of thread 3 minimum coverage 59 thresholded at 1 in 10000000 reads
Reads spanning over splicing junction will join HMM blocks To avoid that, split reads into small blocks before input to groHMM Spliting and sorting reads... awk: syntax error at source line 1 context is {print $0 >> >>> "chr"$ <<< 1".bed"} awk: illegal statement at source line 1 zcat: can't stat: possorted_genome_bam_split.sorted.bed.gz (possorted_genome_bam_split.sorted.bed.gz.Z): No such file or directory find: illegal option -- n usage: find [-H | -L | -P] [-EXdsx] [-f path] path ... [expression] find [-H | -L | -P] [-EXdsx] -f path [path ...] [expression]
Start to run groHMM on each individual chromosome... chr*.bed
Merging HMM blocks within 500bp... sort: No such file or directory mkdir: toremove: File exists
Calculating the coverage... zcat: can't stat: possorted_genome_bam_split.sorted.bed.gz (possorted_genome_bam_split.sorted.bed.gz.Z): No such file or directory
Filtering the HMM blocks by coverage...
zcat: can't stat: TAR_reads.bed.gz (TAR_reads.bed.gz.Z): No such file or directory
Link the final TAR_reads.bed.gz file to the working directory
Move intermediate files to /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/toremove ...
/Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/toremove can be deleted if no error message in SingleCellHMM_Run log file and all major chromosomes are present in the final TAR_reads.bed.gz file
gzip: toremove is a directory
done! gzip: possorted_genome_bam_split.sorted.bed.gz already has .gz suffix -- unchanged gzip: TAR_reads.bed.gz already has .gz suffix -- unchanged [Fri Oct 8 12:18:55 2021] Finished job 8. 1 of 11 steps (9%) done Select jobs to execute...
[Fri Oct 8 12:18:55 2021] rule calcHMMrefFlat: input: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz, /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat jobid: 7 wildcards: DATADIR=/Users/perrylabmac/M1_Musca, sample=Lib_10X_Musca_larva_2019_02_25_M1_1sttry resources: tmpdir=/var/folders/kb/1djtgry52557wtjq8j15n_pm0000gn/T
Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input Calls: read.delim -> read.table Execution halted [Fri Oct 8 12:18:55 2021] Error in rule calcHMMrefFlat: jobid: 7 output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat shell:
Rscript scripts/generate_refFlat_script_both.R /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: /Users/perrylabmac/TAR-scRNA-seq/from_cellranger/.snakemake/log/2021-10-08T100732.520897.snakemake.log
thanks!
Antoine
Hi Antoine,
I think this issue may again be related to your MacOS system. The awk
syntax seem to be a little different depending on the OS. I propose 2 temporary solutions you can try.
gawk
instead of awk
(command: alias awk=gawk
).scripts/SingleCellHMM_MW.bash
and change the line bedtools bamtobed -i ${INPUT_BAM} -split |LC_ALL=C sort -k1,1V -k2,2n --parallel=30| awk '{print $0}' | gzip > ${TMPDIR}/${PREFIX}_split.sorted.bed.gz
into bedtools bamtobed -i ${INPUT_BAM} -split |LC_ALL=C sort -k1,1V -k2,2n --parallel=30| awk '{print ($0)}' | gzip > ${TMPDIR}/${PREFIX}_split.sorted.bed.gz
where there is a parenthesis added to the awk print command.Let me know if that fixes it.
Michael
Hi Michael,
I tried both, I still get this error:
The flag 'directory' used in rule getMatsSteps is only valid for outputs, not inputs. Building DAG of jobs... Using shell: /usr/local/bin/bash Provided cores: 8 Rules claiming more threads will be scaled down. Job stats: job count min threads max threads
HMM_refFlat_to_gtf_WITHDIR 1 1 1 all 1 1 1 calcHMMrefFlat 1 1 1 extract_HMM_expression_withDir 1 1 1 getDiffFeatures 1 1 1 getDiffRegionsToBlast 1 1 1 getDiffSeqsToBlastFa 1 1 1 labelDiffuTARs 1 1 1 ruleBlast 1 3 3 stage3_withDir 1 1 1 total 10 1 3
Select jobs to execute...
[Tue Oct 12 09:22:09 2021] rule calcHMMrefFlat: input: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz, /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat jobid: 7 wildcards: DATADIR=/Users/perrylabmac/M1_Musca, sample=Lib_10X_Musca_larva_2019_02_25_M1_1sttry resources: tmpdir=/var/folders/kb/1djtgry52557wtjq8j15n_pm0000gn/T
Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input Calls: read.delim -> read.table Execution halted [Tue Oct 12 09:22:09 2021] Error in rule calcHMMrefFlat: jobid: 7 output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat shell:
Rscript scripts/generate_refFlat_script_both.R /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: /Users/perrylabmac/TAR-scRNA-seq/from_cellranger/.snakemake/log/2021-10-12T092208.761573.snakemake.log
Thanks!
Antoine
Can you share the content of your /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR
directory with a ls -hlt
command? It looks like you have an empty file somewhere along the TAR discover step.
Thanks, Michael
Hi!
(base) Perrys-MacBook-Pro:TAR perrylabmac$ ls -hlt total 0 lrwxr-xr-x 1 perrylabmac staff 123B Oct 12 15:23 TAR_reads.bed.gz -> /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/TAR_reads.bed.gz drwxr-xr-x 7 perrylabmac staff 224B Oct 12 15:23 possorted_genome_bam_HMM_features
and the content of the inner folder:
(base) Perrys-MacBook-Pro:possorted_genome_bam_HMM_features perrylabmac$ ls -hlt total 11993240 drwxr-xr-x 2 perrylabmac staff 64B Oct 12 15:23 toremove -rw-r--r-- 1 perrylabmac staff 20B Oct 12 15:23 TAR_reads.bed.gz -rw-r--r-- 1 perrylabmac staff 67B Oct 12 15:23 possorted_genome_bam_merge500.sorted.bed_count.gz -rw-r--r-- 1 perrylabmac staff 61B Oct 12 15:23 possorted_genome_bam_merge500.sorted.bed.gz -rw-r--r-- 1 perrylabmac staff 5.7G Oct 12 15:23 possorted_genome_bam_split.sorted.bed.gz
thanks,
Antoine
Can you make sure you have the necessary R packages, listed below, installed?
BiocManager rtracklayer groHMM Seurat, version >= 4.0 data.table dplyr stringr
Can you also share what is inside the toremove
folder with ls -lht
?
Thanks, michael
Hi,
So I already had these R packages installed via Rstudio but I reinstalled them within miniconda3 just to make sure; I get the same error message as before. The toremove folder is empty; nothing shows up when I type ls -lht
thanks,
Antoine
Can you share the SingleCellHMM_Run*.log file generated after implementing the awk fix? I would also recommend running the pipeline on a Linux system if possible.
Michael
ok I will try to run this on Linux. Where can I find the SingleCellHMM log file?
I made a change to scripts/SingleCellHMM_MW.bash
to fix a bug with the TAR discovery log file. If you update this file and re-run your pipeline, you should find the log file in /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/SingleCellHMM_Run.log
.
Michael
Here it is:
Path to SingleCellHMM.R /Users/perrylabmac/TAR-scRNA-seq/from_cellranger/scripts INPUT_BAM /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/outs/possorted_genome_bam.bam cellranger count folder /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR tmp folder /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features number Of thread 10 minimum coverage 59 thresholded at 1 in 10000000 reads
Reads spanning over splicing junction will join HMM blocks To avoid that, split reads into small blocks before input to groHMM Spliting and sorting reads... zcat: can't stat: possorted_genome_bam_split.sorted.bed.gz (possorted_genome_bam_split.sorted.bed.gz.Z): No such file or directory awk: syntax error at source line 1 context is {print $0 >> >>> "chr"$ <<< 1".bed"} awk: illegal statement at source line 1 find: illegal option -- n usage: find [-H | -L | -P] [-EXdsx] [-f path] path ... [expression] find [-H | -L | -P] [-EXdsx] -f path [path ...] [expression]
Start to run groHMM on each individual chromosome... chr*.bed
Merging HMM blocks within 500bp... sort: No such file or directory
Calculating the coverage... zcat: can't stat: possorted_genome_bam_split.sorted.bed.gz (possorted_genome_bam_split.sorted.bed.gz.Z): No such file or directory
Filtering the HMM blocks by coverage...
zcat: can't stat: TAR_reads.bed.gz (TAR_reads.bed.gz.Z): No such file or directory
Link the final TAR_reads.bed.gz file to the working directory
Move intermediate files to /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/toremove ...
/Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/toremove can be deleted if no error message in SingleCellHMM_Run log file and all major chromosomes are present in the final TAR_reads.bed.gz file
Hi,
So I still haven't managed to get it to work on MacOS but I was able to run the pipeline on Linux. However when loading the TAR bed file on IGV I noticed that many scaffolds did not have anything in the TAR track; and from the groHMM log file it looks like the program did not go through all the 20487 scaffold of the Housefly genome:
Path to SingleCellHMM.R /home/adonati/TAR-scRNA-seq/from_cellranger/scripts INPUT_BAM /home/adonati/Desktop/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/outs/possorted_genome_bam.bam cellranger count folder /home/adonati/Desktop/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR tmp folder /home/adonati/Desktop/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features number Of thread 6 minimum coverage 59 thresholded at 1 in 10000000 reads
Reads spanning over splicing junction will join HMM blocks To avoid that, split reads into small blocks before input to groHMM Spliting and sorting reads... Finished splitting input bam. awk: cannot open "chrNW_004756140.1.bed" for output (Too many open files)
Start to run groHMM on each individual chromosome... chrNW_004754939.1.bed chrNW_004754940.1.bed chrNW_004754941.1.bed chrNW_004754943.1.bed chrNW_004754965.1.bed chrNW_004755053.1.bed chrNW_004755054.1.bed chrNW_004755111.1.bed chrNW_004755164.1.bed chrNW_004755198.1.bed chrNW_004755220.1.bed chrNW_004755231.1.bed chrNW_004755318.1.bed chrNW_004755333.1.bed chrNW_004755352.1.bed chrNW_004755353.1.bed chrNW_004755386.1.bed chrNW_004755398.1.bed chrNW_004755497.1.bed chrNW_004755531.1.bed chrNW_004755586.1.bed chrNW_004755622.1.bed chrNW_004755631.1.bed chrNW_004755653.1.bed chrNW_004755684.1.bed chrNW_004755784.1.bed chrNW_004755797.1.bed chrNW_004755819.1.bed chrNW_004755838.1.bed chrNW_004755853.1.bed chrNW_004755919.1.bed chrNW_004755924.1.bed chrNW_004755942.1.bed chrNW_004755953.1.bed chrNW_004756008.1.bed chrNW_004756052.1.bed chrNW_004756054.1.bed chrNW_004756076.1.bed chrNW_004756109.1.bed chrNW_004756120.1.bed
Merging HMM blocks within 500bp...
Calculating the coverage...
Filtering the HMM blocks by coverage...
NW_004754939.1 NW_004754940.1 NW_004754941.1 NW_004754943.1 NW_004754965.1 NW_004755053.1 NW_004755054.1 NW_004755164.1 NW_004755198.1 NW_004755220.1 NW_004755231.1 NW_004755353.1 NW_004755386.1 NW_004755398.1 NW_004755497.1 NW_004755531.1 NW_004755586.1 NW_004755631.1 NW_004755653.1 NW_004755797.1 NW_004755819.1 NW_004755838.1 NW_004755853.1 NW_004755919.1 NW_004755924.1 NW_004755942.1 NW_004755953.1 NW_004756008.1 NW_004756052.1 NW_004756054.1 NW_004756076.1 NW_004756109.1
Link the final TAR_reads.bed.gz file to the working directory
Move intermediate files to /home/adonati/Desktop/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/toremove ...
/home/adonati/Desktop/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/toremove can be deleted if no error message in SingleCellHMM_Run log file and all major chromosomes are present in the final TAR_reads.bed.gz file
gzip: possorted_genome_bam_split.sorted.bed.gz already has .gz suffix -- unchanged gzip: TAR_reads.bed.gz already has .gz suffix -- unchanged
Looking at the TAR_reads. bed it seems many scaffold are absent indeed. Any idea why the program skipped so many scaffolds?
Thanks!
Antoine
Hi Antoine,
I'm glad the pipeline worked better on Linux. I updated scripts/SingleCellHMM_MW.bash
so please try re-running with this new script. With over 20k scaffolds, it's overloading memory so I modified that script to close bed files that are generated. This may take a little longer to run but should work.
Best, Michael
Hi Michael,
I am trying to get the pipeline to work on UCSD supercomputer Expanse, I installed miniconda3 and all the software required except R, which I can't figure out how to install without sudo (which I can't use on Expanse); do you think the pipeline could work if I install R via conda?
Thanks!
Antoine
A conda installed R should work fine, as long as you have all the required R packages as well. Let me know if it works out.
Michael
Hi Michael,
Sorry about the stupid question, but am I supposed to activate the scTAR_cellranger environment before running snakemake? I do have R and installed all the required libraries but within the scTAR_cellranger environement. The problem is that to run snakemake I need to use "conda activate snakemake", which deactivates scTAR_cellranger... Thanks a lot!
Antoine
Hi Michael,
I get this error when I run the pipeline after installing R with conda:
wildcards: CR_REF=/expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/Musca_ref_genome resources: tmpdir=/tmp
[Sat Oct 23 18:45:20 2021] Finished job 9. 1 of 11 steps (9%) done Select jobs to execute...
[Sat Oct 23 18:45:20 2021] rule calcHMMrefFlat: input: /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz, /expanse/lustre/scratch/adon$ output: /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat jobid: 7 wildcards: DATADIR=/expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1, sample=Lib_10X_Musca_larva_2019_02_25_M1_1sttry resources: tmpdir=/tmp
/usr/bin/bash: line 1: Rscript: command not found [Sat Oct 23 18:45:20 2021] Error in rule calcHMMrefFlat: jobid: 7 output: /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat shell:
Rscript scripts/generate_refFlat_script_both.R /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/Musca_ref_genome/genes/genes.refFlat /expa$
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: /home/adonati2/TAR-scRNA-seq/from_cellranger/.snakemake/log/2021-10-23T184509.175761.snakemake.log
it looks like the Rscript command is not working from within the snakemake pipeline even if I do have Rscript in miniconda3/bin...
thanks!
Antoine
Hi Antoine,
It looks like Rscript is not available within the snakemake environment. You can try simply replacing Rscript
command within the rule calcHMMrefFlat
of Snakefile
to explicitly call Rscript in miniconda3/bin
.
In regards to the snakemake environment, it may be worthwhile to install snakemake via pip so you can activate the scTAR_cellranger environment without conflicting with the snakemake environment.
Best, Michael
Hi Michael,
I had a problem with R because for some reason conda installed R 3.2 I managed to install R 4.1 with mamba and then I was able to run the pipeline It ran for a full 24h on the computer before stopping before completion (max time reached): it created tones of .bed files (looks like one per scaffold) in /TAR/possorted_genome_bam_HMM_features along with "possorted_genome_bam_split.sorted.bed.gz" and "SingleCellHMM_Run.log" which reads like:
Path to SingleCellHMM.R /home/adonati2/TAR-scRNA-seq/from_cellranger/scripts INPUT_BAM /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1/Lib_10X_Musca_larva_2019_0225$ cellranger count folder /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1/Lib_10X_Musca_larva_2019_0225$ tmp folder /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1/Lib_10X_Musca_larva_2019_0225$ number Of thread 24 minimum coverage 59 thresholded at 1 in 10000000 reads
Reads spanning over splicing junction will join HMM blocks To avoid that, split reads into small blocks before input to groHMM Spliting and sorting reads... Finished splitting input bam.
I was wondering if the pipeline is really supposed to generate and keep all these .bed files? Is it normal that the pipeline took so long even running on Expanse (I ran it on just one Node, which is two 64-core AMD EPYC 7742 processors and contain 256 GB of DDR4 memory)? Thanks!
Antoine
Hi Antoine,
Yes, the pipeline is suppose to generate those bed files. The groHMM algorithm is run on individual chromosomes/scaffolds. There is a limitation with the tool if you have a very high number of scaffolds (over 20,000 in your case). Are you interested in the expression across all scaffolds? Perhaps you can filter for the major scaffolds where you'd expect the most expression.
Best, Michael
Hi Michael,
I guess it would be nice to have the TARs for all scaffold; here I have many scaffolds indeed but most of them are small, only 35 of them are over 1Mb, and the total genome size is only 750Mb; will the pipeline run slower with a more fragmented genome or is the speed only dependent on genome size?
Hi Antoine,
The pipeline would run slower with more fragmented genome. It wouldn't take as long if you ran the pipeline on, for example, the human or mouse genome.
Best, Michael
Hi Michael,
Are all these bedfiles deleted at the end of the pipeline or are they all merged together and zipped? thanks
Antoine
Hi Antoine,
These bed files are zipped, not merged, and moved to the possorted_genome_bam_HMM_features/toremove
folder which you can remove manually.
Best, Michael
Hi, I am trying to get the fromcellranger pipeline to work on MacOS Catalina 10.15.7 When I run conda env create -f scTAR_cellranger.yml (after modifying the yml file prefix to /home/miniconda3/envs/scTAR_cellranger) I get the following error message:
Collecting package metadata (repodata.json): done Solving environment: failed
ResolvePackageNotFound:
any idea about what is wrong? I tried to run snakemake without creating the scTAR_cellranger environment but I get a bunch of errors for example after doing conda activate snakemake snakemake -j I get: The flag 'directory' used in rule getMatsSteps is only valid for outputs, not inputs. Building DAG of jobs... MissingInputException in line 99 of /Users/perrylabmac/TAR-scRNA-seq/from_cellranger/Snakefile: Missing input files for rule convertToRefFlat: /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.gtf
thanks!
Antoine