Open JackCollora opened 2 years ago
Hi Jack! Thanks for your email.
First of all, I suggest that you work with the most updated version of the repository at https://github.com/jfnavarro/st_pipeline
Regarding your error: the "create dataset" step can be memory-heavy, though that should not be a problem given what I see in the log. Does it output anything to standard err? Have you tried a different UMI algorithm? You could also increase the memory limit for the Slurm job just to make sure; otherwise I would suggest looking at "annotated.bam" and running the create dataset step manually (I can guide you) to determine what is going on. The standard err may also provide useful information for debugging.
Best, Jose
Hi Jose,
I'll try updating the installation to the most recent repository.
This is the standard err; my mistake labeling it as standard out:

```
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
/var/spool/slurmd/job21176315/slurm_script: line 57: 71249 Segmentation fault      st_pipeline_run.py --output-folder $OUTPUT --ids $ID --ref-map $MAP --ref-annotation $ANN --expName $sample --htseq-no-ambiguous --verbose --log-file $OUTPUT/${sample}_log.txt --demultiplexing-kmer 5 --threads 20 --temp-folder $TMP_ST --no-clean-up --umi-start-position 16 --umi-end-position 26 --demultiplexing-overhang 0 --min-length-qual-trimming 20 $FW $RV
```
Nothing is printed to standard out.
For the Slurm job we've gone up to 190 GB and received the same error. Watching the job with top did not show usage above ~20 GB.
How can we go about running the create dataset step manually?
Best, Jack
I really do not think it is related to memory; I guess it is not related to I/O either? I would try a different UMI algorithm first, and if that gives the same results you can run the createDataset step manually by importing datasets.py from stpipeline.core.pipeline and then calling this function:
```python
from stpipeline.common.stats import qa_stats

createDataset("path/to/annotated.bam",
              qa_stats,                      # passed as reference
              self.ref_annotation,
              self.umi_cluster_algorithm,
              self.umi_allowed_mismatches,
              self.umi_counting_offset,
              self.disable_umi,
              self.output_folder,
              self.expName,
              True)                          # verbose
```
Just update the parameters accordingly.
Hello, I'm having an issue running this pipeline with a new genome index. The install works well when aligning to the human genome, but with a STAR index/genome/GTF file for another species it fails after logging
This is what it outputs to standard out
Thus far I've tried the mapping/counting steps by running STAR and HTSeq outside of the pipeline, and they run without error in that context.
Here is the complete log
Any suggestions are appreciated.