CFSAN-Biostatistics / snp-pipeline

SNP Pipeline is a pipeline for the production of SNP matrices from sequence data used in the phylogenetic analysis of pathogenic organisms sequenced from samples of interest to food safety.

Pipeline gets stuck after read mapping. #22

Closed dzd5 closed 1 year ago

dzd5 commented 3 years ago

I have run the pipeline successfully on various isolate collections, but on the current collection it does not finish and gets stuck at "cfsan_snp_pipeline map_reads finished".

The output files have not been generated, so I know the pipeline is incomplete. Has anyone else had this error, or does anyone know how I can troubleshoot it?

dzd5 commented 3 years ago

Contents of the error.log file after I ended the process:

Traceback (most recent call last):
  File "/programs/snp-pipeline-2.2.1/bin/cfsan_snp_pipeline", line 8, in <module>
    sys.exit(main())
  File "/programs/snp-pipeline-2.2.1/lib/python3.6/site-packages/snppipeline/cfsan_snp_pipeline.py", line 645, in main
    return run_command_from_arg_list(sys.argv[1:])
  File "/programs/snp-pipeline-2.2.1/lib/python3.6/site-packages/snppipeline/cfsan_snp_pipeline.py", line 606, in run_command_from_arg_list
    return run_command_from_args(args)
  File "/programs/snp-pipeline-2.2.1/lib/python3.6/site-packages/snppipeline/cfsan_snp_pipeline.py", line 585, in run_command_from_args
    args.func(args) # this executes the function previously associated with the subparser with set_defaults
  File "/programs/snp-pipeline-2.2.1/lib/python3.6/site-packages/snppipeline/run.py", line 662, in run
    job_id_map_reads = runner.run_array(command_line, "mapReads", log_file, sample_full_path_names_file, max_processes=max_processes, wait_for=[job_id_index_ref]$
  File "/programs/snp-pipeline-2.2.1/lib/python3.6/site-packages/jobrunner/jobrunner.py", line 469, in run_array
    subprocess.check_call(command_line, shell=True, executable="bash") # If the return code is non-zero it raises a CalledProcessError
  File "/usr/lib64/python3.6/subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib64/python3.6/subprocess.py", line 269, in call
    return p.wait(timeout=timeout)
  File "/usr/lib64/python3.6/subprocess.py", line 1457, in wait
    (pid, sts) = self._try_wait(0)
  File "/usr/lib64/python3.6/subprocess.py", line 1404, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt

Shutting down the SNP Pipeline.

Error detected while running cfsan_snp_pipeline map_reads.
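
The traceback above ends inside `subprocess.check_call`, in `os.waitpid`: when the process was interrupted, the Python side was waiting for a child command to exit. The following is a minimal sketch, not code from the pipeline or from jobrunner, of how a `timeout` argument to `check_call` turns an indefinite wait into a detectable condition. The helper name `run_with_timeout` and the example commands are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command and report whether it finished in time.

    Returns True if the command exited with status 0 before the timeout,
    and False if it was still running when the timeout expired.
    A non-zero exit status still raises CalledProcessError.
    """
    try:
        # Without timeout=..., check_call waits for as long as the child runs.
        subprocess.check_call(command, timeout=timeout_seconds)
        return True
    except subprocess.TimeoutExpired:
        # check_call kills the child before re-raising TimeoutExpired.
        return False


if __name__ == "__main__":
    # Hypothetical stand-ins for a quick step and a stalled step.
    quick = [sys.executable, "-c", "pass"]
    stalled = [sys.executable, "-c", "import time; time.sleep(30)"]
    print("quick finished:", run_with_timeout(quick, 30))
    print("stalled finished:", run_with_timeout(stalled, 1))
```

A wrapper like this only shows that a step is hanging; it does not say why. Finding the cause still requires looking at what the child process is doing.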

dzd5 commented 3 years ago

My output has mapReads.log files for my entire isolate collection, and none of them contain errors. This leads me to believe that the pipeline is, for some reason, getting stuck between read mapping and the next step. When I run the pipeline on just a couple of isolates from my collection it finishes, but when I run it on all 46 isolates it again gets stuck after read mapping.
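
Checking dozens of per-isolate logs by eye is easy to get wrong, so the check described above can be scripted. This is a rough sketch under assumptions: the glob pattern `mapReads.log*`, the keyword list, and the function name `find_logs_with_errors` are illustrative guesses, not names defined by the pipeline, and should be adjusted to the actual log layout.

```python
from pathlib import Path


def find_logs_with_errors(log_dir, pattern="mapReads.log*",
                          keywords=("error", "traceback", "exception")):
    """Scan log files for suspicious lines.

    Returns a dict mapping each flagged file name to a list of
    (line_number, line_text) tuples. Files with no hits are omitted.
    The default pattern and keywords are assumptions, not pipeline facts.
    """
    flagged = {}
    for log_path in sorted(Path(log_dir).glob(pattern)):
        hits = []
        # errors="replace" keeps the scan going if a log has odd bytes.
        with open(log_path, errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                lowered = line.lower()
                if any(keyword in lowered for keyword in keywords):
                    hits.append((line_number, line.rstrip("\n")))
        if hits:
            flagged[log_path.name] = hits
    return flagged
```

An empty result would be consistent with the observation that read mapping itself completed cleanly for every isolate, which points at the hand-off to the next step rather than at an individual sample.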

hughrandFDA commented 3 years ago

Hi there,

No obvious cause of the problem. Couple of questions:

  1. How closely related are the isolates you are running?
  2. How long does it take when you are running just a couple of isolates?
  3. How long do you wait with all 46 isolates before killing the job? (Am I correct that you have to kill it, or does it die?)
  4. What kind of a system are you running this on?

Thanks,

Hugh Rand


dzd5 commented 3 years ago

Hello, I am running into this same problem again with a new set of isolates.

For the previous set, I kept retrying the pipeline until it eventually ran without getting stuck. For the current set (22 isolates), the first run worked fine and completed in less than 12 hours. However, I had to redo the exact same analysis, and now it is getting stuck at the same point again: "cfsan_snp_pipeline map_reads finished".

In answer to the questions above:

  1. Some are closely related (0-1 SNPs) and others are distant (1000+ SNPs).
  2. I think less than an hour.
  3. 24 hours, and then I manually kill it since there is no progress.
  4. I am running the pipeline on a Unix system from a cloud computing service.

dzd5 commented 2 years ago

I'm fairly certain the inclusion of isolates too genetically distant from each other was the reason for the pipeline stalling. Only including closely related isolates has fixed the issue.
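
For anyone wanting to apply the same workaround, the selection of closely related isolates can be sketched in a few lines. This is an illustration only: it assumes pairwise SNP distances are already available from an earlier run or another tool as `(isolate_a, isolate_b, distance)` tuples, and the function name, input format, and threshold are hypothetical.

```python
def close_isolates(pairwise_distances, anchor, max_snps):
    """Return the sorted names of isolates within max_snps of the anchor.

    pairwise_distances: iterable of (isolate_a, isolate_b, snp_distance)
    tuples in either order. The anchor itself is always included.
    """
    keep = {anchor}
    for isolate_a, isolate_b, snp_distance in pairwise_distances:
        if snp_distance > max_snps:
            continue
        # Keep whichever member of the pair is not the anchor.
        if isolate_a == anchor:
            keep.add(isolate_b)
        elif isolate_b == anchor:
            keep.add(isolate_a)
    return sorted(keep)


if __name__ == "__main__":
    # Made-up distances purely to show the call shape.
    example = [("iso1", "iso2", 1), ("iso1", "iso3", 1200), ("iso2", "iso3", 1199)]
    print(close_isolates(example, "iso1", 50))
```

The isolates returned for a chosen anchor could then be run together, with the distant ones analyzed in a separate run.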