bahlolab / PLASTER

Nextflow pipeline for long amplicon typing of PacBio SMRT sequencing data
MIT License

single sample with multiple amplicons without any barcode #10

Closed satheadwait closed 2 years ago

satheadwait commented 2 years ago

Hi, Thank you for developing this pipeline. I had a question about the usage as below:

I am trying to run this method for a single sample with multiple amplicons without any barcode. I kept the barcode file blank but the reads are just thrown out. Is there any workaround for this?

Thank you and regards.

jemunro commented 2 years ago

Hi, Unfortunately a barcode-free, single sample mode is not currently supported. However, I do think this would be a useful feature, so I will aim to add it in a future release.

satheadwait commented 2 years ago

Thank you. Is there currently a workaround in this pipeline, such as using a barcode like NNNNNNNNN so that the pipeline will run?

jemunro commented 2 years ago

Unfortunately not, degenerate nucleotides are not recognised by the demultiplexer.

jemunro commented 2 years ago

Hello @satheadwait, I have created a development branch supporting single-sample mode. You can try running it by specifying the branch to nextflow with -r single-sample, with e.g.:

nextflow run bahlolab/PLASTER -r single-sample -profile preproc,singularity -c <my_dataset.config>

You will also need to set the following parameters in your config:

params {
  barcodes_fasta = null
  sample = "<sample_name>"
}

satheadwait commented 2 years ago

That's great. I will try this and let you know how it goes.

Thank you again!

satheadwait commented 2 years ago

Hi,

After running the pipeline as:

~/nextflow run bahlolab/PLASTER -r single-sample -profile preproc,singularity -c singlesample.PLASTER_TEST.config -resume

I get the following error:

Apr-28 11:51:22.660 [Task submitter] DEBUG nextflow.executor.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Apr-28 11:51:22.683 [Task submitter] INFO nextflow.Session - [c8/9179cb] Submitted process > preproc:annotate_amplicons (SR:true)
Apr-28 11:56:14.349 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 1 -- submitted tasks are shown below
~> TaskHandler[id: 110; name: preproc:annotate_amplicons (SR:true); status: RUNNING; exit: -; error: -; workDir: /scratch/01775/saathe/PACBIO/DHCHL/PLASTER_TEST/work/c8/9179cb47190dfa081431e5f95374c0]

The executor looks like:

executor > local (1)
[01/37b1a5] process > preproc:prep_ref:wget [100%] 1 of 1, cached: 1 ✔
[56/0fae03] process > preproc:prep_ref:mmi [100%] 1 of 1, cached: 1 ✔
[3e/d73abd] process > preproc:pb_ccs:ccs (74) [100%] 100 of 100, cache...
[8b/b241f3] process > preproc:pb_ccs:merge [100%] 1 of 1, cached: 1 ✔
[e7/3c69cb] process > preproc:extract_ccs_failed [100%] 1 of 1, cached: 1 ✔
[b3/43a85d] process > preproc:pb_mm2 (SR:true) [100%] 2 of 2, cached: 2 ✔
[27/2ad9cb] process > preproc:annotate_samples:si... [100%] 2 of 2, cached: 2 ✔
[c8/9179cb] process > preproc:annotate_amplicons ... [ 50%] 1 of 2, cached: 1
[e8/9cf96f] process > preproc:pb_mm2_2 (CCS:true) [100%] 1 of 1, cached: 1
[98/aac4b3] process > preproc:split_sample_amplic... [100%] 1 of 1, cached: 1
[77/8e9a15] process > preproc:index_bam (m54331_2... [100%] 4 of 4, cached: 4
[b3/fbe37e] process > preproc:alignment_stats (CC... [100%] 1 of 1, cached: 1
[- ] process > preproc:pre_processing_repo... -

Could you help with this one? The steps before the annotate amplicon seem to have run fine with the files being created.

Thank you again for all your help.

Adwait

jemunro commented 2 years ago

Hi,

From the logs you have sent I can't see an error. Can you share the file 'trace.txt' in your run directory?

satheadwait commented 2 years ago

Hi,

Thank you for such quick responses. I am actually rerunning the process. I don't think there is an explicit error, but the annotate amplicon step seems to be running for too long. I have let it run for around 18-20 hours over a few days and it hasn't completed. Since it is a single sample with just 4 amplicons, this seemed to be an issue.

trace.txt

Thank you again. Regards. Adwait

jemunro commented 2 years ago

Hi,

I'm not sure what is going wrong; it seems like the process is hanging indefinitely for some reason.

From the run directory, can you run find -wholename ./work/c8/9179cb*/.command.log and send me the resulting file? There may be something useful in the output for this process.
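For readers following along, the lookup above can be wrapped in a small helper. This is only a sketch: find_command_log is a hypothetical name (not part of Nextflow or PLASTER), and it uses -path, which GNU find documents as the more portable equivalent of -wholename. The pattern is quoted so that find, rather than the shell, expands the wildcard.

```shell
# Sketch: locate a Nextflow task's .command.log from its short hash.
# find_command_log is a hypothetical helper name, not part of Nextflow or PLASTER.
find_command_log() {
  work_dir="$1"     # work directory, e.g. ./work
  task_hash="$2"    # short task hash as shown in the log, e.g. c8/9179cb
  # Quote the pattern so that find, not the shell, expands the wildcard.
  find "$work_dir" -path "$work_dir/$task_hash*/.command.log"
}
```

From the run directory, it would be called as find_command_log ./work c8/9179cb.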

satheadwait commented 2 years ago

command.log

Please find the file attached. Thank you again.

Regards. Adwait

satheadwait commented 2 years ago

The full path was ./work/c8/9179cb47190dfa081431e5f95374c0/.command.log. Regards. Adwait

jemunro commented 2 years ago

There is no unexpected error in the log; the process seems to start up alright but never finishes. Maybe it just needs longer to finish running on your system, though 18-20 hours does seem excessive.

Can you run:

cd ./work/c8/9179cb47190dfa081431e5f95374c0
ls -alLh

and let me know the output? This is just to get a rough idea of how far through the process was getting, based on the sizes of the input and output BAM files.

jemunro commented 2 years ago

Also, FYI, the process that is not completing runs on subreads that failed to form CCS. These reads are not used in the downstream typing component, and the process is only run to collect statistics used in the report.

The sample-amplicon CCS bam files should be created in ./output/bam and the manifest should be located at ./output/sample_amplicon_bam_manifest.csv - if you have those two things you can proceed to the typing stage.
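A quick way to confirm both things are in place is to walk the manifest and check that each listed BAM exists and is non-empty. The sketch below assumes the manifest is a plain comma-separated file with a header row of sample,amplicon,n_reads,bam_file and no quoted fields; check_manifest is a hypothetical helper name, not something PLASTER provides.

```shell
# Sketch: verify that every BAM listed in the manifest exists and is non-empty.
# check_manifest is a hypothetical helper name, not part of PLASTER.
# Assumes a header row of sample,amplicon,n_reads,bam_file and no quoted fields.
check_manifest() {
  manifest="$1"
  missing=0
  if [ ! -s "$manifest" ]; then
    echo "manifest not found or empty: $manifest" >&2
    return 2
  fi
  {
    IFS= read -r _header    # discard the header row
    # The "|| [ -n ... ]" clause handles a final line without a trailing newline.
    while IFS=, read -r sample amplicon n_reads bam_file || [ -n "$sample" ]; do
      [ -z "$sample" ] && continue    # skip blank lines
      if [ -s "$bam_file" ]; then
        echo "OK      $sample $amplicon ($n_reads reads)"
      else
        echo "MISSING $sample $amplicon: $bam_file"
        missing=$((missing + 1))
      fi
    done
  } < "$manifest"
  # Succeed only if no listed BAM was missing or empty.
  [ "$missing" -eq 0 ]
}
```

For example, check_manifest ./output/sample_amplicon_bam_manifest.csv returns a non-zero status if any listed file is absent, so it can gate the start of the typing stage.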

satheadwait commented 2 years ago

That is very helpful to know. I have also added the manifest and bam output at the end of this message. If that is the input I can start the typing stage.

login3.stampede2(1003)$ cd ./work/c8/9179cb47190dfa081431e5f95374c0
login3.stampede2(1004)$ ls -alLh
total 24G
drwx------ 2 saathe G-821498 4.0K Apr 28 17:50 .
drwx------ 3 saathe G-821498 4.0K Apr 28 11:51 ..
-rw-r--r-- 1 saathe G-821498 838 Apr 14 11:54 amplicons.json
-rw------- 1 saathe G-821498 0 Apr 28 11:51 .command.begin
-rw------- 1 saathe G-821498 61 Apr 28 11:51 .command.err
-rw------- 1 saathe G-821498 61 Apr 28 11:51 .command.log
-rw------- 1 saathe G-821498 0 Apr 28 11:51 .command.out
-rw------- 1 saathe G-821498 11K Apr 28 11:51 .command.run
-rw------- 1 saathe G-821498 240 Apr 28 11:51 .command.sh
-rw------- 1 saathe G-821498 0 Apr 28 11:51 .command.trace
-rw------- 1 saathe G-821498 3 Apr 28 17:50 .exitcode
-rw-r--r-- 1 saathe G-821498 13G Apr 23 14:53 preproc-run.SR.sm_annot.bam
-rw------- 1 saathe G-821498 11G Apr 28 17:50 preproc-run.SR.true.sm_am_annot.bam

Bam files and manifest:

login3.stampede2(1001)$ cat ./output/sample_amplicon_bam_manifest.csv
sample,amplicon,n_reads,bam_file
m54331_220119_204006,Pole_part4,672,/scratch/01775/saathe/PACBIO/DHCHL/PLASTER_TEST/output/bam/LB-preproc-run.SM-m54331_220119_204006.AM-Pole_part4.bam
m54331_220119_204006,Pole_part2,6481,/scratch/01775/saathe/PACBIO/DHCHL/PLASTER_TEST/output/bam/LB-preproc-run.SM-m54331_220119_204006.AM-Pole_part2.bam
m54331_220119_204006,Pole_part1,1244,/scratch/01775/saathe/PACBIO/DHCHL/PLASTER_TEST/output/bam/LB-preproc-run.SM-m54331_220119_204006.AM-Pole_part1.bam
m54331_220119_204006,Pole_part3,2605,/scratch/01775/saathe/PACBIO/DHCHL/PLASTER_TEST/output/bam/LB-preproc-run.SM-m54331_220119_204006.AM-Pole_part3.bam

login3.stampede2(1002)$ ll ./output/bam
total 98548
-rw-r--r-- 1 saathe G-821498 18409908 Apr 22 17:14 LB-preproc-run.SM-m54331_220119_204006.AM-Pole_part1.bam
-rw-r--r-- 1 saathe G-821498 57432 Apr 22 17:14 LB-preproc-run.SM-m54331_220119_204006.AM-Pole_part1.bam.bai
-rw-r--r-- 1 saathe G-821498 53586750 Apr 22 17:14 LB-preproc-run.SM-m54331_220119_204006.AM-Pole_part2.bam
-rw-r--r-- 1 saathe G-821498 65736 Apr 22 17:14 LB-preproc-run.SM-m54331_220119_204006.AM-Pole_part2.bam.bai
-rw-r--r-- 1 saathe G-821498 19220213 Apr 22 17:14 LB-preproc-run.SM-m54331_220119_204006.AM-Pole_part3.bam
-rw-r--r-- 1 saathe G-821498 54112 Apr 22 17:14 LB-preproc-run.SM-m54331_220119_204006.AM-Pole_part3.bam.bai
-rw-r--r-- 1 saathe G-821498 9442322 Apr 22 17:14 LB-preproc-run.SM-m54331_220119_204006.AM-Pole_part4.bam
-rw-r--r-- 1 saathe G-821498 55224 Apr 22 17:14 LB-preproc-run.SM-m54331_220119_204006.AM-Pole_part4.bam.bai

jemunro commented 2 years ago

Hi Adwait,

With those outputs you can start on the typing stage.

Looking at the files in your work directory, it seems like the process was close to finishing, with 11 GB of output written from 13 GB of input after about 6 hours (based on the timestamps of .command.begin and .exitcode).
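The estimate above can be reproduced with simple arithmetic on the values in the directory listing: .command.begin at 11:51, .exitcode at 17:50, and rounded file sizes of 11G and 13G. The sketch below is only illustrative. The function names are hypothetical, both timestamps are assumed to fall on the same day, and the output-to-input size ratio is at best a rough proxy for progress, since the output BAM need not end up the same size as the input.

```shell
# Sketch: rough elapsed time and progress from the values in the ls listing.
# All function names are hypothetical helpers for illustration only.

# Convert HH:MM to minutes since midnight (a leading zero is stripped so that
# values such as 08 are not parsed as octal).
minutes_since_midnight() {
  h=${1%%:*}
  m=${1##*:}
  h=${h#0}
  m=${m#0}
  echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# Minutes between two HH:MM times on the same day.
elapsed_minutes() {
  start=$(minutes_since_midnight "$1")
  end=$(minutes_since_midnight "$2")
  echo $(( end - start ))
}

# Integer percentage of output size relative to input size (rounded down).
percent_written() {
  echo $(( $1 * 100 / $2 ))
}

elapsed_minutes 11:51 17:50   # prints 359, i.e. just under 6 hours
percent_written 11 13         # prints 84
```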

jemunro commented 2 years ago

Single sample mode added in version 22.05.01