DIncalciLab / samurai

A bioinformatics best-practice analysis pipeline for the analysis of shallow whole genome sequencing (sWGS) data for the identification of copy number alterations (CNAs).
MIT License

WisecondorX caller should require a PoN #25

Closed: pblpez closed this issue 3 weeks ago

pblpez commented 3 weeks ago

Description of the bug

As mentioned in https://github.com/DIncalciLab/samurai/issues/24#issuecomment-2426194676, I was trying to run samurai with --caller wisecondorx, but I got the error ERROR ~ Argument of file function cannot be null, which is not very informative. We found out that the error was due to the missing PoN (panel of normals), which must be either provided or computed.
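The issue title asks for the pipeline to fail early with a clear message when WisecondorX has no PoN, instead of the generic null-argument error. A minimal sketch of that check in Python is shown below. It is not samurai's code (the pipeline is written in Nextflow). The `Params` class and `validate_pon` function are hypothetical, and the rule that a PoN is available when it is either built (`--build_pon`) or supplied (`--pon_path`) is an assumption based on the options used in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Params:
    """Hypothetical subset of the run parameters that matter for the PoN check."""

    caller: str
    build_pon: bool = False
    pon_path: Optional[str] = None


def validate_pon(params: Params) -> None:
    """Raise an explicit error when WisecondorX would run without a PoN.

    Assumption: a PoN is available if it is either built during the run
    (build_pon) or supplied by path (pon_path).
    """
    if params.caller.lower() != "wisecondorx":
        return  # other callers are not affected by this check
    if not params.build_pon and not params.pon_path:
        raise ValueError(
            "--caller wisecondorx requires a panel of normals (PoN): "
            "provide one with --pon_path or compute it with --build_pon true"
        )
```

For example, `validate_pon(Params(caller="wisecondorx", build_pon=True))` passes silently, while `validate_pon(Params(caller="wisecondorx"))` raises a `ValueError` that names both options.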

Apart from this, I wanted to restrict the analysis to chromosomes 1-22, as can be done with ichorCNA, but in nextflow.config I can only find the following parameters:

// WisecondorX

wisecondorx_no_rm_dup                  = false
wisecondorx_yfrac                      = 0.4
wisecondorx_ylim                       = null
wisecondorx_zscore                     = 5
wisecondorx_blacklist                  = null

Is it possible to do so? Or does using WisecondorX as the caller necessarily include the sex chromosomes in the analysis?

I will be out for a week, but I will take a look at any updates once I'm back. Thanks again!

Command used and terminal output

No response

Relevant files

No response

System information

No response

lbeltrame commented 3 weeks ago

Thanks, this looks straightforward enough; I just need to find some time for it. With regard to WisecondorX, I found no way to restrict the analysis to autosomes only. I wonder whether subsetting the BAM to exclude the sex chromosomes would work (no idea, I haven't tested it).
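The BAM-subsetting idea above could be tried roughly as follows. This Python sketch is hypothetical and untested. It picks the autosomes (1-22, with or without a "chr" prefix) from the reference .fai index and builds a `samtools view` command that keeps only reads on those contigs. Whether WisecondorX then behaves correctly on the subset is exactly the open question and is not addressed here.

```python
import re
from typing import Iterable, List

# Matches autosomes 1-22, with or without a "chr" prefix (e.g. "7" or "chr7").
AUTOSOME = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2])$")


def autosomes_from_fai(fai_lines: Iterable[str]) -> List[str]:
    """Return autosomal contig names, in reference order, from .fai lines."""
    names = []
    for line in fai_lines:
        if not line.strip():
            continue  # skip blank lines
        name = line.split("\t", 1)[0]  # first .fai column is the contig name
        if AUTOSOME.match(name):
            names.append(name)
    return names


def samtools_subset_cmd(in_bam: str, out_bam: str, contigs: List[str]) -> List[str]:
    """Build a samtools command keeping only reads on the given contigs.

    Region arguments require the input BAM to be indexed (.bai).
    """
    return ["samtools", "view", "-b", "-o", out_bam, in_bam, *contigs]
```

The returned list could be passed to `subprocess.run(cmd, check=True)`, followed by `samtools index` on the output BAM before it is used for the WisecondorX conversion step.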

lbeltrame commented 3 weeks ago

I think I have pushed a fix (make sure you also get the commit after it, as I accidentally introduced a typo). When you can test again, let me know if it works.

pblpez commented 2 weeks ago

Hi, sorry for the delay. I reran samurai with --caller wisecondorx --build_pon true and it finished with errors:

executor > local (105)
[43/b6311a] process > DINCALCILAB_SAMURAI:SAMURAI:SAMTOOLS_INDEX (004_0256_0155) [100%] 1 of 1 ✔
[8e/ddbe69] process > DINCALCILAB_SAMURAI:SAMURAI:BAM_QC_PICARD:PICARD_COLLECTMULTIPLEMETRICS (004_0256_0155) [100%] 1 of 1 ✔
[- ] process > DINCALCILAB_SAMURAI:SAMURAI:BAM_QC_PICARD:PICARD_COLLECTHSMETRICS -
[9d/3f5b64] process > DINCALCILAB_SAMURAI:SAMURAI:BAM_QC_PICARD:PICARD_COLLECTWGSMETRICS (004_0256_0155) [100%] 1 of 1, failed: 1 ✘
[99/d966ee] process > DINCALCILAB_SAMURAI:SAMURAI:LIQUID_BIOPSY:BUILD_PON:NORMAL_CONVERT (004_0256_0187_recalibrated) [100%] 96 of 96 ✔
[58/c0ed99] process > DINCALCILAB_SAMURAI:SAMURAI:LIQUID_BIOPSY:BUILD_PON:WISECONDORX_NEWREF (reference) [100%] 1 of 1 ✔
[ae/8bb98c] process > DINCALCILAB_SAMURAI:SAMURAI:LIQUID_BIOPSY:WISECONDORX_CONVERT (004_0256_0155) [100%] 1 of 1 ✔
[e5/67c06e] process > DINCALCILAB_SAMURAI:SAMURAI:LIQUID_BIOPSY:WISECONDORX_PREDICT (004_0256_0155) [100%] 1 of 1 ✔
[9e/e4c9f6] process > DINCALCILAB_SAMURAI:SAMURAI:LIQUID_BIOPSY:CONVERT_GISTIC_SEG (004_0256_0155) [100%] 1 of 1, failed: 1 ✘
[d2/49a691] process > DINCALCILAB_SAMURAI:SAMURAI:LIQUID_BIOPSY:ASSEMBLE_WISECONDORX_OUTPUTS (output tables) [100%] 1 of 1 ✔
[9b/1d24e0] process > DINCALCILAB_SAMURAI:SAMURAI:LIQUID_BIOPSY:CONVERT_WISECONDORX_IMAGES (genome_plot) [100%] 1 of 1, failed: 1 ✘
[- ] process > DINCALCILAB_SAMURAI:SAMURAI:MULTIQC -
-[dincalcilab/samurai] Pipeline completed with errors-
ERROR ~ Error executing process > 'DINCALCILAB_SAMURAI:SAMURAI:LIQUID_BIOPSY:CONVERT_WISECONDORX_IMAGES (genome_plot)'

Caused by: Process DINCALCILAB_SAMURAI:SAMURAI:LIQUID_BIOPSY:CONVERT_WISECONDORX_IMAGES (genome_plot) terminated with an error exit status (2)

Command executed:

img2pdf 004_0256_0155_genome_plot.png \
    --title "WisecondorX results" \
    -o genome_plots.pdf

cat <<-END_VERSIONS > versions.yml
"DINCALCILAB_SAMURAI:SAMURAI:LIQUID_BIOPSY:CONVERT_WISECONDORX_IMAGES":
    img2pdf: $(img2pdf --version | cut -d' ' -f2)
END_VERSIONS

Command exit status: 2

Command output: (empty)

Command error:
  Unable to find image 'quay.io/einar_rainhart/img2pdf:latest' locally
  latest: Pulling from einar_rainhart/img2pdf
  1671565cc8df: Pulling fs layer
  81a8c22d25d4: Pulling fs layer
  1671565cc8df: Waiting
  1671565cc8df: Verifying Checksum
  1671565cc8df: Download complete
  81a8c22d25d4: Verifying Checksum
  81a8c22d25d4: Download complete
  1671565cc8df: Pull complete
  81a8c22d25d4: Pull complete
  Digest: sha256:6309743e43eb78054c5fa73bc5cc96cc63fa2e3c0191ef525262530ebec6f40e
  Status: Downloaded newer image for quay.io/einar_rainhart/img2pdf:latest
  usage: img2pdf [-h] [-v] [-V] [--gui] [-o out] [-C colorspace] [-D] [--engine engine] [--first-frame-only] [--pillow-limit-break] [--pdfa [PDFA]] [-S LxL] [-s LxL] [-b L[:L]] [-f FIT] [-a] [--crop-border L[:L]] [--bleed-border L[:L]] [--trim-border L[:L]] [--art-border L[:L]] [--title title] [--author author] [--creator creator] [--producer producer] [--creationdate creationdate] [--moddate moddate] [--subject subject] [--keywords kw [kw ...]] [--viewer-panes PANES] [--viewer-initial-page NUM] [--viewer-magnification MAG] [--viewer-page-layout LAYOUT] [--viewer-fit-window] [--viewer-center-window] [--viewer-fullscreen] [infile ...]
  img2pdf: error: unrecognized arguments: -c eval export PYTHONNOUSERSITE="1" export R_PROFILE_USER="/.Rprofile" export R_ENVIRON_USER="/.Renviron" export JULIA_DEPOT_PATH="/usr/local/share/julia" export PATH="$PATH:/home/pperez/samurai/bin"; /bin/bash .command.run nxf_trace

Work dir: /home/pperez/work/9b/1d24e03978fb1594577acc8da0a523

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

It looks like it generated most of the outputs anyway (bins.bed, genome_plot.bed, segments.bed, statistics.txt). It didn't find any aberrations, but I don't think this is due to an error.

nextflow.log

Thanks!

lbeltrame commented 1 week ago

This is weird. Can you provide the .command.sh file that's in the work dir mentioned by Nextflow (anonymize or remove any details you deem necessary)?

lbeltrame commented 1 day ago

For the record, Sara and I have tried to reproduce this without success. That is why having that .command.sh may help show what's wrong.

pageale commented 5 hours ago

Hello! I was following this issue and also got an error with DINCALCILAB_SAMURAI:SAMURAI:BAM_QC_PICARD:PICARD_COLLECTWGSMETRICS. Everything (alignment, creation of the PoN for WisecondorX, etc.) runs smoothly until this point.

Command used and terminal output

nextflow run dincalcilab/samurai \
    --input test4samples.csv \
    -profile docker \
    --outdir test_docker_samurai \
    --genome GRCh38 \
    --analysis_type liquid_biopsy \
    --caller wisecondorx \
    --fasta ~/Reference/bwa_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
    --fai ~/Reference/bwa_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai \
    --dict ~/Reference/bwa_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.dict \
    --aligner bwamem \
    --build_pon TRUE \
    --pon_path /home/healthy/bam \
    --plot_fragment_distribution \
    --max_memory 28.GB

Terminal output:

0:00:21s.  Time for last 10,000,000:    0s.  Last read position: chr2:129,993,342
  [Wed Nov 13 15:52:45 GMT 2024] picard.analysis.CollectWgsMetrics done. Elapsed time: 0.36 minutes.
  Runtime.totalMemory()=901775360
  To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
  Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 133797422 out of bounds for length 133797422
    at picard.analysis.AbstractWgsMetricsCollector.isReferenceBaseN(AbstractWgsMetricsCollector.java:218)
    at picard.analysis.WgsMetricsProcessorImpl.processFile(WgsMetricsProcessorImpl.java:92)
    at picard.analysis.CollectWgsMetrics.doWork(CollectWgsMetrics.java:242)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:280)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:105)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:115)

Work dir:
  /home/alessandra/Projects/rawdata/work/1f/d4f9849d8e29d65236a2c2bce1bc09

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
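The exception reports an index equal to the array length inside `isReferenceBaseN`, meaning Picard tried to read a reference base one position past the end of a contig. One possible explanation, not confirmed in this thread, is that the BAM and the reference passed with --fasta/--fai/--dict disagree on contig names or lengths. The log shows a read position on chr2, while the Ensembl primary-assembly FASTA normally names contigs without the "chr" prefix. The hypothetical Python helpers below sketch a check for this. They compare contig lengths from the .fai with the @SQ lines of the .dict file or of the BAM header (as printed by `samtools view -H`).

```python
from typing import Dict, Iterable, List


def lengths_from_fai(lines: Iterable[str]) -> Dict[str, int]:
    """Map contig name -> length from a samtools .fai index (columns 1 and 2)."""
    out: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            out[fields[0]] = int(fields[1])
    return out


def lengths_from_sq(lines: Iterable[str]) -> Dict[str, int]:
    """Map SN -> LN from @SQ header lines (Picard .dict or BAM header)."""
    out: Dict[str, int] = {}
    for line in lines:
        if not line.startswith("@SQ"):
            continue  # ignore @HD, @RG, @PG and other header lines
        tags = dict(
            field.split(":", 1)
            for field in line.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        if "SN" in tags and "LN" in tags:
            out[tags["SN"]] = int(tags["LN"])
    return out


def contig_mismatches(ref: Dict[str, int], other: Dict[str, int]) -> List[str]:
    """List contigs in `other` that are missing from `ref` or differ in length."""
    problems: List[str] = []
    for name, length in other.items():
        if name not in ref:
            problems.append(f"{name}: not in reference")
        elif ref[name] != length:
            problems.append(f"{name}: length {length} vs reference {ref[name]}")
    return problems
```

An empty result from `contig_mismatches` would make a naming or length mismatch an unlikely cause. A non-empty one would suggest checking which reference was used for alignment.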

These are the files mentioned by Nextflow: nextflow_13nov.log

command_13nov.txt

Also, I have a question: is it possible to run the pipeline directly with BAM files as input? If so, what format should the input take? Thank you for your help and for this pipeline!!