SBIMB / StellarPGx

Calling star alleles in highly polymorphic pharmacogenes (e.g. CYP450 genes) by leveraging genome graph-based variant detection.
MIT License

No result when using cram files #28

Closed tija94 closed 1 year ago

tija94 commented 1 year ago

Hi,

I wanted to test StellarPGx with CRAM files. However, I do not get a result and I do not see any error. The only output I see is this:

N E X T F L O W  ~  version 22.10.0
Launching `main.nf` [jovial_plateau] DSL1 - revision: 89538b1502
[-        ] process > call_snvs1   -
[-        ] process > call_snvs2   -
[-        ] process > call_sv_del  -
[-        ] process > call_sv_dup  -
[-        ] process > get_depth    -
[-        ] process > format_snvs  -
[-        ] process > get_core_var -
[-        ] process > analyse_1    -
[-        ] process > analyse_2    -
[-        ] process > analyse_3    -
[-        ] process > call_stars   -

There is no additional logging (at least none that I am aware of) because the work directory is empty. I am running it with the command nextflow run main.nf -profile standard --build hg38 --gene cyp2d6 and the following nextflow.config:


manifest {
    author = 'David Twesigomwe'
    description = 'Pipeline calls CYP450 star alleles from WGS BAM/CRAM files. Model gene: CYP2D6'
    mainScript = 'main.nf'
    version = '1.2.6'
}

params {
// User-defined parameters

   // reference genome
   ref_file = "$PWD/hg38.fa"  // .fai index should be in the same folder

   // BAM/CRAM file(s) and respective indexes 
       // example1 (single sample): /path/to/data/GeT-RM/NA12878*{bam,bai}
       // example2 (multiple samples): /path/to/data/GeT-RM/*{bam,bai}
       // example3 (CRAM files): /path/to/data/GeT-RM/HG*{cram,crai}

   in_bam = "$PWD/data/*{cram,crai}" 

   // Output directoy (Default is $PWD/results). User-defined path should be an absolute path
   out_dir = "$PWD/results"

   // DO NOT modify these lines  
   gene = "cyp2d6"
   db_init = "$PWD/database"
   res_init = "$PWD/resources"
   caller_init = "$PWD/scripts"

}

singularity {
    enabled = false // set to false when using Docker
    autoMounts = true
    cacheDir = "$PWD/containers"
    runOptions = " --cleanenv"
}

// To use Docker, set "enabled" to "true" and do the opposite to the singularity config above. Also remember to change the container path in the process config below to point to the Docker container rather than Singularity. 

docker {
    enabled = true // change to true when using Docker
    runOptions = '-u \$(id -u):\$(id -g)'
}

process {

    // ALL PROCESSES
    cache = true
    stageInMode = 'symlink'
    stageOutMode = 'rsync'
   // scratch = "$HOME/tmp"  // clean this regularly

    // Containers

    // Singularity
    // container = "$PWD/containers/stellarpgx-dev.sif"

    // Docker
    container = "twesigomwedavid/stellarpgx-dev:latest" // Note that this Docker build needs to be pulled from Docker Hub

}

profiles {

    // Local machine (MacOS, Linux, cluster node etc)
    standard { 
        process.executor = 'local'
    }

    // SLURM scheduler
    slurm { 
        process.executor = 'slurm'
        process.queue = 'batch'
    } 

    // Other scheduler
    // sheduler_name {
    //  process.executor = 'sheduler_name'
    //  process.queue = 'batch'
    //}

    test { includeConfig "$PWD/tests/config/test.config" }

}

In my data directory I have cram, cram.crai, bam, and bam.bai files. When I use the BAM file of the same sample and change in_bam = "$PWD/data/*{cram,crai}" to in_bam = "$PWD/data/*{bam,bai}", it works.

Am I doing something wrong? Are there other setting changes necessary? I am not familiar with nextflow, so if there is a possibility to see some logging elsewhere, please let me know. Thanks!
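On the logging question: Nextflow writes a hidden `.nextflow.log` file in the directory the pipeline was launched from, even when no task has run and the work directory is empty. The sketch below prints the tail of that file. The function name `show_nextflow_log` and its arguments are hypothetical and only for illustration; they are not part of StellarPGx or Nextflow.

```shell
# Hypothetical helper (not part of StellarPGx): print the last lines of
# the .nextflow.log that Nextflow writes in the launch directory.
show_nextflow_log() {
    launch_dir="${1:-.}"     # directory where `nextflow run` was started
    n_lines="${2:-50}"       # number of trailing lines to print
    log_file="$launch_dir/.nextflow.log"
    if [ -f "$log_file" ]; then
        tail -n "$n_lines" "$log_file"
    else
        echo "no .nextflow.log in $launch_dir" >&2
        return 1
    fi
}
```

Calling `show_nextflow_log . 100` from the StellarPGx directory would print the last 100 lines of the most recent run's log.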


I do not know if this makes any difference but I am using WES data (not WGS) with a custom cyp2d6 test3.bed file:

chr22   42126499    42126788
chr22   42127841    42130273
chr22   42126392    42131927
chr10   94941837    94942406
chr22   42139000    42142455
chr22   42142465    42144575
chr22   42139500    42140600
chr22   42140600    42142500
chr19   40879288    40879625
chr22   42140600    42143600
chr22   42143600    42144575

twesigomwedavid commented 1 year ago

Hi @tija94, For cram files the command is nextflow run main.nf -profile standard --build hg38 --gene cyp2d6 --format compressed

Kindly note that we haven't completed optimisation of StellarPGx for WES yet. Therefore, the results you get might be highly unreliable at the moment, especially for CYP2D6.
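As a small illustration of the difference between the two invocations, the sketch below builds the command line for a given input type. Only the `--format compressed` flag for CRAM input comes from this thread; the wrapper name `stellarpgx_cmd` is hypothetical, and the `in_bam` pattern in nextflow.config still has to match the chosen file type.

```shell
# Hypothetical wrapper (illustration only): print the StellarPGx command
# for BAM or CRAM input. CRAM input needs `--format compressed`.
stellarpgx_cmd() {
    input_type="$1"   # "bam" or "cram"
    cmd="nextflow run main.nf -profile standard --build hg38 --gene cyp2d6"
    case "$input_type" in
        cram) cmd="$cmd --format compressed" ;;
        bam)  ;;  # BAM input needs no extra flag
        *)    echo "unknown input type: $input_type" >&2; return 1 ;;
    esac
    printf '%s\n' "$cmd"
}
```

For example, `stellarpgx_cmd cram` prints the command with `--format compressed` appended, while `stellarpgx_cmd bam` prints the plain command.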

tija94 commented 1 year ago

Thank you! Supplying --format compressed did solve it.

Mahmoudbassuoni commented 3 months ago

Hi, I am having the same issue, even though I did add the --format compressed flag. What could be wrong?

twesigomwedavid commented 3 months ago

@Mahmoudbassuoni,

Could you share the line in your config file where you're supplying the --in_bam? Perhaps it's not pointing to the files correctly.

Mahmoudbassuoni commented 3 months ago

(base) mbassyouni@svmrchor-gns02:/Data/dataflash/StellarPGx$ ls data/*{cram,crai}
data/S1.recal.cram  data/S1.recal.cram.crai  data/S2.recal.cram  data/S2.recal.cram.crai  data/S3.recal.cram  data/S3.recal.cram.crai

The config file looks like this:

params {
// User-defined parameters

   // reference genome
   ref_file = "$PWD/Homo_sapiens_assembly38.fasta"  // .fai index should be in the same folder

   // BAM/CRAM file(s) and respective indexes
       // example1 (single sample): /path/to/data/GeT-RM/NA12878*{bam,bai}
       // example2 (multiple samples): /path/to/data/GeT-RM/*{bam,bai}
       // example3 (CRAM files): /path/to/data/GeT-RM/HG*{cram,crai}

   in_bam = "$PWD/data/*{cram,crai}"

   // Output directoy (Default is $PWD/results). User-defined path should be an absolute path
   out_dir = "$PWD/results"

twesigomwedavid commented 3 months ago

Are the cram files in data symbolic links?

Mahmoudbassuoni commented 3 months ago

Yes. I have also tried pointing to the original directory that the symbolic links refer to, but that didn't work either.

twesigomwedavid commented 3 months ago

On some servers, it's possible that you can access the files only if you're logged into particular node(s). It's also possible that your permissions to access the files have changed.

When you do ls data/ are the symbolic links coloured sky-blue-ish or are they coloured red-on-dark-background?

Mahmoudbassuoni commented 3 months ago

Yes, red on a dark background.

twesigomwedavid commented 3 months ago

Exactly what I thought.

Red on dark background means you don't have access to those samples for one reason or another. It's not an issue with StellarPGx.

It could be that you need to be logged into a worker node (or particular nodes on your system) to access the samples (you can try launching StellarPGx while logged into a worker node), or it could be an issue with your file permissions. In some cases, the data may have been moved (you can try ls -l to confirm that the symbolic links point to the correct paths). Either way, this means you need to contact your system administrator.
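The symbolic-link check described above can also be scripted instead of relying on terminal colours. This is a minimal sketch: the function name `find_broken_links` is hypothetical, and it only reports links whose targets cannot be reached from the current node and user. It does not say why they are unreachable.

```shell
# Hypothetical helper (illustration only): list symbolic links in a
# directory whose targets cannot be reached from the current node/user.
find_broken_links() {
    dir="${1:-data}"
    for f in "$dir"/*; do
        # -L: the entry is a symlink; ! -e: its target is not reachable
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            printf '%s -> %s\n' "$f" "$(readlink "$f")"
        fi
    done
}
```

Running `find_broken_links data` from the StellarPGx directory prints one line per unreachable link together with the path it points to. Empty output means every link in `data` resolves.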

Mahmoudbassuoni commented 3 months ago

I have tried it again while pointing to the main path without the symbolic link and it worked fine. I think the first time I ran it without the --format flag. Anyway, thank you for your response.

twesigomwedavid commented 3 months ago

No worries