J35P312 / FindSV

Structural variant pipeline

Unknown method `splitCsv` on String type -- Did you mean? #4

Open xexpanderx opened 5 years ago

xexpanderx commented 5 years ago

Running the tool like this;

python2 FindSV.py --bam /projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam --output /beegfs/wp3/WGS/CNV/2/ --config ~/git/FindSV/FindSV.conf

I'm getting this:

Processing, please do not turn off FindSV
mkdir: cannot create directory ‘/beegfs/wp3/WGS/CNV/2/’: File exists
SAMPLE_ID:/projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam
N E X T F L O W  ~  version 19.10.0
Launching `/home/andax639/git/FindSV/FindSV_core.nf` [stoic_fermi] - revision: a648677aaf
Unknown method `splitCsv` on String type -- Did you mean?
  split

 -- Check script 'FindSV_core.nf' at line: 18 or see '.nextflow.log' file for more details
DONE
The .nextflow.log looks like this:
Nov-14 12:10:32.025 [main] DEBUG nextflow.cli.Launcher - $> nextflow /home/andax639/git/FindSV/FindSV_core.nf --$
Nov-14 12:10:32.190 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 19.10.0
Nov-14 12:10:32.206 [main] INFO  nextflow.cli.CmdRun - Launching `/home/andax639/git/FindSV/FindSV_core.nf` [sto$
Nov-14 12:10:32.239 [main] DEBUG nextflow.config.ConfigBuilder - User config file: /home/andax639/git/FindSV/Fin$
Nov-14 12:10:32.239 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/andax639/git/FindSV/$
Nov-14 12:10:32.272 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Nov-14 12:10:33.130 [main] DEBUG nextflow.extension.OperatorEx - Dataflow extension methods: branch,buffer,chain$
Nov-14 12:10:33.137 [main] DEBUG nextflow.Session - Session uuid: a01bf08a-d7d1-46d4-b80c-88697d919617
Nov-14 12:10:33.138 [main] DEBUG nextflow.Session - Run name: stoic_fermi
Nov-14 12:10:33.138 [main] DEBUG nextflow.Session - Executor pool size: 32
Nov-14 12:10:33.154 [main] DEBUG nextflow.cli.CmdRun -
  Version: 19.10.0 build 5170
  Created: 21-10-2019 15:07 UTC (17:07 CEST)
  System: Linux 3.10.0-1062.4.1.el7.x86_64
  Runtime: Groovy 2.5.8 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12
  Encoding: UTF-8 (UTF-8)
  Process: 25816@compute09 [192.168.3.19]
  CPUs: 32 - Mem: 125.4 GB (45.2 GB) - Swap: 4 GB (4 GB)
Nov-14 12:10:33.193 [main] DEBUG nextflow.Session - Work-dir: /home/andax639/git/FindSV/work [nfs]
Nov-14 12:10:33.193 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home$
Nov-14 12:10:33.214 [main] DEBUG nextflow.Session - Observer factory: TowerFactory
Nov-14 12:10:33.216 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Nov-14 12:10:33.744 [main] DEBUG nextflow.Session - Session start invoked
Nov-14 12:10:33.749 [main] DEBUG nextflow.trace.TraceFileObserver - Flow starting -- trace file: /beegfs/wp3/WGS$
Nov-14 12:10:34.383 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Nov-14 12:10:34.401 [main] DEBUG nextflow.Session - Session aborted -- Cause: No signature of method: java.lang.$
Possible solutions: split(), split(java.lang.String), split(groovy.lang.Closure), split(java.lang.String, int)
Nov-14 12:10:34.415 [main] ERROR nextflow.cli.Launcher - @unknown
groovy.lang.MissingMethodException: No signature of method: java.lang.String.splitCsv() is applicable for argume$
Possible solutions: split(), split(java.lang.String), split(groovy.lang.Closure), split(java.lang.String, int)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:70)
        at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:46)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:119)
        at Script_2edd5fdb.runScript(Script_2edd5fdb:18)
        at nextflow.script.BaseScript.run0(BaseScript.groovy:152)
        at nextflow.script.BaseScript.run(BaseScript.groovy:182)
        at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:217)
        at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:218)
        at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:126)
        at nextflow.cli.CmdRun.run(CmdRun.groovy:257)
        at nextflow.cli.Launcher.run(Launcher.groovy:457)
        at nextflow.cli.Launcher.main(Launcher.groovy:639)

Any ideas? I see that FindSV is built on python2, which is more or less deprecated. Could this be easily updated to python3?

J35P312 commented 5 years ago

This looks like a nextflow error, I have downloaded latest nextflow, and I get the same issue:

[screenshot: nextflow_19_10]

We use nextflow 18.10 on our server, so I had not noticed this error until now. The splitCsv function is still present in the nextflow documentation, so this looks like a bug in the latest nextflow version (or else the documentation is outdated).

https://www.nextflow.io/docs/latest/operator.html?highlight=split#splitcsv

You can therefore resolve this issue by installing an older version of nextflow (I recommend nextflow-18.10.1), or by downloading a copy of the older nextflow:

wget https://github.com/nextflow-io/nextflow/releases/download/v18.10.1/nextflow-18.10.1-all

and run FindSV like this:

chmod +x nextflow-18.10.1-all
./nextflow-18.10.1-all FindSV_core.nf -c FindSV.conf --bam 12386.bam --working_dir test

You may also edit the launch_core.sh script:

nano launch_core.sh

Change the last line of the script:

nextflow $FindSV_dir/FindSV_core.nf --bam $1 -c $2 --working_dir $3 -with-trace $3/trace.txt | tee $3/log.txt

so that it calls the older version of nextflow:

/home/jesperei/sens2017130/FindSV_latest/FindSV/nextflow-18.10.1-all $FindSV_dir/FindSV_core.nf --bam $1 -c $2 --working_dir $3 -with-trace $3/trace.txt | tee $3/log.txt

Now you can run FindSV through the python wrapper. Good luck, and sorry for the trouble!
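
The version check behind this workaround could be sketched in shell as follows. This is only an illustration: the helper names `parse_nextflow_version`, `version_le` and `nextflow_is_known_good` are made up here, and the only facts taken from this thread are that 18.10.1 works and 19.10.0 fails. Versions in between are untested, so the sketch simply treats 18.10.1 as the newest known-good release. It relies on `sort -V` from GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of FindSV or nextflow.

# Print the version number found in a banner line such as
# "N E X T F L O W  ~  version 18.10.1".
parse_nextflow_version() {
    printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed if version $1 is lower than or equal to version $2.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Newest release reported to work in this thread
# (assumption: anything older behaves the same way).
KNOWN_GOOD="18.10.1"

# Succeed if the banner reports a version no newer than KNOWN_GOOD.
nextflow_is_known_good() {
    local version
    version=$(parse_nextflow_version "$1")
    [ -n "$version" ] && version_le "$version" "$KNOWN_GOOD"
}
```

For example, `nextflow_is_known_good "N E X T F L O W  ~  version 19.10.0"` fails, which a wrapper script could use as the signal to fall back to the downloaded nextflow-18.10.1-all.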

xexpanderx commented 5 years ago

Thank you! I get a bit further now, but a CALLING error was reported:

python2 FindSV.py --bam /projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam --output /beegfs/wp3/WGS/CNV/ --config ~/git/FindSV/FindSV.conf
Processing, please do not turn off FindSV
mkdir: cannot create directory ‘/beegfs/wp3/WGS/CNV/’: File exists
SAMPLE_ID:/projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam
N E X T F L O W  ~  version 18.10.1
Launching `/home/andax639/git/FindSV/FindSV_core.nf` [sharp_wright] - revision: a648677aaf
[warm up] executor > local
[cf/16dae6] Submitted process > TIDDIT (RL-2047-NA12878.clean.dedup.bam)
[69/22846f] Submitted process > CNVnator (RL-2047-NA12878.clean.dedup.bam)
[cf/16dae6] NOTE: Missing output file(s) `RL-2047-NA12878.clean.dedup.vcf` expected by process `TIDDIT (RL-2047-NA12878.clean.dedup.bam)` -- Error is ignored
RL-2047-NA12878.clean.dedup
FAILED:CALLING
DONE

Thank you for helping me here! Tell me if you need more log files.

xexpanderx commented 5 years ago

.nextflow.log:

cat .nextflow.log
Nov-14 18:27:55.069 [main] DEBUG nextflow.cli.Launcher - $> /home/andax639/wget/nextflow-18.10.1-all /home/andax639/git/FindSV/FindSV_core.nf --bam /projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam -c /home/andax639/git/FindSV/FindSV.conf --working_dir /beegfs/wp3/WGS/CNV/ -with-trace /beegfs/wp3/WGS/CNV//trace.txt
Nov-14 18:27:55.202 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 18.10.1
Nov-14 18:27:55.215 [main] INFO  nextflow.cli.CmdRun - Launching `/home/andax639/git/FindSV/FindSV_core.nf` [sharp_wright] - revision: a648677aaf
Nov-14 18:27:55.232 [main] DEBUG nextflow.config.ConfigBuilder - User config file: /home/andax639/git/FindSV/FindSV.conf
Nov-14 18:27:55.232 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/andax639/git/FindSV/FindSV.conf
Nov-14 18:27:55.259 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Nov-14 18:27:55.859 [main] DEBUG nextflow.Session - Session uuid: 72f88ed8-d284-47ea-a155-8ec1b3fc6cf3
Nov-14 18:27:55.859 [main] DEBUG nextflow.Session - Run name: sharp_wright
Nov-14 18:27:55.860 [main] DEBUG nextflow.Session - Executor pool size: 32
Nov-14 18:27:55.873 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 18.10.1 build 5003
  Modified: 24-10-2018 14:03 UTC (16:03 CEST)
  System: Linux 3.10.0-1062.4.1.el7.x86_64
  Runtime: Groovy 2.5.3 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12
  Encoding: UTF-8 (UTF-8)
  Process: 30465@compute09 [192.168.3.19]
  CPUs: 32 - Mem: 125.4 GB (45.3 GB) - Swap: 4 GB (4 GB)
Nov-14 18:27:55.912 [main] DEBUG nextflow.Session - Work-dir: /home/andax639/git/FindSV/work [nfs]
Nov-14 18:27:55.913 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home/andax639/git/FindSV/bin
Nov-14 18:27:56.041 [main] DEBUG nextflow.Session - Session start invoked
Nov-14 18:27:56.045 [main] DEBUG nextflow.processor.TaskDispatcher - Dispatcher > start
Nov-14 18:27:56.046 [main] DEBUG nextflow.trace.TraceFileObserver - Flow starting -- trace file: /beegfs/wp3/WGS/CNV/trace.txt
Nov-14 18:27:56.052 [main] DEBUG nextflow.script.ScriptRunner - > Script parsing
Nov-14 18:27:56.737 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Nov-14 18:27:56.925 [main] DEBUG nextflow.processor.ProcessFactory - Discovered executor class: nextflow.executor.IgExecutor
Nov-14 18:27:57.083 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: local
Nov-14 18:27:57.083 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Nov-14 18:27:57.087 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Nov-14 18:27:57.089 [main] INFO  nextflow.executor.Executor - [warm up] executor > local
Nov-14 18:27:57.092 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=32; memory=125.4 GB; capacity=32; pollInterval=100ms; dumpInterval=5m
Nov-14 18:27:57.096 [main] DEBUG nextflow.processor.TaskDispatcher - Starting monitor: LocalPollingMonitor
Nov-14 18:27:57.096 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: local)
Nov-14 18:27:57.098 [main] DEBUG nextflow.executor.Executor - Invoke register for executor: local
Nov-14 18:27:57.124 [main] DEBUG nextflow.Session - >>> barrier register (process: TIDDIT)
Nov-14 18:27:57.127 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > TIDDIT -- maxForks: 32
Nov-14 18:27:57.148 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: local
Nov-14 18:27:57.148 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Nov-14 18:27:57.149 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Nov-14 18:27:57.149 [main] DEBUG nextflow.Session - >>> barrier register (process: CNVnator)
Nov-14 18:27:57.149 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > CNVnator -- maxForks: 32
Nov-14 18:27:57.162 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: local
Nov-14 18:27:57.162 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Nov-14 18:27:57.162 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Nov-14 18:27:57.163 [main] DEBUG nextflow.Session - >>> barrier register (process: combine)
Nov-14 18:27:57.163 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > combine -- maxForks: 32
Nov-14 18:27:57.170 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: local
Nov-14 18:27:57.171 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Nov-14 18:27:57.171 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Nov-14 18:27:57.171 [main] DEBUG nextflow.Session - >>> barrier register (process: annotate)
Nov-14 18:27:57.172 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > annotate -- maxForks: 32
Nov-14 18:27:57.174 [main] DEBUG nextflow.script.ScriptRunner - > Await termination 
Nov-14 18:27:57.174 [main] DEBUG nextflow.Session - Session await
Nov-14 18:27:57.309 [Task submitter] DEBUG nextflow.executor.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Nov-14 18:27:57.314 [Task submitter] INFO  nextflow.Session - [cf/16dae6] Submitted process > TIDDIT (RL-2047-NA12878.clean.dedup.bam)
Nov-14 18:27:57.327 [Task submitter] DEBUG nextflow.executor.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Nov-14 18:27:57.327 [Task submitter] INFO  nextflow.Session - [69/22846f] Submitted process > CNVnator (RL-2047-NA12878.clean.dedup.bam)
Nov-14 18:28:02.974 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: TIDDIT (RL-2047-NA12878.clean.dedup.bam); status: COMPLETED; exit: 0; error: -; workDir: /home/andax639/git/FindSV/work/cf/16dae650ec24059696a89bf6e7b404]
Nov-14 18:28:02.980 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Process `TIDDIT (RL-2047-NA12878.clean.dedup.bam)` is unable to find [UnixPath]: `/home/andax639/git/FindSV/work/cf/16dae650ec24059696a89bf6e7b404/RL-2047-NA12878.clean.dedup.vcf` (pattern: `RL-2047-NA12878.clean.dedup.vcf`)
Nov-14 18:28:02.986 [Task monitor] INFO  nextflow.processor.TaskProcessor - [cf/16dae6] NOTE: Missing output file(s) `RL-2047-NA12878.clean.dedup.vcf` expected by process `TIDDIT (RL-2047-NA12878.clean.dedup.bam)` -- Error is ignored
Nov-14 18:28:02.988 [Actor Thread 6] DEBUG nextflow.Session - <<< barrier arrive (process: TIDDIT)
Nov-14 18:28:13.288 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 2; name: CNVnator (RL-2047-NA12878.clean.dedup.bam); status: COMPLETED; exit: 0; error: -; workDir: /home/andax639/git/FindSV/work/69/22846ff37ecf1c9c8fd3cdc7b39000]
Nov-14 18:28:13.304 [Actor Thread 1] DEBUG nextflow.Session - <<< barrier arrive (process: CNVnator)
Nov-14 18:28:13.305 [Actor Thread 1] DEBUG nextflow.Session - <<< barrier arrive (process: combine)
Nov-14 18:28:13.305 [Actor Thread 1] DEBUG nextflow.Session - <<< barrier arrive (process: annotate)
Nov-14 18:28:13.306 [main] DEBUG nextflow.Session - Session await > all process finished
Nov-14 18:28:13.306 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local)
Nov-14 18:28:13.306 [main] DEBUG nextflow.Session - Session await > all barriers passed
Nov-14 18:28:13.328 [main] DEBUG nextflow.trace.StatsObserver - Workflow completed > WorkflowStats[succeedCount=1; failedCount=0; ignoredCount=1; cachedCount=0; succeedDuration=29.5s; failedDuration=591ms; cachedDuration=0ms]
Nov-14 18:28:13.328 [main] DEBUG nextflow.trace.TraceFileObserver - Flow completing -- flushing trace file
Nov-14 18:28:13.342 [main] DEBUG nextflow.CacheDB - Closing CacheDB done
Nov-14 18:28:13.390 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

xexpanderx commented 5 years ago

And this is my FindSV.conf file:

cat FindSV.conf 
process {
    //the executor, visit the nextflow website for more info
    executor = 'local'
    cpus = 1
    time = "1d"

    clusterOptions = {
        //your acount, you need not change this if you use local executor
        '-A local'
    }
}

params {
    //the output directory
    working_dir='/beegfs/wp3/WGS/CNV/'

    //----TIDDIT----------
    //minimum number of discordant pairs for calling a variant
    TIDDIT_pairs=5
    //number of split reads or calling small variants
    TIDDIT_reads=4
    //lowest mapping quality of a discordant pair
    TIDDIT_q=5

    //---------CNVnator--------
    //path to the folder containing reference fasta files split per chromosmoe
    CNVnator_reference_dir_path='/data/ref_genomes/GRCh37/'
    //bin size of cnvnator, generally, small bin size leads to high sensitivity and worse precision, and the other way around
    CNVnator_bin_size='1000'

    //-----internal----------
    //contig sort path, this script i located in the FindSV internal_scripts folder
    contig_sort_path='/home/andax639/git/FindSV/internal_scripts/contigSort.py'
    clear_vep_path='/home/andax639/git/FindSV/internal_scripts/clear_vep.py'
    cleanVCF_path='/home/andax639/git/FindSV/internal_scripts/cleanVCF.py'
    the_annotator_path='/home/andax639/git/FindSV/internal_scripts/the_annotator.py'
    gene_keys_dir_path='/home/andax639/git/FindSV/gene_keys'
    frequency_filter_path='/home/andax639/git/FindSV/internal_scripts/frequency_filter.py'
    FindSV_home='/home/andax639/git/FindSV'

    //----------reference--------
    //path to reference fasta file, indexed using bwa, and samtools 0.19
    genome='/data/ref_genomes/VEP/homo_sapiens/98_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz'

    //-----VEP_-------------
    //path to the vep script
    VEP_path='vep'
    //applied on SV
    vep_args="--cache --force_overwrite --buffer_size 5 --offline --assembly GRCh37 --vcf --per_gene --format vcf -q --symbol"
    //applied on the Assemblatron snv VCF
    vep_snv_args="--force_overwrite --hgvs --symbol --sift b --polyphen b --vcf --offline --per_gene --cache --assembly GRCh37 --symbol --check_existing --gene_phenotype --af --max_af --af_1kg --af_gnomad"

    //------SVDB---------
    //The path to the SVDB script, The SVDB_X_OCC and SVDB_X_FRQ tags indicate the allele frequence keys stored  in the info field(commonly AF and AC).
    SVDB_script_path={SVDB_script_path}
    //path to the multisample vcf database of SVDB (any multisample SV vcf should work)
    SVDB_path='/data/ref_genomes/GNOMA/popmax_sv_gnomad.vcf'
    SVDB_1_OCC="OCC"
    SVDB_1_FRQ="FRQ"

    //additional SV databases (add if you find any!)
    SVDB_path2='""'
    SVDB_2_OCC="OCC"
    SVDB_2_FRQ="FRQ"

    SVDB_path3='""'
    SVDB_3_OCC="OCC"
    SVDB_3_FRQ="FRQ"

    //overlap to consider two variants the same
    SVDB_overlap='0.6'
    //maximum distance between two breakpoints
    SVDB_distance='10000'
    //All variants above this frequencies will be cleared from the final output vcf
    SVDB_limit='0.2'

    //-------GENMOD------------
    //the path to the gnemod ini file

    genmod_rank_model_path='/home/andax639/git/FindSV/genmod_SV.txt'

}

trace {
    fields = 'task_id,hash,name,status,tag'
}

xexpanderx commented 5 years ago

Two things catch my eye in the nextflow log:

-with-trace /beegfs/wp3/WGS/CNV//trace.txt

Here we have an extra "/" before trace.txt. Is this ok?

Furthermore,

Work-dir: /home/andax639/git/FindSV/work [nfs]

But in my FindSV.conf I have this:

working_dir='/beegfs/wp3/WGS/CNV/'

Shouldn't those be the same thing?

J35P312 commented 5 years ago

How nice! We are making progress =P.

The config looks fine! "/home/andax639/git/FindSV/work" is used by nextflow for keeping track of the processes; once the processes are complete, the results are copied to '/beegfs/wp3/WGS/CNV/'. Things look a bit messy because FindSV is a python wrapper around a nextflow pipeline.
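
On the doubled slash: on Linux, repeated slashes inside a path are normally treated as a single slash, so /beegfs/wp3/WGS/CNV//trace.txt and /beegfs/wp3/WGS/CNV/trace.txt should name the same file. A small sketch to confirm that on a given system (the helper name `same_file` is made up for illustration, and the demo uses a throwaway directory instead of the real output folder):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if both paths resolve to the same existing file.
# The "-ef" test compares device and inode numbers.
same_file() {
    [ "$1" -ef "$2" ]
}

# Demonstration in a temporary directory, so nothing under /beegfs is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/trace.txt"
if same_file "$demo_dir//trace.txt" "$demo_dir/trace.txt"; then
    echo "doubled slash resolves to the same file"
fi
rm -rf "$demo_dir"
```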

The pipeline stopped because TIDDIT failed: [cf/16dae6] NOTE: Missing output file(s) RL-2047-NA12878.clean.dedup.vcf expected by process TIDDIT (RL-2047-NA12878.clean.dedup.bam)

We need to check the TIDDIT log files, which are stored in one of the /home/andax639/git/FindSV/work subfolders.

cd into the work folder:

cd /home/andax639/git/FindSV/work

The process id "cf/16dae6" is the first few characters of the TIDDIT subfolder name. Copy the process id and autocomplete the path using tab:

cd cf/16dae6

After pressing tab you will get something random-looking like this:

cd cf/16dae6fsdfsdgsdfasfdsgfdgrsefdfdra

cd into that folder, and open the log files:

tail .command.out
tail .command.err

Do you see any error message?

The most common one is "reference mismatch", indicating that the reference used for aligning the data differs from the reference used as input to TIDDIT.
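
The manual steps above could be sketched as a small shell helper. The function names `find_task_dir` and `show_task_logs` are made up for illustration; .command.out, .command.err and .command.log are the log files nextflow writes into each task folder.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of FindSV or nextflow.

# Print the task folder under WORK_DIR ($1) whose path starts with the
# short task id ($2), e.g. "cf/16dae6". Fails if no folder matches.
find_task_dir() {
    local work_dir="$1" task_hash="$2" match
    for match in "$work_dir"/"$task_hash"*/; do
        [ -d "$match" ] || continue
        printf '%s\n' "${match%/}"
        return 0
    done
    return 1
}

# Print the last lines of every non-empty log file in a task folder.
show_task_logs() {
    local task_dir="$1" log
    for log in .command.out .command.err .command.log; do
        if [ -s "$task_dir/$log" ]; then
            printf '== %s ==\n' "$log"
            tail -n 20 "$task_dir/$log"
        fi
    done
}

# Example, using the paths from this thread:
#   task_dir=$(find_task_dir /home/andax639/git/FindSV/work cf/16dae6) \
#       && show_task_logs "$task_dir"
```

This replaces the tab completion with a glob, so it also works from a script; if several folders share the prefix, only the first match is printed.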

xexpanderx commented 5 years ago

Thank you!

Hmm,

cat .command.log 
nxf-scratch-dir compute09:/tmp/nxf.NcBzHEUTIT
error,  could not find the reference file
cp: cannot stat ‘RL-2047-NA12878.clean.dedup.vcf’: No such file or directory
cp: cannot stat ‘RL-2047-NA12878.clean.dedup.wig’: No such file or directory
cp: cannot stat ‘RL-2047-NA12878.clean.dedup.ploidy.tab’: No such file or directory

Hmm, could it be the reference file from VEP? I am thinking about this one;

genome='/data/ref_genomes/VEP/homo_sapiens/98_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz'

That one is obviously there.

xexpanderx commented 5 years ago

In the directory /data/ref_genomes/VEP/homo_sapiens/98_GRCh37/, you see the following files:

Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz.fai
Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz.gzi

xexpanderx commented 5 years ago

I also see that your manual mentions "download VEP cache-files". Could you tell me specifically which cache files I should download? I have my VEP cache files in

~/.vep/

I have the following directories in my .vep (the human reference file has been moved to /data/ref_genomes/VEP/homo_sapiens/98_GRCh37/):

ls .vep/
homo_sapiens_merged
Plugins

homo_sapiens_merged contains:

/usr/bin/ls -1 homo_sapiens_merged/98_GRCh37/
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
3
4
5
6
7
8
9
chr_synonyms.txt
GL000191.1
GL000192.1
GL000193.1
GL000194.1
GL000195.1
GL000196.1
GL000197.1
GL000198.1
GL000199.1
GL000200.1
GL000201.1
GL000202.1
GL000203.1
GL000204.1
GL000205.1
GL000206.1
GL000207.1
GL000208.1
GL000209.1
GL000210.1
GL000211.1
GL000212.1
GL000213.1
GL000214.1
GL000215.1
GL000216.1
GL000217.1
GL000218.1
GL000219.1
GL000220.1
GL000221.1
GL000222.1
GL000223.1
GL000224.1
GL000225.1
GL000226.1
GL000227.1
GL000228.1
GL000229.1
GL000230.1
GL000231.1
GL000232.1
GL000233.1
GL000234.1
GL000235.1
GL000236.1
GL000237.1
GL000238.1
GL000239.1
GL000240.1
GL000241.1
GL000242.1
GL000243.1
GL000244.1
GL000245.1
GL000246.1
GL000247.1
GL000248.1
GL000249.1
HG1007_PATCH
HG1032_PATCH
HG104_HG975_PATCH
HG1063_PATCH
HG1074_PATCH
HG1079_PATCH
HG1082_HG167_PATCH
HG1091_PATCH
HG1133_PATCH
HG1146_PATCH
HG115_PATCH
HG1208_PATCH
HG1211_PATCH
HG122_PATCH
HG1257_PATCH
HG1287_PATCH
HG1292_PATCH
HG1293_PATCH
HG1304_PATCH
HG1308_PATCH
HG1322_PATCH
HG1350_HG959_PATCH
HG1423_PATCH
HG1424_PATCH
HG1425_PATCH
HG1426_PATCH
HG142_HG150_NOVEL_TEST
HG1433_PATCH
HG1434_PATCH
HG1435_PATCH
HG1436_HG1432_PATCH
HG1437_PATCH
HG1438_PATCH
HG1439_PATCH
HG1440_PATCH
HG1441_PATCH
HG1442_PATCH
HG1443_HG1444_PATCH
HG144_PATCH
HG1453_PATCH
HG1458_PATCH
HG1459_PATCH
HG1462_PATCH
HG1463_PATCH
HG1471_PATCH
HG1472_PATCH
HG1473_PATCH
HG1479_PATCH
HG1486_PATCH
HG1487_PATCH
HG1488_PATCH
HG1490_PATCH
HG1497_PATCH
HG14_PATCH
HG1500_PATCH
HG1501_PATCH
HG1502_PATCH
HG151_NOVEL_TEST
HG1591_PATCH
HG1592_PATCH
HG1595_PATCH
HG1699_PATCH
HG174_HG254_PATCH
HG183_PATCH
HG185_PATCH
HG186_PATCH
HG193_PATCH
HG19_PATCH
HG237_PATCH
HG243_PATCH
HG256_PATCH
HG271_PATCH
HG27_PATCH
HG280_PATCH
HG281_PATCH
HG299_PATCH
HG29_PATCH
HG305_PATCH
HG306_PATCH
HG311_PATCH
HG325_PATCH
HG329_PATCH
HG339_PATCH
HG344_PATCH
HG348_PATCH
HG357_PATCH
HG375_PATCH
HG385_PATCH
HG388_HG400_PATCH
HG414_PATCH
HG417_PATCH
HG418_PATCH
HG444_PATCH
HG480_HG481_PATCH
HG497_PATCH
HG506_HG507_HG1000_PATCH
HG50_PATCH
HG531_PATCH
HG536_PATCH
HG544_PATCH
HG686_PATCH
HG706_PATCH
HG729_PATCH
HG730_PATCH
HG736_PATCH
HG745_PATCH
HG747_PATCH
HG748_PATCH
HG75_PATCH
HG79_PATCH
HG7_PATCH
HG858_PATCH
HG865_PATCH
HG871_PATCH
HG873_PATCH
HG883_PATCH
HG905_PATCH
HG944_PATCH
HG946_PATCH
HG953_PATCH
HG957_PATCH
HG962_PATCH
HG971_PATCH
HG979_PATCH
HG987_PATCH
HG989_PATCH
HG990_PATCH
HG991_PATCH
HG995_PATCH
HG996_PATCH
HG998_1_PATCH
HG998_2_PATCH
HG999_1_PATCH
HG999_2_PATCH
HSCHR10_1_CTG2
HSCHR10_1_CTG5
HSCHR11_1_CTG1_1
HSCHR1_1_CTG31
HSCHR12_1_CTG1
HSCHR12_1_CTG2
HSCHR12_1_CTG2_1
HSCHR12_1_CTG5
HSCHR12_2_CTG2
HSCHR12_2_CTG2_1
HSCHR12_3_CTG2_1
HSCHR1_2_CTG31
HSCHR1_3_CTG31
HSCHR15_1_CTG4
HSCHR15_1_CTG8
HSCHR16_1_CTG3_1
HSCHR16_2_CTG3_1
HSCHR17_1
HSCHR17_1_CTG1
HSCHR17_1_CTG4
HSCHR17_2_CTG4
HSCHR17_3_CTG4
HSCHR17_4_CTG4
HSCHR17_5_CTG4
HSCHR17_6_CTG4
HSCHR18_1_CTG1_1
HSCHR18_1_CTG2
HSCHR18_1_CTG2_1
HSCHR18_2_CTG1_1
HSCHR18_2_CTG2
HSCHR18_2_CTG2_1
HSCHR19_1_CTG3
HSCHR19_1_CTG3_1
HSCHR19_2_CTG3
HSCHR19_3_CTG3
HSCHR19LRC_COX1_CTG1
HSCHR19LRC_COX2_CTG1
HSCHR19LRC_LRC_I_CTG1
HSCHR19LRC_LRC_J_CTG1
HSCHR19LRC_LRC_S_CTG1
HSCHR19LRC_LRC_T_CTG1
HSCHR19LRC_PGF1_CTG1
HSCHR19LRC_PGF2_CTG1
HSCHR20_1_CTG1
HSCHR21_1_CTG1_1
HSCHR21_2_CTG1_1
HSCHR21_3_CTG1_1
HSCHR21_4_CTG1_1
HSCHR2_1_CTG1
HSCHR2_1_CTG12
HSCHR22_1_CTG1
HSCHR22_1_CTG2
HSCHR22_2_CTG1
HSCHR2_2_CTG12
HSCHR3_1_CTG1
HSCHR3_1_CTG2_1
HSCHR4_1
HSCHR4_1_CTG12
HSCHR4_1_CTG6
HSCHR4_2_CTG9
HSCHR5_1_CTG1
HSCHR5_1_CTG2
HSCHR5_1_CTG5
HSCHR5_2_CTG1
HSCHR5_3_CTG1
HSCHR6_1_CTG5
HSCHR6_2_CTG5
HSCHR6_MHC_APD
HSCHR6_MHC_COX
HSCHR6_MHC_DBB
HSCHR6_MHC_MANN
HSCHR6_MHC_MCF
HSCHR6_MHC_QBL
HSCHR6_MHC_SSTO
HSCHR7_1_CTG6
HSCHR9_1_CTG1
HSCHR9_1_CTG35
HSCHR9_2_CTG35
HSCHR9_3_CTG35
info.txt
LRG_1
LRG_10
LRG_100
LRG_101
LRG_102
LRG_103
LRG_104
LRG_105
LRG_106
LRG_107
LRG_108
LRG_109
LRG_11
LRG_110
LRG_111
LRG_112
LRG_113
LRG_114
LRG_115
LRG_116
LRG_117
LRG_118
LRG_119
LRG_12
LRG_120
LRG_121
LRG_122
LRG_123
LRG_124
LRG_125
LRG_126
LRG_127
LRG_128
LRG_129
LRG_13
LRG_130
LRG_132
LRG_133
LRG_134
LRG_135
LRG_136
LRG_137
LRG_138
LRG_139
LRG_140
LRG_141
LRG_142
LRG_144
LRG_145
LRG_146
LRG_147
LRG_148
LRG_149
LRG_15
LRG_150
LRG_151
LRG_152
LRG_154
LRG_155
LRG_156
LRG_157
LRG_158
LRG_159
LRG_16
LRG_160
LRG_161
LRG_162
LRG_163
LRG_164
LRG_165
LRG_168
LRG_169
LRG_17
LRG_170
LRG_171
LRG_172
LRG_173
LRG_174
LRG_175
LRG_176
LRG_177
LRG_178
LRG_179
LRG_18
LRG_180
LRG_182
LRG_183
LRG_184
LRG_185
LRG_187
LRG_188
LRG_189
LRG_19
LRG_190
LRG_191
LRG_192
LRG_193
LRG_194
LRG_195
LRG_196
LRG_197
LRG_198
LRG_199
LRG_2
LRG_20
LRG_200
LRG_201
LRG_202
LRG_203
LRG_204
LRG_205
LRG_207
LRG_208
LRG_209
LRG_21
LRG_210
LRG_211
LRG_212
LRG_213
LRG_214
LRG_215
LRG_216
LRG_217
LRG_218
LRG_219
LRG_22
LRG_220
LRG_221
LRG_226
LRG_227
LRG_228
LRG_229
LRG_23
LRG_230
LRG_231
LRG_234
LRG_236
LRG_239
LRG_24
LRG_241
LRG_242
LRG_243
LRG_245
LRG_246
LRG_248
LRG_249
LRG_25
LRG_250
LRG_251
LRG_252
LRG_253
LRG_254
LRG_255
LRG_256
LRG_257
LRG_258
LRG_26
LRG_260
LRG_261
LRG_262
LRG_263
LRG_264
LRG_265
LRG_266
LRG_267
LRG_268
LRG_269
LRG_27
LRG_270
LRG_271
LRG_272
LRG_273
LRG_274
LRG_275
LRG_276
LRG_278
LRG_279
LRG_28
LRG_280
LRG_281
LRG_283
LRG_284
LRG_285
LRG_286
LRG_287
LRG_288
LRG_289
LRG_29
LRG_290
LRG_291
LRG_292
LRG_293
LRG_294
LRG_295
LRG_296
LRG_298
LRG_3
LRG_30
LRG_300
LRG_301
LRG_307
LRG_308
LRG_31
LRG_311
LRG_316
LRG_317
LRG_318
LRG_319
LRG_32
LRG_321
LRG_322
LRG_325
LRG_326
LRG_327
LRG_328
LRG_329
LRG_33
LRG_330
LRG_331
LRG_332
LRG_333
LRG_334
LRG_335
LRG_336
LRG_337
LRG_34
LRG_340
LRG_341
LRG_343
LRG_345
LRG_346
LRG_347
LRG_348
LRG_349
LRG_35
LRG_350
LRG_352
LRG_353
LRG_354
LRG_355
LRG_356
LRG_357
LRG_358
LRG_359
LRG_36
LRG_361
LRG_362
LRG_363
LRG_364
LRG_365
LRG_366
LRG_368
LRG_369
LRG_37
LRG_371
LRG_372
LRG_373
LRG_374
LRG_375
LRG_376
LRG_377
LRG_378
LRG_379
LRG_38
LRG_380
LRG_382
LRG_383
LRG_384
LRG_385
LRG_386
LRG_388
LRG_389
LRG_39
LRG_390
LRG_391
LRG_392
LRG_393
LRG_394
LRG_4
LRG_40
LRG_403
LRG_404
LRG_405
LRG_406
LRG_408
LRG_409
LRG_41
LRG_410
LRG_411
LRG_413
LRG_414
LRG_415
LRG_417
LRG_419
LRG_42
LRG_421
LRG_422
LRG_424
LRG_426
LRG_43
LRG_433
LRG_437
LRG_439
LRG_44
LRG_440
LRG_442
LRG_444
LRG_445
LRG_446
LRG_447
LRG_448
LRG_449
LRG_45
LRG_450
LRG_451
LRG_452
LRG_454
LRG_455
LRG_456
LRG_457
LRG_458
LRG_46
LRG_460
LRG_461
LRG_462
LRG_463
LRG_464
LRG_465
LRG_466
LRG_467
LRG_469
LRG_47
LRG_470
LRG_471
LRG_472
LRG_473
LRG_474
LRG_475
LRG_476
LRG_48
LRG_482
LRG_488
LRG_49
LRG_490
LRG_491
LRG_492
LRG_493
LRG_495
LRG_496
LRG_497
LRG_498
LRG_499
LRG_5
LRG_50
LRG_500
LRG_502
LRG_503
LRG_504
LRG_505
LRG_507
LRG_509
LRG_51
LRG_510
LRG_511
LRG_512
LRG_513
LRG_517
LRG_519
LRG_52
LRG_520
LRG_521
LRG_522
LRG_523
LRG_524
LRG_526
LRG_527
LRG_53
LRG_535
LRG_54
LRG_55
LRG_56
LRG_57
LRG_58
LRG_59
LRG_6
LRG_60
LRG_61
LRG_62
LRG_63
LRG_64
LRG_65
LRG_66
LRG_67
LRG_69
LRG_7
LRG_70
LRG_71
LRG_715
LRG_717
LRG_72
LRG_73
LRG_74
LRG_75
LRG_76
LRG_77
LRG_78
LRG_79
LRG_8
LRG_80
LRG_81
LRG_83
LRG_84
LRG_85
LRG_86
LRG_88
LRG_89
LRG_90
LRG_91
LRG_92
LRG_93
LRG_94
LRG_96
LRG_97
LRG_98
LRG_99
MT
X
Y

J35P312 commented 5 years ago

That's odd! Maybe singularity is unable to reach that folder.

If you try this:

singularity exec FindSV.simg zcat /data/ref_genomes/VEP/homo_sapiens/98_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

or you can try:

singularity exec FindSV.simg ls /data

Do you get an error? You might need to bind the /data folder into singularity. You should be able to find your home folder:

singularity exec FindSV.simg ls /home

So you can probably work around this error by copying the reference to your home folder.

You can read more about binding and singularity here:

https://sylabs.io/guides/3.0/user-guide/bind_paths_and_mounts.html

I will fix this in the next version of FindSV; on our server, every folder is available to singularity.
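
The checks above could be wrapped into a small helper that reports which paths the container can see. This is only a sketch: the function names are made up for illustration, and the only singularity call used is the `singularity exec IMAGE ls ...` form already shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of FindSV or singularity.

# Succeed if the path ($2) can be listed from inside the image ($1).
path_visible_in_container() {
    singularity exec "$1" ls -d "$2" >/dev/null 2>&1
}

# Print one "visible" or "MISSING" line per path.
# Usage: report_container_paths IMAGE PATH...
report_container_paths() {
    local image="$1" path
    shift
    for path in "$@"; do
        if path_visible_in_container "$image" "$path"; then
            printf 'visible  %s\n' "$path"
        else
            printf 'MISSING  %s\n' "$path"
        fi
    done
}

# Example:
#   report_container_paths FindSV.simg /home /data \
#       /data/ref_genomes/VEP/homo_sapiens/98_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
```

Note that a "MISSING" line only means the listing failed; it would also appear if singularity itself is not installed or the image path is wrong.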

Regarding vep:

~/.vep is perfect, that is where vep will look by default.

You will need to make a "homo_sapiens" subfolder:

mkdir ~/.vep/homo_sapiens/

Within this subfolder, you need to put a vep cache for your current version of vep. My ~/.vep/homo_sapiens/ folder looks like this:

87_GRCh37 87_GRCh38 92_GRCh37 92_GRCh38

(i.e. I have been running vep 87 and vep 92).

You can get a cache for the latest vep (98) here:

wget ftp://ftp.ensembl.org/pub/release-98/variation/indexed_vep_cache/homo_sapiens_vep_98_GRCh37.tar.gz

Check your vep version before you download. Download the cache into your ~/.vep/homo_sapiens/ folder and decompress it; after that, vep should work:

tar -xzf homo_sapiens_vep_98_GRCh37.tar.gz

I would try vep on a small vcf (SNV or SV is fine) before running the pipeline. Good luck! //Jesper
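
Before running the pipeline, it may help to check that the cache folder ended up where the layout described above expects it (for example ~/.vep/homo_sapiens/98_GRCh37). A minimal sketch, with made-up helper names; depending on how the archive is laid out, unpacking can place the cache one level deeper than expected, which this check would also reveal.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the folder layout follows the description above.

# Print the expected cache folder for a vep version ($1) and assembly ($2).
# $3 is the cache root and defaults to ~/.vep.
vep_cache_path() {
    local version="$1" assembly="$2" root="${3:-$HOME/.vep}"
    printf '%s/homo_sapiens/%s_%s\n' "$root" "$version" "$assembly"
}

# Succeed if that cache folder exists.
has_vep_cache() {
    [ -d "$(vep_cache_path "$@")" ]
}

# Example:
#   has_vep_cache 98 GRCh37 || echo "no cache at $(vep_cache_path 98 GRCh37)"
```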

xexpanderx commented 5 years ago

Yes, /data is not bound into singularity:

singularity exec FindSV.simg zcat /data/ref_genomes/VEP/homo_sapiens/98_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
gzip: /data/ref_genomes/VEP/homo_sapiens/98_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz: No such file or directory

xexpanderx commented 5 years ago

All files need to be on my home, even the bam files.

xexpanderx commented 5 years ago

Is it possible to rebind everything to another directory? My home directory is too small, so I am thinking of moving the FindSV folder to a separate partition. I guess I need to create a new singularity image?

J35P312 commented 5 years ago

It should be possible to bind other folders, have a look here for more details. https://sylabs.io/guides/3.0/user-guide/bind_paths_and_mounts.html

You could try something like this:

export SINGULARITY_BIND="/data"

Then test the container:

singularity exec FindSV.simg zcat /data/ref_genomes/VEP/homo_sapiens/98_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

You should add the "export" command to your FindSV_env.sh script as well.

I don't think you need to bind the bam folder: FindSV (the nextflow script) will move these files to the scratch disk/tmp, and those folders should already be mounted by singularity. But you can double-check the CNVnator logs to be sure (CNVnator is also run from the singularity image).
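
If several folders turn out to need binding, the SINGULARITY_BIND value (a comma-separated list of paths) could be derived from the paths in the config. A minimal sketch, assuming all inputs are absolute paths and that binding the whole top-level directory is acceptable; the helper names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building a SINGULARITY_BIND value.

# Print the top-level directory of an absolute path, e.g. /data/x/y -> /data.
top_level_dir() {
    printf '/%s\n' "$(printf '%s\n' "$1" | cut -d/ -f2)"
}

# Print a comma-separated, de-duplicated list of the top-level
# directories of all paths given as arguments.
build_bind_list() {
    local path
    for path in "$@"; do
        top_level_dir "$path"
    done | sort -u | paste -sd, -
}

# Example (paths taken from the config earlier in this thread):
#   export SINGULARITY_BIND="$(build_bind_list \
#       /data/ref_genomes/GRCh37/ \
#       /data/ref_genomes/GNOMA/popmax_sv_gnomad.vcf)"
#   echo "$SINGULARITY_BIND"    # prints /data
```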

xexpanderx commented 5 years ago

Ah, ok, that looks simple enough. Let me try it out.

xexpanderx commented 5 years ago

Next error:

python2 FindSV.py --bam /projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam --output /beegfs/wp3/WGS/CNV/ --config ~/git/FindSV/FindSV.conf
Processing, please do not turn off FindSV
mkdir: cannot create directory ‘/beegfs/wp3/WGS/CNV/’: File exists
SAMPLE_ID:/projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam
N E X T F L O W  ~  version 18.10.1
Launching `/home/andax639/git/FindSV/FindSV_core.nf` [jovial_jones] - revision: a648677aaf
[warm up] executor > local
[2a/a09675] Submitted process > CNVnator (RL-2047-NA12878.clean.dedup.bam)
[a9/6c2c62] Submitted process > TIDDIT (RL-2047-NA12878.clean.dedup.bam)
[a9/6c2c62] NOTE: Missing output file(s) `RL-2047-NA12878.clean.dedup.vcf` expected by process `TIDDIT (RL-2047-NA12878.clean.dedup.bam)` -- Error is ignored
RL-2047-NA12878.clean.dedup
FAILED:CALLING
DONE
cat .command.log
nxf-scratch-dir compute09:/tmp/nxf.mYQt7HJnKA
error,  could not find the bam file
cp: cannot stat ‘RL-2047-NA12878.clean.dedup.vcf’: No such file or directory
cp: cannot stat ‘RL-2047-NA12878.clean.dedup.wig’: No such file or directory
cp: cannot stat ‘RL-2047-NA12878.clean.dedup.ploidy.tab’: No such file or directory

I exported the following:

export SINGULARITY_BIND="/data,/projects"

Running this (although it should not matter):

singularity exec FindSV.simg ls /projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam
/projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam

The bam file is found within singularity. At least we are one step closer :)

xexpanderx commented 5 years ago

cat .nextflow.log
Nov-15 13:39:50.708 [main] DEBUG nextflow.cli.Launcher - $> /home/andax639/wget/nextflow-18.10.1-all /home/andax639/git/FindSV/FindSV_core.nf --bam /projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam -c /home/andax639/git/FindSV/FindSV.conf --working_dir /beegfs/wp3/WGS/CNV/ -with-trace /beegfs/wp3/WGS/CNV//trace.txt
Nov-15 13:39:50.855 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 18.10.1
Nov-15 13:39:50.869 [main] INFO  nextflow.cli.CmdRun - Launching `/home/andax639/git/FindSV/FindSV_core.nf` [cranky_goodall] - revision: a648677aaf
Nov-15 13:39:50.886 [main] DEBUG nextflow.config.ConfigBuilder - User config file: /home/andax639/git/FindSV/FindSV.conf
Nov-15 13:39:50.886 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/andax639/git/FindSV/FindSV.conf
Nov-15 13:39:50.917 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Nov-15 13:39:51.579 [main] DEBUG nextflow.Session - Session uuid: c1fe9284-e6f8-496a-ba83-aa191513c50a
Nov-15 13:39:51.579 [main] DEBUG nextflow.Session - Run name: cranky_goodall
Nov-15 13:39:51.580 [main] DEBUG nextflow.Session - Executor pool size: 32
Nov-15 13:39:51.594 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 18.10.1 build 5003
  Modified: 24-10-2018 14:03 UTC (16:03 CEST)
  System: Linux 3.10.0-1062.4.1.el7.x86_64
  Runtime: Groovy 2.5.3 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12
  Encoding: UTF-8 (UTF-8)
  Process: 8708@compute09 [192.168.3.19]
  CPUs: 32 - Mem: 125.4 GB (45.2 GB) - Swap: 4 GB (4 GB)
Nov-15 13:39:51.631 [main] DEBUG nextflow.Session - Work-dir: /home/andax639/git/FindSV/work [nfs]
Nov-15 13:39:51.632 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home/andax639/git/FindSV/bin
Nov-15 13:39:51.969 [main] DEBUG nextflow.Session - Session start invoked
Nov-15 13:39:51.974 [main] DEBUG nextflow.processor.TaskDispatcher - Dispatcher > start
Nov-15 13:39:51.974 [main] DEBUG nextflow.trace.TraceFileObserver - Flow starting -- trace file: /beegfs/wp3/WGS/CNV/trace.txt
Nov-15 13:39:51.982 [main] DEBUG nextflow.script.ScriptRunner - > Script parsing
Nov-15 13:39:52.677 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Nov-15 13:39:52.929 [main] DEBUG nextflow.processor.ProcessFactory - Discovered executor class: nextflow.executor.IgExecutor
Nov-15 13:39:53.085 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: local
Nov-15 13:39:53.086 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Nov-15 13:39:53.090 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Nov-15 13:39:53.092 [main] INFO  nextflow.executor.Executor - [warm up] executor > local
Nov-15 13:39:53.095 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=32; memory=125.4 GB; capacity=32; pollInterval=100ms; dumpInterval=5m
Nov-15 13:39:53.099 [main] DEBUG nextflow.processor.TaskDispatcher - Starting monitor: LocalPollingMonitor
Nov-15 13:39:53.099 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: local)
Nov-15 13:39:53.101 [main] DEBUG nextflow.executor.Executor - Invoke register for executor: local
Nov-15 13:39:53.128 [main] DEBUG nextflow.Session - >>> barrier register (process: TIDDIT)
Nov-15 13:39:53.131 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > TIDDIT -- maxForks: 32
Nov-15 13:39:53.152 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: local
Nov-15 13:39:53.152 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Nov-15 13:39:53.152 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Nov-15 13:39:53.152 [main] DEBUG nextflow.Session - >>> barrier register (process: CNVnator)
Nov-15 13:39:53.153 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > CNVnator -- maxForks: 32
Nov-15 13:39:53.164 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: local
Nov-15 13:39:53.165 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Nov-15 13:39:53.165 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Nov-15 13:39:53.165 [main] DEBUG nextflow.Session - >>> barrier register (process: combine)
Nov-15 13:39:53.165 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > combine -- maxForks: 32
Nov-15 13:39:53.172 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: local
Nov-15 13:39:53.172 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Nov-15 13:39:53.172 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Nov-15 13:39:53.173 [main] DEBUG nextflow.Session - >>> barrier register (process: annotate)
Nov-15 13:39:53.173 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > annotate -- maxForks: 32
Nov-15 13:39:53.175 [main] DEBUG nextflow.script.ScriptRunner - > Await termination 
Nov-15 13:39:53.175 [main] DEBUG nextflow.Session - Session await
Nov-15 13:39:53.245 [Actor Thread 5] DEBUG nextflow.util.CacheHelper - Unable to get file attributes file: /home/andax639/git/FindSV/~/.data/GRCh37 -- Cause: java.nio.file.NoSuchFileException: /home/andax639/git/FindSV/~/.data/GRCh37
Nov-15 13:39:53.306 [Task submitter] DEBUG nextflow.executor.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Nov-15 13:39:53.311 [Task submitter] INFO  nextflow.Session - [46/a190ee] Submitted process > TIDDIT (RL-2047-NA12878.clean.dedup.bam)
Nov-15 13:39:53.328 [Task submitter] DEBUG nextflow.executor.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Nov-15 13:39:53.328 [Task submitter] INFO  nextflow.Session - [7a/483a26] Submitted process > CNVnator (RL-2047-NA12878.clean.dedup.bam)
Nov-15 13:39:58.984 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: TIDDIT (RL-2047-NA12878.clean.dedup.bam); status: COMPLETED; exit: 0; error: -; workDir: /home/andax639/git/FindSV/work/46/a190eef08426c4bd2e4bfda6bfdd5c]
Nov-15 13:39:58.990 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Process `TIDDIT (RL-2047-NA12878.clean.dedup.bam)` is unable to find [UnixPath]: `/home/andax639/git/FindSV/work/46/a190eef08426c4bd2e4bfda6bfdd5c/RL-2047-NA12878.clean.dedup.vcf` (pattern: `RL-2047-NA12878.clean.dedup.vcf`)
Nov-15 13:39:58.996 [Task monitor] INFO  nextflow.processor.TaskProcessor - [46/a190ee] NOTE: Missing output file(s) `RL-2047-NA12878.clean.dedup.vcf` expected by process `TIDDIT (RL-2047-NA12878.clean.dedup.bam)` -- Error is ignored
Nov-15 13:39:58.999 [Actor Thread 1] DEBUG nextflow.Session - <<< barrier arrive (process: TIDDIT)
Nov-15 13:40:09.213 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 2; name: CNVnator (RL-2047-NA12878.clean.dedup.bam); status: COMPLETED; exit: 0; error: -; workDir: /home/andax639/git/FindSV/work/7a/483a26c1e5d76070e9f5593da715f7]
Nov-15 13:40:09.229 [Actor Thread 8] DEBUG nextflow.Session - <<< barrier arrive (process: CNVnator)
Nov-15 13:40:09.230 [Actor Thread 8] DEBUG nextflow.Session - <<< barrier arrive (process: combine)
Nov-15 13:40:09.230 [Actor Thread 8] DEBUG nextflow.Session - <<< barrier arrive (process: annotate)
Nov-15 13:40:09.230 [main] DEBUG nextflow.Session - Session await > all process finished
Nov-15 13:40:09.231 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local)
Nov-15 13:40:09.231 [main] DEBUG nextflow.Session - Session await > all barriers passed
Nov-15 13:40:09.252 [main] DEBUG nextflow.trace.StatsObserver - Workflow completed > WorkflowStats[succeedCount=1; failedCount=0; ignoredCount=1; cachedCount=0; succeedDuration=27.3s; failedDuration=605ms; cachedDuration=0ms]
Nov-15 13:40:09.252 [main] DEBUG nextflow.trace.TraceFileObserver - Flow completing -- flushing trace file
Nov-15 13:40:09.305 [main] DEBUG nextflow.CacheDB - Closing CacheDB done
Nov-15 13:40:09.316 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

My new conf (I moved the reference file to my home directory before the SINGULARITY_BIND export was proposed):

cat FindSV.conf 
process {
    //the executor, visit the nextflow website for more info
    executor = 'local'
    cpus = 32
    time = "1d"

    clusterOptions = {
        //your account; you need not change this if you use the local executor
        '-A local'
    }
}

params {
    //the output directory
    working_dir='/beegfs/wp3/WGS/CNV/'

    //----TIDDIT----------
    //minimum number of discordant pairs for calling a variant
    TIDDIT_pairs=5
    //number of split reads or calling small variants
    TIDDIT_reads=4
    //lowest mapping quality of a discordant pair
    TIDDIT_q=5

    //---------CNVnator--------
    //path to the folder containing reference fasta files split per chromosome
    CNVnator_reference_dir_path='~/.data/GRCh37/'
    //bin size of cnvnator, generally, small bin size leads to high sensitivity and worse precision, and the other way around
    CNVnator_bin_size='1000'

    //-----internal----------
    //contig sort path, this script is located in the FindSV internal_scripts folder
    contig_sort_path='/home/andax639/git/FindSV/internal_scripts/contigSort.py'
    clear_vep_path='/home/andax639/git/FindSV/internal_scripts/clear_vep.py'
    cleanVCF_path='/home/andax639/git/FindSV/internal_scripts/cleanVCF.py'
    the_annotator_path='/home/andax639/git/FindSV/internal_scripts/the_annotator.py'
    gene_keys_dir_path='/home/andax639/git/FindSV/gene_keys'
    frequency_filter_path='/home/andax639/git/FindSV/internal_scripts/frequency_filter.py'
    FindSV_home='/home/andax639/git/FindSV'

    //----------reference--------
    //path to reference fasta file, indexed using bwa, and samtools 0.19
    genome='~/.vep/homo_sapiens/98_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz'

    //-----VEP_-------------
    //path to the vep script
    VEP_path='vep'
    //applied on SV
    vep_args="--cache --force_overwrite --buffer_size 5 --offline --assembly GRCh37 --vcf --per_gene --format vcf -q --symbol"
    //applied on the Assemblatron snv VCF
    vep_snv_args="--force_overwrite --hgvs --symbol --sift b --polyphen b --vcf --offline --per_gene --cache --assembly GRCh37 --symbol --check_existing --gene_phenotype --af --max_af --af_1kg --af_gnomad"

    //------SVDB---------
    //The path to the SVDB script, The SVDB_X_OCC and SVDB_X_FRQ tags indicate the allele frequence keys stored  in the info field(commonly AF and AC).
    SVDB_script_path={SVDB_script_path}
    //path to the multisample vcf database of SVDB (any multisample SV vcf should work)
    SVDB_path='~/.data/GNOMA/popmax_sv_gnomad.vcf'
    SVDB_1_OCC="OCC"
    SVDB_1_FRQ="FRQ"

    //additional SV databases (add if you find any!)
    SVDB_path2='""'
    SVDB_2_OCC="OCC"
    SVDB_2_FRQ="FRQ"

    SVDB_path3='""'
    SVDB_3_OCC="OCC"
    SVDB_3_FRQ="FRQ"

    //overlap to consider two variants the same
    SVDB_overlap='0.6'
    //maximum distance between two breakpoints
    SVDB_distance='10000'
    //All variants above this frequencies will be cleared from the final output vcf
    SVDB_limit='0.2'

    //-------GENMOD------------
    //the path to the genmod ini file

    genmod_rank_model_path='/home/andax639/git/FindSV/genmod_SV.txt'

}

trace {
    fields = 'task_id,hash,name,status,tag'
}
xexpanderx commented 5 years ago

In Nextflow log I see this:

Nov-15 13:39:53.245 [Actor Thread 5] DEBUG nextflow.util.CacheHelper - Unable to get file attributes file: /home/andax639/git/FindSV/~/.data/GRCh37 -- Cause: java.nio.file.NoSuchFileException: /home/andax639/git/FindSV/~/.data/GRCh37

I fixed that by using the full path in the conf file instead. Other than that, I am still getting "error, could not find the bam file".
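The log line above shows the quoted `~/.data/GRCh37` being treated as a literal relative path and resolved against the launch directory, which suggests the `~` in the config string is not expanded the way a shell would expand it. A minimal sketch of expanding it up front, before the value goes into FindSV.conf; `expand_tilde` is a hypothetical helper, not part of FindSV or Nextflow, and it only handles the current user's `~`:

```shell
# Hypothetical helper: replace a leading "~" or "~/" with $HOME and print
# the result; any other path is printed unchanged.
expand_tilde() {
    case $1 in
        "~")
            printf '%s\n' "$HOME"
            ;;
        "~/"*)
            rest=${1#\~/}   # strip the literal "~/" prefix
            printf '%s\n' "$HOME/$rest"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}
```

For example, `expand_tilde '~/.data/GRCh37/'` prints the absolute path under the current `$HOME`, which can then be pasted into `CNVnator_reference_dir_path`. The other `~/...` values in the conf (`genome`, `SVDB_path`) may need the same treatment.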

xexpanderx commented 5 years ago

cnvnator log:

cat .command.log 
nxf-scratch-dir compute09:/tmp/nxf.3N5WTGuMzB
Parsing file RL-2047-NA12878.clean.dedup.bam ...
[E::hts_open_format] fail to open file 'RL-2047-NA12878.clean.dedup.bam'
Can't open file 'No chromosome/contig description given.
Writing histograms ... 
RL-2047-NA12878.clean.dedup.bam'.
No reference genome specified. Aborting parsing.
Total of 0 reads were placed.
Allocating memory ...
Done.
Can't find directory 'bin_1000' in file 'cnvnator.root'.

Can't find directory 'bin_1000' in file 'cnvnator.root'.

Can't find directory 'bin_1000' in file 'cnvnator.root'.

Reading calls ...
cat .command.err 
[E::hts_open_format] fail to open file 'RL-2047-NA12878.clean.dedup.bam'
Can't open file 'RL-2047-NA12878.clean.dedup.bam'.
No reference genome specified. Aborting parsing.
Can't find directory 'bin_1000' in file 'cnvnator.root'.

Can't find directory 'bin_1000' in file 'cnvnator.root'.

Can't find directory 'bin_1000' in file 'cnvnator.root'.

Reading calls ...
J35P312 commented 5 years ago

How nice to see that the variable export thing worked!

I think we unbound /tmp. I wonder what happens if you run the following:

export SINGULARITY_BIND="/data,/projects,/tmp,/home"

Nextflow will copy the bam to /tmp, and Singularity will then run on it in /tmp. Alternatively, you could open the FindSV_core.nf script and remove any instances of "scratch true".
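Finding and removing the "scratch true" instances could be sketched as follows. Both functions are hypothetical helpers, and the sketch assumes each `scratch true` directive sits on its own line in the script, which should be checked against the listing before the stripped copy is used.

```shell
# List every line of a Nextflow script that enables scratch, with line numbers.
list_scratch_lines() {
    grep -n 'scratch[[:space:]]\{1,\}true' "$1"
}

# Write a copy of the script ($1) without those lines to $2.
# The original file is left untouched.
strip_scratch() {
    grep -v 'scratch[[:space:]]\{1,\}true' "$1" > "$2"
}
```

Running `list_scratch_lines FindSV_core.nf` first shows what would be dropped; `strip_scratch FindSV_core.nf FindSV_core.noscratch.nf` then writes the modified copy under a new (here made-up) file name.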

xexpanderx commented 5 years ago

Nope, I cannot get past this error. I tried adding /tmp and /home to SINGULARITY_BIND, then I tried without scratch. Same error.

This is what our directory mount structure looks like:

df -h
Filesystem                                   Size  Used Avail Use% Mounted on
devtmpfs                                      63G     0   63G   0% /dev
tmpfs                                         63G     0   63G   0% /dev/shm
tmpfs                                         63G   75M   63G   1% /run
tmpfs                                         63G     0   63G   0% /sys/fs/cgroup
/dev/mapper/centos-root                       50G  6.3G   44G  13% /
/dev/sda2                                   1014M  342M  673M  34% /boot
/dev/mapper/centos-illumina                  1.5T  187G  1.4T  13% /illumina
/dev/mapper/centos-docker--containers         50G   11G   40G  21% /var/lib/docker
/dev/mapper/centos-dockervolumes             1.0T  260G  765G  26% /var/lib/docker/volumes
head.igp1.uu.se:/home                         99G   72G   22G  78% /home
head.igp1.uu.se:/sw                          148G   79G   63G  56% /sw
beegfs_nodev                                 5.0T  3.3T  1.8T  66% /beegfs
tmpfs                                         13G     0   13G   0% /run/user/0
storage1.igp10.uu.se:gluster-storage-volume  109T   84T   26T  77% /gluster-storage-volume

/tmp is on /, which is only 50G in total (44G available).

The size of the bam file is:

ls -alh /projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam
-rw-rw----. 1 andax639 wp3 50G Nov 14 11:28 /projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam

Our scratch area (about 5 TB) is in fact /beegfs.
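Given those numbers, a pre-flight check that compares the size of the bam with the free space of a candidate scratch directory could catch this before the pipeline starts. A minimal sketch; `kib_needed`, `kib_free` and `fits_in_dir` are hypothetical helpers, not part of FindSV, and the check ignores any other temporary files the callers write.

```shell
# Size of FILE in KiB, rounded up.
kib_needed() {
    bytes=$(wc -c < "$1")
    echo $(( (bytes + 1023) / 1024 ))
}

# Free space, in KiB, on the filesystem that holds DIR
# (column 4 of the POSIX-format df output).
kib_free() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if FILE would fit in the free space of DIR.
fits_in_dir() {
    [ "$(kib_needed "$1")" -le "$(kib_free "$2")" ]
}
```

For example, `fits_in_dir /projects/wp3/nobackup/WGS/testCNV/RL-2047-NA12878.clean.dedup.bam /tmp || echo "too small"` would flag /tmp here, while the same check against /beegfs should pass.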