KCCG / ClinSV

Robust detection of clinically relevant structural and copy number variation from whole genome sequencing data

[ENH] Create Docker file for v1.0 #13

Closed J-Bradlee closed 2 years ago

J-Bradlee commented 2 years ago

Create a docker file based on the install requirements found in install38.md.

So far here are some of the dependencies involved:
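As a rough sketch (base image and package names here are my assumptions, not the definitive list from install38.md), the system-level installs probably look something like:

# Assumed CentOS-style build dependencies for the bundled tools
# (gcc/make for compiling; zlib/bzip2/xz/ncurses headers for samtools & co.)
yum install -y gcc make wget bzip2 \
    zlib-devel bzip2-devel xz-devel ncurses-devel \
    perl python2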

J-Bradlee commented 2 years ago

Progress done today:

Next Goal:

Possible hurdles:

sflodin commented 2 years ago

Progress done today:

Possible hurdles:

Next Goal:

drmjc commented 2 years ago

that's good progress! There's an updated clinsv.pl file in the Google Drive which might fix the first issue

sflodin commented 2 years ago

Progress done today:

• We continually found errors in the docker run command, but eventually got it to work.
• Installed an extra dependency which was not previously listed (yum install bzip2-devel), to combat this samtools error: samtools: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory. Perhaps we need to add even more later.
• Began to download subset bam files.

Possible hurdles:

• More dependencies need to be added as we progress through compilation

Next Goal:

@drmjc What should the program output once it is run successfully on this bam file NA12878.grch38.subsampled.bam?

drmjc commented 2 years ago

@sflodin, The final output is two tsv files^ containing the genetic changes (one is called $prefix.RARE_PASS_GENE.tsv) and a QC pdf file ($prefix.QC_report.pdf). Examples of each are in https://github.com/KCCG/ClinSV/tree/master/results_test_data. There will be several intermediary files created as well.

^ it might create an xlsx file instead, I don't recall
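Once a run completes, a quick shell check for those outputs might look like this (a minimal sketch; the prefix value and working directory are assumptions):

# Check the two headline outputs described above exist and are non-empty.
# prefix is the run's name stem (assumed; substitute your own).
prefix=project_folder
for f in "$prefix.RARE_PASS_GENE.tsv" "$prefix.QC_report.pdf"; do
    [ -s "$f" ] && echo "OK: $f" || echo "MISSING: $f"
done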

drmjc commented 2 years ago

Hi @sflodin, @J-Bradlee, the webserver is back up and running

J-Bradlee commented 2 years ago

Progress done today:

• Went through the clinsv.pl file to try to understand the code, since there was an error with jobs not running.

We hit this error: samtools: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory.

This was fixed more permanently this time by the following commands, found here:

sudo yum install bzip2-devel
sudo ln -s $(find /usr/lib64/ -type f -name "libbz2.so.1*") /usr/lib64/libbz2.so.1.0

• This led to a new error we were unable to fix:

"samtools: error while loading shared libraries: libdeflate.so: cannot open shared object file: No such file or directory"

• Compared v0.9 and v1.0 for discrepancies to try to solve the error. Pretty sure it is a dependency issue; it seems samtools isn't working on our docker image.
• Having issues running our v1.0 docker image. No jobs seem to get run. We get:

##############################################
####                ClinSV                ####
##############################################
# 27/09/2021 02:44:18

# clinsv dir: /app/clinsv
# projectDir: /app
# sampleInfoFile: /app/sampleInfo.txt 
# name stem: app
# lumpyBatchSize: 5
# genome reference: /app/clinsv/refdata-b37
# run steps: all
# number input bams: 2

# Create sample info file from bam files ...
samtools: error while loading shared libraries: libdeflate.so: cannot open shared object file: No such file or directory
ln -s /app/input/NA12878.grch38.subsampled.bam /app/alignments//.bam
ln -s /app/input/NA12878.grch38.subsampled.bam.bai /app/alignments//.bam.bai
samtools: error while loading shared libraries: libdeflate.so: cannot open shared object file: No such file or directory
ln -s /app/input/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam /app/alignments//.bam
ln -s /app/input/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam.bai /app/alignments//.bam.bai
# Read Sample Info from /app/sampleInfo.txt
# 0 samples to process
# If not, please exit make a copy of sampleInfo.txt, modify it and rerun with -s sampleInfo_mod.txt pointing to the new sample info file. 

###### Generate the commands and scripts ######

# bigwig

# lumpy

# cnvnator

# annotate

# prioritize

# qc

###### Run jobs ######

# 27/09/2021 02:44:18 Project app app | Total jobs 0 | Remaining jobs 0 | Remaining steps   0 | Total time: 0 min

# Everything done! Exit

# writing igv session files...

We think this is because the sampleInfo.txt file is not being created properly, due to samtools not working.

Possible hurdles:

samtools: error while loading shared libraries: libdeflate.so: cannot open shared object file: No such file or directory

  • Most of the other binaries in "ClinSV_x86_64_v1.0.tar.gz" seemed to work, other than samtools.
  • Also, doing a smoke test is hard: the program uses up all my CPU power even with the smaller bam files given, so we can't quite test on my local machine whether we get the correct output. For now we are assuming that if the program takes up all my CPU power, as it did with the v0.9 docker image, it works. One way to cap this is sketched below.
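A possible way to keep a local smoke test from saturating the machine is to cap the container's resources with standard docker run flags (the values and image name here are placeholders):

# Limit the container to 4 CPUs and 8 GB RAM so the host stays responsive.
docker run --cpus=4 --memory=8g IMAGE_NAME -r all -i "/app/input/*.bam"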

Next steps:

drmjc commented 2 years ago

Thanks gents.

Looks like this issue has been seen by the conda devs - https://github.com/bioconda/bioconda-recipes/issues/16529

Since samtools is one of the most widely used and tested bioinformatics tools, I recommend trying the latest version, which may be easier to install. If the major version number used by clinsv and the latest are the same, then the CLI should be the same. If the major version number has changed, then you may need to update the syntax of any samtools commands that have changed. I expect there are only a modest number of changes that may need to be made.

drmjc commented 2 years ago

PS if you upgrade the samtools version, then also upgrade bcftools and htslib to match it, as they work in sync
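A sketch of building a matched set from source (the 1.13 release number is illustrative; any single recent release used for all three should do):

# Build htslib, samtools and bcftools at one matched version (illustrative).
VER=1.13
for tool in htslib samtools bcftools; do
    wget https://github.com/samtools/$tool/releases/download/$VER/$tool-$VER.tar.bz2
    tar -xjf $tool-$VER.tar.bz2
    (cd $tool-$VER && ./configure && make && make install)
done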

sflodin commented 2 years ago

Progress done today:

##############################################
####                ClinSV                ####
##############################################
# 29/09/2021 08:05:41

# clinsv dir: /app/clinsv
# projectDir: /app/project_folder
# sampleInfoFile: /app/project_folder/sampleInfo.txt 
# name stem: project_folder
# lumpyBatchSize: 5
# genome reference: /app/ref-data
# run steps: all
# number input bams: 2

# Read Sample Info from /app/project_folder/sampleInfo.txt
# use: FR05812606    H7LH3CCXX_6        /app/input/NA12878.grch38.subsampled.bam
# use: NA12878    SRR098401    Solexa-51024    /app/input/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam
# 2 samples to process
# If not, please exit make a copy of sampleInfo.txt, modify it and rerun with -s sampleInfo_mod.txt pointing to the new sample info file. 

###### Generate the commands and scripts ######

# bigwig

# lumpy

# cnvnator

# annotate

# prioritize

# qc

###### Run jobs ######

 ### executing: sh /app/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.sh &> /app/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.e  ...  

 ### finished after (hh:mm:ss): 00:00:00
 ### exist status: 512

 ***** error exist status != 0 (512), please check /app/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.e for more information 

A quick cat of the error file revealed:

+ export PATH=/app/clinsv/bin:/app/clinsv/bin:/app/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/app/clinsv/bin:/app/clinsv/bin:/app/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ cd /app/project_folder/alignments/FR05812606/bw/
+ awk '$2>=100000 {print $1":1-"$2}' /app/ref-data/genome/Homo_sapiens_assembly38.fasta.chrom.sizes
+ xargs -P 15 -t '-i{}' perl /app/clinsv/clinSV/scripts/bam2wig.pl -s 1 -q q0 -r '{}' -o /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q0 -f /app/ref-data/genome/Homo_sapiens_assembly38.fasta -b /app/project_folder/alignments/FR05812606/FR05812606.bam
awk: fatal: cannot open file `/app/ref-data/genome/Homo_sapiens_assembly38.fasta.chrom.sizes' for reading (No such file or directory)

This should be an easy fix when we build the docker container again; see the sketch below.
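If the missing file really is just contig sizes, it can presumably be regenerated from the fasta index, since columns 1-2 of a .fai are exactly <contig> <length>:

# Rebuild the .chrom.sizes file from the fasta index (assumes the .fai exists).
cut -f1,2 /app/ref-data/genome/Homo_sapiens_assembly38.fasta.fai \
    > /app/ref-data/genome/Homo_sapiens_assembly38.fasta.chrom.sizes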

Next Goal:

Possible hurdles:

J-Bradlee commented 2 years ago

Progress Done Today:

This seemed to fix the "refdata-bX not matching" error we were having before. However, we are still getting the same error described in the last post.

We then tried running that command within the v0.9 docker container, and we got a similar error.
Note that we are able to get v0.9 running when we execute ClinSV from outside its docker container; we are just not able to get it to work from inside the container.

This makes us think we are passing the -ref and -p arguments into ClinSV incorrectly, or we have some path issues.

- We noticed that the scripts ClinSV generates contain a cd command into the alignment folder, yet we don't see why that is necessary, as all the commands in those scripts use absolute file paths (from the app directory).

set -e -x -o pipefail

export PATH=/app/clinsv/bin:$PATH

cd /app/project_folder/alignments/FR05812606/bw/

sort -k2,2nr /app/ref-data/refdata-b38/genome/human_g1k_v37_decoy.fasta.fai | awk '{print $1":1-"$2}' | xargs -P 15 -t -i{} perl /app/clinsv/clinSV/scripts/bam2wig.pl 1 q0 {} /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q0 /app/ref-data/refdata-b38/genome/human_g1k_v37_decoy.fasta /app/project_folder/alignments/FR05812606/FR05812606.bam
cat /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q0.*.wig > /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q0.wig

rm /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q0.*.wig

sort -k2,2nr /app/ref-data/refdata-b38/genome/human_g1k_v37_decoy.fasta.fai | awk '{print $1":1-"$2}' | xargs -P 15 -t -i{} perl /app/clinsv/clinSV/scripts/bam2wig.pl 1 q20 {} /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q20 /app/ref-data/refdata-b38/genome/human_g1k_v37_decoy.fasta /app/project_folder/alignments/FR05812606/FR05812606.bam
cat /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q20.*.wig > /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q20.wig

rm /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.q20.*.wig

sort -k2,2nr /app/ref-data/refdata-b38/genome/human_g1k_v37_decoy.fasta.fai | awk '{print $1":1-"$2}' | xargs -P 15 -t -i{} perl /app/clinsv/clinSV/scripts/bam2wigMQ.pl 1 {} /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.mq /app/ref-data/refdata-b38/genome/human_g1k_v37_decoy.fasta /app/project_folder/alignments/FR05812606/FR05812606.bam
cat /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.mq.*.wig > /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.mq.wig

rm /app/project_folder/alignments/FR05812606/bw/tmp/FR05812606.mq.*.wig


We have a feeling that the "cd" command is causing the script to be unable to locate the ref data and other file paths, and thus fail with the error message described in the previous dot point.

We have located that the script is being generated between lines 304-340 of clinsv.pl, as shown below. Note the command
`print OUT "cd $r_OutDir/\n";` in particular, which we are not sure is necessary.
foreach $cSample (sort keys %fastq){

    $r_OutDir="$projectDir/alignments/$cSample/bw";
    $rAlnDir="$projectDir/alignments/$cSample";
    $r_TmpDir="$r_OutDir/tmp";

    $r_JobSubType="createWigs"; #! edit here
    $r_JobUnit="$cSample"; #! edit here $cSample or joined
    $cJ=setUpJ($r_JobType,$r_JobSubType,$r_JobUnit,$r_OutDir,$r_TmpDir);
    $$cJ{rDependencies}=[];
    $$cJ{rJobResource}="walltime=6:00:00,ncpus=16,mem=10GB"; #! edit here

    open(OUT,">$$cJ{rShStem}.sh");  
    print OUT "$r_head\n\n";    
    print OUT "cd $r_OutDir/\n";

    foreach $cT (("q0","q20","mq")){ # create bw of s1 for MQ>=0 and MQ>=20
        if($cT eq "mq"){
            print OUT "\n\nawk '\$2>=100000 {print \$1\":1-\"\$2}' $refFasta.chrom.sizes | xargs -P 15 -t -i{} perl $S_SV_scriptDir/bam2wigMQ.pl ";
            print OUT "-s 1 -r \"{}\" -o $r_TmpDir/$cSample.$cT -f $refFasta -b $rAlnDir/$cSample.bam\n";

            print OUT "\n\nawk '\$2<100000 {print \$1\":1-\"\$2}' $refFasta.chrom.sizes | xargs -P 1 -t -i{} perl $S_SV_scriptDir/bam2wigMQ.pl ";
            print OUT "-s 1 -r \"{}\" -o $r_TmpDir/$cSample.$cT.small_contigs -f $refFasta -b $rAlnDir/$cSample.bam -a \n";

        }else{
            print OUT "\n\nawk '\$2>=100000 {print \$1\":1-\"\$2}' $refFasta.chrom.sizes | xargs -P 15 -t -i{} perl $S_SV_scriptDir/bam2wig.pl ";
            print OUT "-s 1 -q $cT -r \"{}\" -o $r_TmpDir/$cSample.$cT -f $refFasta -b $rAlnDir/$cSample.bam\n";

            print OUT "\n\nawk '\$2<100000 {print \$1\":1-\"\$2}' $refFasta.chrom.sizes | xargs -P 1 -t -i{} perl $S_SV_scriptDir/bam2wig.pl ";
            print OUT "-s 1 -q $cT -r \"{}\" -o $r_TmpDir/$cSample.$cT.small_contigs -f $refFasta -b $rAlnDir/$cSample.bam -a\n";
        }   
        print OUT "cat $r_TmpDir/$cSample.$cT.*.wig > $r_TmpDir/$cSample.$cT.wig\n\n";
            print OUT "rm $r_TmpDir/$cSample.$cT.*.wig\n";
        }
    close(OUT);

}


- We also attempted to build the samtools reinstallation script we created into our v1.0 docker container; however, we were getting some md5 hash errors when installing previously installed libraries. (This could be due to my internet, as I am currently working on mobile data rather than my usual home internet.) If this builds successfully, we might not need to run clinsv within the container and could instead try running ClinSV outside the docker container, as we did with our successful v0.9 run.

**Next Steps**
- Get our v1.0 image built with the new samtools install script, so we can try to run it exactly how we got v0.9 to work.
- Try editing out the cd command in the script generation above, and see if that changes anything.

***Possible hurdles***
- We might be passing the -ref and -p arguments in incorrectly when inside the docker containers; need to figure out how to do it properly.

J-Bradlee commented 2 years ago

Progress Done Today

The directory (called "test" in our case) for the ClinSV v1.0 docker container should look as follows:

test
|   *.bam
|   *.bai
└───clinsv
|    |
|    └───refdata-b38
|    |     |
|    |     └───refdata-b38
|    |     |       | /all of refdata-b38's content/ 
|    
└───test_run

With the environment variables in the host machine defined as:

refdata_path=$PWD/clinsv/refdata-b38
input_path=$PWD
project_folder=$PWD/test_run

And the Docker Command as:

docker run \
  -v $refdata_path:/app/ref-data \
  -v $project_folder:/app/project_folder \
  -v $input_path:/app/input \
  DOCKER_CONTAINER_NAME -r all \
  -i "/app/input/*.bam" \
  -ref $refdata_path:/app/ref-data/refdata-b38 \
  -p $project_folder:/app/project_folder

Where DOCKER_CONTAINER_NAME is the docker image pulled/built from our v1.0 dockerfile. Also notice that the -ref argument is slightly different, as we added "/refdata-b38" to the end of it.
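If -ref and -p are parsed by clinsv inside the container, then only the container-side halves of those values should matter; a variant worth trying (speculative, following our earlier -ref/-p suspicion):

docker run \
  -v $refdata_path:/app/ref-data \
  -v $project_folder:/app/project_folder \
  -v $input_path:/app/input \
  DOCKER_CONTAINER_NAME -r all \
  -i "/app/input/*.bam" \
  -ref /app/ref-data/refdata-b38 \
  -p /app/project_folder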

Next Steps

Possible Hurdles

J-Bradlee commented 2 years ago

Progress Done Today

Got our docker image to run on a Google Cloud VM with 16 vCPUs and 64 GB of RAM. It takes roughly an hour and a half for the first job on the subsampled bam to finish, then roughly 30 minutes for each subsequent generated script to run.

I did run into this error about 5 hours in:

+ export PATH=/app/clinsv/bin:/app/clinsv/bin:/app/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/app/clinsv/bin:/app/clinsv/bin:/app/clinsv/root/bin:/bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ declare -A meanArr
+ declare -A stdevArr
+ declare -A readLArr
++ samtools view /app/project_folder/alignments/FR05812606/FR05812606.bam chr1:1000000-1100000
++ cut -f 10
++ awk '{ print length}'
++ sort -rn
++ awk '(NR==1){print}'
[W::hts_idx_load3] The index file is older than the data file: /app/project_folder/alignments/FR05812606/FR05812606.bam.bai
+ readLArr[H7LH3CCXX_6]=150
+ read -r mean stdev
++ samtools view -r H7LH3CCXX_6 /app/project_folder/alignments/FR05812606/FR05812606.bam
++ python /app/clinsv/clinSV/scripts/pairend_distro-a1.py -r 150 -X 2 -N 100000 -o /app/project_folder/SVs/FR05812606/lumpy/H7LH3CCXX_6.pe.histo
++ cut -d : -f 2
ERROR:root:code for hash md5 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type md5
ERROR:root:code for hash sha1 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha1
ERROR:root:code for hash sha224 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha224
ERROR:root:code for hash sha256 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha256
ERROR:root:code for hash sha384 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha384
ERROR:root:code for hash sha512 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha512
Removed 0 outliers with isize >= 1126
+ meanArr[H7LH3CCXX_6]=430
+ stdevArr[H7LH3CCXX_6]=111
++ samtools view /app/project_folder/alignments/NA12878/NA12878.bam chr1:1000000-1100000
++ cut -f 10
++ awk '{ print length}'
++ sort -rn
++ awk '(NR==1){print}'
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[W::hts_idx_load3] The index file is older than the data file: /app/project_folder/alignments/NA12878/NA12878.bam.bai
[main_samview] region "chr1:1000000-1100000" specifies an invalid region or unknown reference. Continue anyway.
+ readLArr[SRR098401]=
+ read -r mean stdev
++ samtools view -r SRR098401 /app/project_folder/alignments/NA12878/NA12878.bam
++ python /app/clinsv/clinSV/scripts/pairend_distro-a1.py -r -X 2 -N 100000 -o /app/project_folder/SVs/NA12878/lumpy/SRR098401.pe.histo
++ cut -d : -f 2
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
ERROR:root:code for hash md5 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type md5
ERROR:root:code for hash sha1 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha1
ERROR:root:code for hash sha224 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha224
ERROR:root:code for hash sha256 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha256
ERROR:root:code for hash sha384 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha384
ERROR:root:code for hash sha512 was not found.
Traceback (most recent call last):
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 147, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/app/clinsv/python/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha512
Usage: pairend_distro-a1.py [options]

pairend_distro-a1.py: error: option -r: invalid integer value: '-X'
+ meanArr[SRR098401]=
+ stdevArr[SRR098401]=
+ cd /app/project_folder/SVs/joined/lumpy
++ expr 150 - 30
++ expr - 30
expr: syntax error: unexpected argument '30'
+ lumpy -mw 3 -tt 0 -x /app/ref-data/refdata-b38/exclude.cnvnator_100bp.GRCh38.20170403.bed -pe id:FR05812606,bam_file:/app/project_folder/SVs/FR05812606/lumpy/FR05812606.discordants.bam,read_group:H7LH3CCXX_6,histo_file:/app/project_folder/SVs/FR05812606/lumpy/H7LH3CCXX_6.pe.histo,mean:430,stdev:111,read_length:150,min_non_overlap:120,discordant_z:3,back_distance:10,weight:1,min_mapping_threshold:20 -sr id:FR05812606,bam_file:/app/project_folder/SVs/FR05812606/lumpy/FR05812606.splitters.f.bam,back_distance:10,weight:2,min_mapping_threshold:20 -pe id:NA12878,bam_file:/app/project_folder/SVs/NA12878/lumpy/NA12878.discordants.bam,read_group:SRR098401,histo_file:/app/project_folder/SVs/NA12878/lumpy/SRR098401.pe.histo,mean:,stdev:,read_length:,min_non_overlap:,discordant_z:3,back_distance:10,weight:1,min_mapping_threshold:20 -sr id:NA12878,bam_file:/app/project_folder/SVs/NA12878/lumpy/NA12878.splitters.f.bam,back_distance:10,weight:2,min_mapping_threshold:20
Parameter required for mean

Program: ********** (v 0.2.13)
Author:  Ryan Layer (rl6sf@virginia.edu)
Summary: Find structural variations in various signals.

Usage:   ********** [OPTIONS] 

Options: 
        -g      Genome file (defines chromosome order)
        -e      Show evidence for each call
        -w      File read windows size (default 1000000)
        -mw     minimum weight for a call
        -msw    minimum per-sample weight for a call
        -tt     trim threshold
        -x      exclude file bed file
        -t      temp file prefix, must be to a writeable directory
        -P      output probability curve for each variant
        -b      output BEDPE instead of VCF
        -sr     bam_file:<file name>,
                id:<sample name>,
                back_distance:<distance>,
                min_mapping_threshold:<mapping quality>,
                weight:<sample weight>,
                min_clip:<minimum clip length>,
                read_group:<string>

        -pe     bam_file:<file name>,
                id:<sample name>,
                histo_file:<file name>,
                mean:<value>,
                stdev:<value>,
                read_length:<length>,
                min_non_overlap:<length>,
                discordant_z:<z value>,
                back_distance:<distance>,
                min_mapping_threshold:<mapping quality>,
                weight:<sample weight>,
                read_group:<string>

        -bedpe  bedpe_file:<bedpe file>,
                id:<sample name>,
                weight:<sample weight>

Looks like the start of the error is at

python /app/clinsv/clinSV/scripts/pairend_distro-a1.py -r 150 -X 2 -N 100000 -o /app/project_folder/SVs/FR05812606/lumpy/H7LH3CCXX_6.pe.histo

Running that command does give the md5 hash error as above.

A quick Google around led me to believe this is an OpenSSL issue that happens when you use an older version of Python (v2.7, which reached end of support in 2020) with a newer version of OpenSSL (which on the CentOS 8 in the docker file is OpenSSL 1.1.1k FIPS 25 Mar 2021).
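A quick way to confirm the diagnosis is to ask each interpreter for an md5 digest directly (one-liners; the bundled interpreter path is inferred from the traceback above):

# The bundled python2.7 should raise "unsupported hash type md5";
# the system python3 should print a digest. Bundled python path assumed.
/app/clinsv/python/bin/python -c 'import hashlib; print(hashlib.md5(b"x").hexdigest())'
python3 -c 'import hashlib; print(hashlib.md5(b"x").hexdigest())'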

Therefore I installed python3 and ran "pairend_distro-a1.py" with python3, and I no longer had any errors. I did the following:

samtools view -r H7LH3CCXX_6 /app/project_folder/alignments/FR05812606/FR05812606.bam | python3 /app/clinsv/clinSV/scripts/pairend_distro-a1.py -r 150 -X 2 -N 100000 -o /app/project_folder/SVs/FR05812606/lumpy/H7LH3CCXX_6.pe.histo | cut -d : -f 2

Got output:

Removed 0 outliers with isize >= 1126
430 111

Alternatively, I tried to downgrade to OpenSSL v1.0.0 from 2010, which I did by installing the binaries following this guide. However, running the pairend script still ran into the same error.

Next Steps:

Possible Hurdles:

drmjc commented 2 years ago

excellent progress @J-Bradlee. I agree with switching to python3 & that was on the roadmap already. Downgrading dependencies will just cause problems later...

J-Bradlee commented 2 years ago

Strangely enough, changing all the other python 2 commands leads to another crash; changing just the command mentioned above to python3 works and allows the "lumpy" section of the script to continue. Did run into a new dependency issue with BigWig being used in clinSV/scripts/add-depth-to-PE-SR-calls.pl: it is unable to load the module despite it being there. I'm going to figure out how to reinstall BigWig for perl and see if that does the trick. I'll work on it over the weekend.
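A quick check of whether perl can see the module at all, independent of the pipeline (assuming the module in question is Bio::DB::BigWig and using the bundled perlib path from the run logs):

# Prints "ok" if the module loads; otherwise the error names the missing
# path or shared library. The module name is an assumption on my part.
PERL5LIB=/app/clinsv/perlib perl -MBio::DB::BigWig -e 'print "ok\n"'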

J-Bradlee commented 2 years ago

Progress Done Today

Decided to create a new docker file based on Ubuntu 20.04 (LTS). The reasoning: we felt there would just be more and more dependency issues down the line, as a lot of the programs in the precompiled dependencies are deprecated. So we decided to install all of the latest key dependencies from the install md onto a fresh Ubuntu OS.
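For reference, the usual Ubuntu 20.04 prerequisites for building the samtools/htslib stack look something like this (standard Debian/Ubuntu package names, not a verbatim excerpt from our dockerfile):

# Typical Ubuntu build dependencies for compiling samtools/htslib and friends.
apt-get update && apt-get install -y \
    build-essential wget curl \
    zlib1g-dev libbz2-dev liblzma-dev \
    libncurses5-dev libcurl4-openssl-dev libssl-dev \
    perl python3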

Specific things we had to do:

Carrying over from the steps previously done by the last docker image, we were able to finish running clinsv without any errors. We get the following output:

##############################################
####                ClinSV                ####
##############################################
# 22/11/2021 20:12:57

# clinsv dir: /app/clinsv
# projectDir: /app/project_folder
# sampleInfoFile: /app/project_folder/sampleInfo.txt 
# name stem: project_folder
# lumpyBatchSize: 5
# genome reference: /app/ref-data/refdata-b38
# run steps: all
# number input bams: 1

# Read Sample Info from /app/project_folder/sampleInfo.txt
# use: FR05812606       H7LH3CCXX_6             /app/input/NA12878.grch38.subsampled.bam
# 1 samples to process
# If not, please exit make a copy of sampleInfo.txt, modify it and rerun with -s sampleInfo_mod.txt pointing to the new sample info file. 

###### Generate the commands and scripts ######

# bigwig

# lumpy

# cnvnator

# annotate

# prioritize

# qc

###### Run jobs ######

# OK # job bigwig, createWigs, FR05812606 ran successfully (exit=0) -> skip : /app/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.sh

# OK # job bigwig, q0, FR05812606 ran successfully (exit=0) -> skip : /app/project_folder/alignments/FR05812606/bw/sh/bigwig.q0.FR05812606.sh

# OK # job bigwig, q20, FR05812606 ran successfully (exit=0) -> skip : /app/project_folder/alignments/FR05812606/bw/sh/bigwig.q20.FR05812606.sh

# OK # job bigwig, mq, FR05812606 ran successfully (exit=0) -> skip : /app/project_folder/alignments/FR05812606/bw/sh/bigwig.mq.FR05812606.sh

# OK # job lumpy, preproc, FR05812606 ran successfully (exit=0) -> skip : /app/project_folder/SVs/FR05812606/lumpy/sh/lumpy.preproc.FR05812606.sh

# OK # job lumpy, caller, joined ran successfully (exit=0) -> skip : /app/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.sh

# OK # job lumpy, depth, joined ran successfully (exit=0) -> skip : /app/project_folder/SVs/joined/lumpy/sh/lumpy.depth.joined.sh

# OK # job cnvnator, caller, FR05812606 ran successfully (exit=0) -> skip : /app/project_folder/SVs/FR05812606/cnvnator/sh/cnvnator.caller.FR05812606.sh

# OK # job annotate, main, joined ran successfully (exit=0) -> skip : /app/project_folder/SVs/joined/sh/annotate.main.joined.sh

# OK # job prioritize, main, joined ran successfully (exit=0) -> skip : /app/project_folder/SVs/joined/sh/prioritize.main.joined.sh

# OK # job qc, main, joined ran successfully (exit=0) -> skip : /app/project_folder/SVs/qc/sh/qc.main.joined.sh

# 22/11/2021 20:12:57 Project project_folder project_folder | Total jobs 11 | Remaining jobs 0 | Remaining steps   0 | Total time: 231 min

# Everything done! Exit

# writing igv session files...

lumpy variants not present: /app/project_folder/SVs/joined/SV-CNV.FR05812606.igv.bed.gz
xml file: /app/project_folder/igv/FR05812606.xml

I assume that since we are using a subsampled BAM, that is the reason no report was made, and the reason for the "lumpy variants not present: /app/project_folder/SVs/joined/SV-CNV.FR05812606.igv.bed.gz" output.

@drmjc Could you please confirm if this is the desired output? Or at least that it makes sense?

Next Steps

Possible Hurdles

drmjc commented 2 years ago

Excellent progress @J-Bradlee, i totally agree with using the most recent Ubuntu LTS.

What files are in the results folder? When you say no report was made, do you mean these three expected outputs are not made:

If the first 2 files are missing, then I suspect that's a bug/feature where SV-CNV.vcf may not get created. Can you please run a full BAM file through ClinSV to assess this?

J-Bradlee commented 2 years ago

I get none of those 3 outputs. The last thing that gets created is /app/project_folder/igv/FR05812606.xml. I will run a full BAM file through ClinSV to check it all out.

drmjc commented 2 years ago

I've updated the 'apps' to run the clinsv pipeline on DNAnexus, and ran it successfully on a patient sample overnight.

I'll run the subset BAM file today and see what results are generated by the pipeline.

FWIW, in the DNAnexus implementation, it's the 'igv' creation step that writes out the txt and xlsx result files. I'm still digging into how much this is due to the DNAnexus implementation, vs the raw clinsv implementation...

drmjc commented 2 years ago

have you had much success with the 2to3 script? It fixes some obvious things, but is far from perfect...


sflodin commented 2 years ago

The root cause of the previous dependency issues was that some of the dependencies were installed in the wrong place. We also added the incorrect directory to $PATH, which caused issues when linking the dependencies.

Progress Done Today

Solved the problem where setting the environment variable in the dockerfile to "clinsv_path = $PWD/clinsv" would change depending on $PWD. This was solved by setting clinsv_path to simply "/clinsv".

Began solving the issue of dependencies being placed in the wrong directories. Through a bit of experimentation we were able to create a layout that made all the symbolic links work; however, we have not transferred that to the docker image yet. At the moment this seems doable, we just ran out of time for the day.

Next step: Fix up the dockerfile so that all directories are set up correctly, then make sure that all the symbolic links work on that file structure. Overnight, run v0.9 to check the output. This will happen tonight; will post the results.

J-Bradlee commented 2 years ago

Progress Done Today

Just some other things to note:

Next steps

Possible hurdles

J-Bradlee commented 2 years ago

Currently running into this error during the lumpy step, at /app/project_folder/SVs/joined/lumpy/sh/lumpy.depth.joined.e:


+ export PATH=/app/clinsv/bin:/app/clinsv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/app/clinsv/bin:/app/clinsv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ export PERL5LIB=/app/clinsv/perlib:
+ PERL5LIB=/app/clinsv/perlib:
+ perl -e 'while(<>){ if(/^#/){$h.=$_; next;} @_=split("\t",$_); $c=$_[0]; if(!exists($f{$c})){ open($f{$c},">/app/project_folder/SVs/joined/lumpy/tmp/in.$c.vcf"); print {$f{$c}} $h; } print {$f{$c}} $_; } foreach (values %f){close} ' /app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.vcf
+ ls /app/project_folder/SVs/joined/lumpy/tmp/in.chr1.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr10.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr11.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr12.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr13.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr14.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr15.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr16.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr17.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr18.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr19.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr2.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr20.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr21.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr22.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr3.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr4.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr5.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr6.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr7.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr8.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chr9.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chrX.vcf /app/project_folder/SVs/joined/lumpy/tmp/in.chrY.vcf
+ xargs -P 12 -t '-i{}' perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl '{}' /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr1.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr10.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr11.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr12.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr13.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr14.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr15.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr16.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr17.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr18.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr19.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr2.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr10.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr10.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr13.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr13.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr18.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr18.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr1.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr1.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
open in the input VCF...
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr19.vcf
# running add-depth-to-PE-SR-calls-v4.pl 
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr19.out
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr15.vcf
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr15.out
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
open in the input VCF...
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr2.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr2.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr16.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr16.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr12.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr12.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr17.vcf
open in the input VCF...
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr17.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr14.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr14.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr11.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr11.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
Y:0.00500
Y:0.00500
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
Y:0.00500
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr20.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr20.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr20.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr21.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr22.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr21.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr21.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr22.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr22.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
determine the average coverage per sample acrosse >MQ50 regions...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr3.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr3.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr3.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr4.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr4.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr4.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
Y:0.00500
determine the average coverage per sample acrosse >MQ50 regions...
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr5.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr5.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr5.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr6.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr6.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr6.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr7.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr7.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr7.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr8.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr8.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr8.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chr9.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr9.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chr9.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chrX.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chrX.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chrX.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
perl /app/clinsv/clinSV/scripts/add-depth-to-PE-SR-calls.pl /app/project_folder/SVs/joined/lumpy/tmp/in.chrY.vcf /app/ref-data/refdata-b38/genome/Homo_sapiens_assembly38.fasta /app/project_folder /app/ref-data/refdata-b38/control/brkp 500 /app/ref-data/refdata-b38/control/cnv/bw 
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
# running add-depth-to-PE-SR-calls-v4.pl 
# in VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chrY.vcf
# out VCF: /app/project_folder/SVs/joined/lumpy/tmp/in.chrY.out
# in SR control: /app/ref-data/refdata-b38/control/brkp/SR.brkp.gz
# in PE control: /app/ref-data/refdata-b38/control/brkp/PE.brkp.gz
create tabix vcf to extract the copy neutral deletion duplications...
open in the input VCF...
samples in vcf: FR05812606
open the bw files...
# sample: FR05812606, bw found
determine the average coverage per sample acrosse >MQ50 regions...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
Y:0.00500
average coverage FR05812606 autosome:1.92, X:1.92, Y:0, MT:692.49
parse VCF...
+ grep '#' /app/project_folder/SVs/joined/lumpy/tmp/in.chr1.out
+ grep -v '#'
+ sort -V -k1,1 -k2,2n
+ cat /app/project_folder/SVs/joined/lumpy/tmp/in.chr1.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr10.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr11.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr12.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr13.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr14.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr15.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr16.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr17.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr18.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr19.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr2.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr20.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr21.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr22.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr3.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr4.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr5.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr6.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr7.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr8.out /app/project_folder/SVs/joined/lumpy/tmp/in.chr9.out /app/project_folder/SVs/joined/lumpy/tmp/in.chrX.out /app/project_folder/SVs/joined/lumpy/tmp/in.chrY.out
+ perl /app/clinsv/clinSV/scripts/split_lumpy_vcf_by_sample.pl --infoFields SVTYPE,LEN --gtsFields SR,PE,DRF,IDD,CNRD,CNP --PASS --input /app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.f1.vcf --outStem /app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.f1
local perl lib: /app/clinsv/clinSV/scripts/../../perlib
samples: FR05812606
outStem: /app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.f1
+ inVCFS=/app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.f1.vcf
+ sort_bgzip /app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.f1.vcf
+ tabix -f -p vcf /app/project_folder/SVs/joined/lumpy/project_folder.MQ20.OpreProc.f1.vcf.gz
[E::hts_idx_push] Invalid record on sequence #1: end 16725245 < begin 234776441

The VCF files that generated the error messages:

project_folder.MQ20.OpreProc.f1.vcf.gz project_folder.MQ20.OpreProc.zip

Any idea on what's causing this @drmjc? It's weird, as I didn't seem to have this issue with the CentOS 8 docker container previously. This is the Ubuntu OS, btw.
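For debugging, a quick way to surface the offending records is to compare POS with INFO/END directly; one common cause is an inter-chromosomal or inverted call whose INFO/END lands before POS, which the indexer rejects. A minimal sketch, assuming bcftools is available in the image (it is not part of the toolchain listed so far):

```bash
# Hypothetical check: list records where INFO/END precedes POS, the condition
# that makes hts_idx_push reject the record while indexing.
bcftools query -f '%CHROM\t%POS\t%INFO/END\n' project_folder.MQ20.OpreProc.f1.vcf.gz \
  | awk -F'\t' '$3 != "." && $3+0 < $2+0 {print}'
```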

J-Bradlee commented 2 years ago

Progress done last few days

Next steps

Possible hurdles

Usage:
./cnvnator -root out.root [-genome name] [-chrom 1 2 ...] -tree file1.bam ...
./cnvnator -root out.root [-genome name] [-chrom 1 2 ...] -merge file1.root ...
./cnvnator -root file.root [-genome name] [-chrom 1 2 ...] [-d dir] -his bin_size
./cnvnator -root file.root [-chrom 1 2 ...] -stat bin_size
./cnvnator -root file.root -eval bin_size
./cnvnator -root file.root [-chrom 1 2 ...] -partition bin_size [-ngc]
./cnvnator -root file.root [-chrom 1 2 ...] -call bin_size [-ngc]
./cnvnator -root file.root -genotype bin_size [-ngc]
./cnvnator -root file.root -view bin_size [-ngc]
./cnvnator -pe file1.bam ... -qual val(20) -over val(0.8) [-f file]

Valid genomes (-genome option) are: NCBI36, hg18, GRCh37, hg19, mm9


The latest release of the speedseq library repo was in 2017. We could try using the most up-to-date CNVnator (which has GRCh38 support) for the speedseq install; hopefully speedseq would still be compatible.

**Update 1**
We think the CNVnator step failed due to the outdated tabix. We have just reinstalled samtools, bgzip and htslib (with tabix) at version 1.9, which should hopefully fix it. We are currently running the step again.
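For reference, a minimal sketch of pinning everything to the same release from source; the version variable and install prefix are assumptions, not the exact commands we ran:

```bash
# Hypothetical rebuild of htslib + samtools at a matching release; tabix and
# bgzip are installed as part of htslib, so this refreshes all four tools.
V=1.9
wget https://github.com/samtools/htslib/releases/download/${V}/htslib-${V}.tar.bz2
wget https://github.com/samtools/samtools/releases/download/${V}/samtools-${V}.tar.bz2
for t in htslib samtools; do
    tar -xjf ${t}-${V}.tar.bz2
    (cd ${t}-${V} && ./configure --prefix=/usr/local && make && make install)
done
tabix --version   # should now report 1.9
```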

**Update 2**
Still did not work; the BigWig Perl module is returning an error:

> Invalid chromosome; Cannot find chromosome name 'chr1_KI270707v1_random' in the big file at /app/clinsv/clinSV/scripts/filterCNVNator.pl line 348, <IN> line 10296.

Looking over the output file, it seems only chr1-chr22, chrX, chrY and chrM are being found by the program. Not sure if this is intended behaviour, or how filterCNVNator.pl is attempting to get this data...
Attached below is the output and script generated
[script.txt](https://github.com/KCCG/ClinSV/files/7868965/script.txt)
[output.txt](https://github.com/KCCG/ClinSV/files/7868966/output.txt)
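One way to check whether the mismatch is on the bigWig side or the script side is to list the chromosome names the control .bw files actually carry; a sketch assuming UCSC's bigWigInfo is installed (the file name is illustrative):

```bash
# Hypothetical check: print the chromosomes stored in a control bigWig and
# compare them with the names filterCNVNator.pl requests (e.g. chr1_KI270707v1_random).
bigWigInfo -chroms /app/ref-data/refdata-b38/control/cnv/bw/FR05812606.bw | sed -n '/chromCount/,$p'
```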
drmjc commented 2 years ago

Thanks for persevering James.

I agree that using a very old tabix is a poor solution. I didn't think that it had changed that much over the years though; perhaps just the CLI syntax?

shame speedseq doesn't support x38; I think this was chosen as a convenient way to get CNVnator, so this suggests that we may not need speedseq anymore.

It's completely ok that only chr1-22,X,T,MT are analysed, as these are the only chromosomes that we care about. Doing so shouldn't raise an error though... All the noise in the log file suggests we need a way to filter out superfluous contigs in a way that is flexible wrt the reference genome used. I'd suggest selecting chromosomes {1,2,...,22,X,Y,MT,chr1,chr2,...,chrX,chrY,chrM}, which should handle the two main naming schemes
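A minimal sketch of that kind of filter, reading the contig names from the BAM header; the script itself is hypothetical, not part of ClinSV:

```bash
#!/usr/bin/env bash
# Hypothetical filter: keep only canonical chromosomes from a BAM header,
# accepting both the "1..22,X,Y,MT" and "chr1..chr22,chrX,chrY,chrM" schemes.
bam=$1
samtools view -H "$bam" \
  | awk '$1 == "@SQ" { sub(/^SN:/, "", $2); print $2 }' \
  | grep -E '^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$'
```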

J-Bradlee commented 2 years ago

> I agree that using a very old tabix is a poor solution. I didn't think that it had changed that much over the years though; perhaps just the CLI syntax?

There was an issue with the CLI syntax of the old tabix; we have changed it to version 1.4, as in the original clinsv docker image. This works fine.

> shame speedseq doesn't support x38; I think this was chosen as a convenient way to get CNVnator, so this suggests that we may not need speedseq anymore.

Perhaps it would be best if we just installed CNVnator directly from the CNVnator repo; however, its README doesn't document some of the command options ClinSV uses, such as `--threads` (which is set to 14) and `-w`/`--window` (which is set to 100).
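For what it's worth, the flags in the usage text quoted earlier cover the histogram, statistics, partition and call steps; a hedged sketch of that sequence at the 100 bp window mentioned above (file names are illustrative, and `--threads` may only exist in newer releases):

```bash
# Hypothetical run of the CNVnator steps ClinSV's wrapper drives, at a
# 100 bp bin; flags follow the usage text quoted earlier in this thread.
cnvnator -root FR05812606.root -tree FR05812606.bam
cnvnator -root FR05812606.root -his 100
cnvnator -root FR05812606.root -stat 100
cnvnator -root FR05812606.root -partition 100
cnvnator -root FR05812606.root -call 100 > FR05812606.cnv.calls
```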

> It's completely ok that only chr1-22,X,T,MT are analysed, as these are the only chromosomes that we care about. Doing so shouldn't raise an error though... All the noise in the log file suggests we need a way to filter out superfluous contigs in a way that is flexible wrt the reference genome used

We just attempted to filter out all the superfluous contigs by adding an --exclude option to the cnvnator wrapper, which ensures we only process chr1-chr22, chrX, chrY and chrM. There is no T or MT in the FR05812606.bam file, or at least the cnvnator wrapper is not finding them with its samtools view -H command. We are now waiting for the output to see whether that worked.

drmjc commented 2 years ago

OK - T was a typo. There are 22 autosomes, 2 allosomes/sex chromosomes (X, Y) and 1 mitochondrial genome (MT or chrM), with or without the chr prefix.

J-Bradlee commented 2 years ago

Progress done today

Next Steps

Possible hurdles

Just a question: has v1.0 ever been run successfully? I ask because we are wondering whether errors like this have occurred before; if they haven't, then perhaps we have not set up ClinSV properly, or there could be an issue with using Ubuntu 20.04 LTS.

J-Bradlee commented 2 years ago

Progress done today

Next steps

Possible Hurdles

drmjc commented 2 years ago

Damn good job

On Thu, 27 Jan 2022, 12:35 pm, James Bradley wrote:

Progress done today

  • Following on from the last update, we ran into an error with pdflatex not being installed; it is used to convert the report .tex file to a PDF. Installing TeX Live fixed this.
  • We did not run into any more errors after that, and the job finished. It successfully output a report and an IGV session file. See the PDF here: FR05812606.QC_report.pdf https://github.com/KCCG/ClinSV/files/7946629/FR05812606.QC_report.pdf . I have also attached the full project folder, where the IGV session file can be found under the igv folder.
  • You can pull the current docker image with the command docker pull mrbradley2/clinsv:v2.6

Next steps

  • Do a complete rerun of the program to make sure no errors have been skipped over, since we have not been re-running previously successful steps after making changes.
  • The docker image is oversized; there are a lot of redundant program installs (in the wrong places) that we can get rid of. This should reduce the size by roughly half, from around 9 GB to 4 GB (see the sketch after this list).
  • Once this is done, we will refactor the original docker file so it can directly build all the changes we have made from scratch. (Not sure this is high priority, as we can just use the image hosted in the Docker repository and commit any changes there.)
  • Upload all the code to this repo.
  • Start building out a CI/CD pipeline connected to this repo that can automate testing and deployment to the Docker repository.
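For illustration, a minimal multi-stage sketch of the slimming idea; the base image, package names and /opt/clinsv path are assumptions, not our actual build file:

```dockerfile
# Hypothetical multi-stage build: compile the heavy dependencies (ROOT,
# CNVnator, samtools/htslib) in a throwaway builder stage, then copy only
# the installed tree into the runtime image so build tooling never ships.
FROM ubuntu:20.04 AS builder
RUN apt-get update && apt-get install -y build-essential wget libbz2-dev libdeflate-dev
# ... build ROOT, CNVnator, samtools/htslib into /opt/clinsv here ...

FROM ubuntu:20.04
RUN apt-get update && apt-get install -y libbz2-1.0 libdeflate0 perl \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/clinsv /opt/clinsv
ENV PATH="/opt/clinsv/bin:${PATH}"
```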

Possible Hurdles

  • Running into more errors on the rerun.
  • Recreating the Docker build file might take a while, as installing some of the programs takes a long time (especially ROOT). This makes for tedious debugging.
  • The CI/CD pipeline will require a bit of tinkering to set up; it will also cost money if we use a cloud platform to run a quick smoke test for certain steps.


J-Bradlee commented 2 years ago

Progress Done Today

Next steps

Possible Hurdles

J-Bradlee commented 2 years ago

Progress Done over last month

Next Steps

Possible Hurdles

drmjc commented 2 years ago

excellent progress @J-Bradlee

This public data (from the Platinum Genomes project) is aligned to hg19, so please test ClinSV against this NA12878 file.

wget -O - ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR194/ERR194147/NA12878_S1.bam | samtools view -H

--2022-03-15 06:51:20--  ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR194/ERR194147/NA12878_S1.bam
           => ‘-’
Resolving ftp.sra.ebi.ac.uk (ftp.sra.ebi.ac.uk)... 193.62.197.74
Connecting to ftp.sra.ebi.ac.uk (ftp.sra.ebi.ac.uk)|193.62.197.74|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /vol1/run/ERR194/ERR194147 ... done.
==> SIZE NA12878_S1.bam ... 121691186161
==> PASV ... done.    ==> RETR NA12878_S1.bam ... done.
Length: 121691186161 (113G) (unauthoritative)

NA12878_S1.bam                         0%[                                                                       ]   2.83K  3.07KB/s
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@SQ SN:chr16    LN:90354753
@SQ SN:chr17    LN:81195210
@SQ SN:chr18    LN:78077248
@SQ SN:chr19    LN:59128983
@SQ SN:chr20    LN:63025520
@SQ SN:chr21    LN:48129895
@SQ SN:chr22    LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@RG ID:NA12878  SM:NA12878
@PG ID:bwa  PN:bwa  VN:0.6.1-r104-tpx
NA12878_S1.bam                         0%[                                                                       ]   2.83K  3.06KB/s    in 0.9s    

is it time for a pull request?

drmjc commented 2 years ago

James, do you see the need to run ClinSV twice, or is this now resolved? see https://github.com/KCCG/ClinSV/issues/22#issuecomment-1035849684

J-Bradlee commented 2 years ago

> James, do you see the need to run ClinSV twice, or is this now resolved? see #22 (comment)

I have never needed to run it twice; it either fails trying to find the ref data or continues through. I have set up the ref data folder structure a little differently than suggested in the README. I'll add recreating that bug to the to-do list. I have also not touched the singularity container at all; I have been working mainly on the docker container.

I have also run into a crash when trying to run GRCh37 with ClinSV v1.0; can you confirm whether v1.0 is backwards compatible? I just assumed it was... Otherwise I'll dig into the error logs and see if I can fix it.

drmjc commented 2 years ago

It's not backwards compatible... v0.9 was only for hs37d5, v1.0 only for x38.

The next enhancement is an obvious one... Allow the reference to be specified on the CLI and grab the right resource files on the fly.
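Something as simple as mapping a CLI flag to the matching resource bundle would do; a hypothetical sketch (the flag handling and variable names are illustrative, only the refdata-b37/refdata-b38 bundle names come from this thread):

```bash
# Hypothetical dispatch for a future -ref option: pick the resource bundle
# that matches the requested genome build, failing fast on anything else.
case "$ref" in
    hs37d5|GRCh37|b37) refdata=refdata-b37 ;;
    GRCh38|hg38|b38)   refdata=refdata-b38 ;;
    *) echo "unsupported reference: $ref" >&2; exit 1 ;;
esac
```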


neoscreen commented 2 years ago

Great job @J-Bradlee. Just a small bug: by using refdata-38, the structure of the final igv.xml file is wrong. Everything else looks fine. Also, I noticed that with docker version 2.9 and refdata-38, the analysis time is double compared to refdata-37.

drmjc commented 2 years ago

Thanks @J-Bradlee, this is looking good.

I see now why you wanted to drop the v0.9 / hs37d5 documentation from the README.md - I agree with you. Probably a moot point if v1.1.0 is soon to come.

tomlodz commented 1 year ago

Hello,

I have a little problem with ClinSV. With both docker versions, 0.9 and 1.0, the analysis stops some time after the start of processing with the following message:

finished after (hh:mm:ss): 00:00:00

exist status: 256

***** error exist status != 0 (256), please check /app/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.e for more information

The lumpy.caller.joined.e file says that there is no BAM file in the directory. That file should probably be generated during the analysis itself. Below is the content of the file:

export PATH=/app/clinsv/bin:/app/clinsv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ PATH=/app/clinsv/bin:/app/clinsv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ declare -A meanArr
+ declare -A stdevArr
+ declare -A readLArr
++ samtools view /app/project_folder/alignments/1.bam/1.bam.bam chr1:1000000-1100000
++ cut -f 10
++ awk '{ print length}'
++ sort -rn
++ awk '(NR==1){print}'
[E::hts_open_format] fail to open file '/app/project_folder/alignments/1.bam/1.bam.bam'
samtools view: failed to open "/app/project_folder/alignments/1.bam/1.bam.bam" for reading: No such file or directory
+ readLArr[1.bam]=

Thanks in advance