Closed tyler5huang closed 7 years ago
What was the file name of your VarScan's VCF file?
icgc_cll-varscan-annotated.vcf.gz
Could you send me the vcf.gz file so I can check? I'm also wondering whether Python 3.2.3's gzip library behaves differently from later versions.
Also, can you run docker on your end? If so, we've just dockerized SomaticSeq: https://hub.docker.com/r/lethalfang/somaticseq/
Actually, why don't you unpack the bgzip'ed VCF file, and see if that fixes your problem.
Hi, my vcf.gz files are >500 MB each, so I created a smaller file in plain .vcf format (not vcf.gz). The error I get is this:
[huangwt@n006 Real-bcbio103-truth]$ $myCodes/SomaticSeq.Wrapper.sh \
> --mutect2 $myDir/mutect.vcf \
> --varscan-snv $myDir/varscan.vcf \
> --vardict $myDir/vardict.vcf \
> --ada-r-script ada_model_builder.R \
> --truth-snv $myResults/R1.truth.snv.vcf \
> --output-dir $myResults/somaticseq
 --mutect2 '/mnt/projects/huangwt/wgs/smurf/test/mutect2.vcf' --varscan-snv '/mnt/projects/huangwt/wgs/smurf/test/varscan.vcf' --vardict '/mnt/projects/huangwt/wgs/smurf/test/vardict.vcf' --ada-r-script 'ada_model_builder.R' --truth-snv '/mnt/projects/huangwt/wgs/Results-SMuRF/Real-bcbio103-truth/R1.truth.snv.vcf' --output-dir '/mnt/projects/huangwt/wgs/Results-SMuRF/Real-bcbio103-truth/somaticseq' --
Traceback (most recent call last): File "/home/huangwt/Codes/somaticseq/modify_VJSD.py", line 129, in
HUANG Weitai
Hi, I tried again with your original code (while line_i.startswith('#'):). It returned this error: Error: Unable to access jarfile CombineVariants
The script uses GATK to combine all the VCF files from the different callers (i.e., GATK CombineVariants). You can point to the location of the GATK jar file with --gatk $PATH/TO/GATK/GenomeAnalysisTK.jar
Alternatively, you can download the latest version 2.2.5. There, without --gatk, it'll just use cat and the vcfsorter.pl script to combine and sort those VCF files.
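As a rough illustration of what "cat and sort" means for VCF files, here is a minimal Python sketch. It is not the vcfsorter.pl logic and not code from SomaticSeq: the helper name combine_and_sort_vcf, the contig_order argument, and the choice to keep only the first file's header are all assumptions made for the example.

```python
def combine_and_sort_vcf(vcf_texts, contig_order):
    """Hypothetical helper: concatenate VCF texts, then sort records.

    Records are ordered by contig (following contig_order) and then by
    position. Only the header of the first input is kept, a simplification.
    """
    header, records = [], []
    for index, text in enumerate(vcf_texts):
        for line in text.splitlines():
            if line.startswith('#'):
                if index == 0:
                    header.append(line)
            elif line.strip():
                records.append(line)

    rank = {name: n for n, name in enumerate(contig_order)}

    def sort_key(record):
        chrom, pos = record.split('\t')[:2]
        # Contigs missing from contig_order are placed last.
        return (rank.get(chrom, len(rank)), int(pos))

    return header + sorted(records, key=sort_key)


# Tiny made-up inputs standing in for two callers' outputs.
mutect_text = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "2\t50\t.\tA\tT\n"
    "1\t300\t.\tG\tC\n"
)
varscan_text = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t100\t.\tC\tG\n"
)

combined = combine_and_sort_vcf([mutect_text, varscan_text], contig_order=['1', '2'])
```

Unlike GATK CombineVariants, a plain concatenation like this does not merge records that several callers report at the same position.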
I provided the path to --gatk but it returned with this error:
Picked up _JAVA_OPTIONS: -XX:+UseSerialGC
I have not tried GATK version 2 before. Can you give GATK3 a try? GATK4 beta doesn't work for now.
Trying with GATK3.7:
Picked up _JAVA_OPTIONS: -XX:+UseSerialGC
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/broadinstitute/gatk/engine/CommandLineGATK : Unsupported major.minor version 52.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Is there a specific GATK version I should use?
I've tried most versions of GATK 3, including 3.7, and haven't had a problem so far. For a detailed description of how each step in the script works, see the manual in the docs folder: https://github.com/bioinform/somaticseq/blob/master/docs/Manual.pdf
The step-by-step guide to the pipeline starts on page 4.
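A note on the UnsupportedClassVersionError reported above with GATK 3.7: "Unsupported major.minor version 52.0" generally means the jar was compiled for a newer Java release than the JVM running it. Class-file major version 52 corresponds to Java 8, so the likely fix is to run GATK 3.7 with a Java 8 runtime. The sketch below shows the mapping; the helper name required_java_release is made up for illustration, and the header bytes are synthetic rather than read from the GATK jar.

```python
import struct


def required_java_release(class_header):
    """Hypothetical helper: Java release implied by a .class file header.

    A class file starts with the magic number 0xCAFEBABE, followed by a
    2-byte minor version and a 2-byte major version (all big-endian).
    """
    magic, _minor, major = struct.unpack('>IHH', class_header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError('not a Java class file')
    if major < 49:
        raise ValueError('older than Java 5; the mapping below does not apply')
    # Major versions 49, 50, 51, 52 correspond to Java 5, 6, 7, 8.
    return major - 44


# Synthetic header carrying the version from the error message (52.0).
header = bytes([0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x34])
needed = required_java_release(header)
```

Running java -version on the machine shows which runtime is actually being picked up.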
Hi, I tried with Python 3.6 (instead of Python 3.2) with the corresponding dependencies and consolidated the calls. I ran the R scripts to train and predict on my own, and they work as well. Thanks.
Error and change logs:
1. Traceback (most recent call last): File "/home/huangwt/Codes/somaticseq/modify_VJSD.py", line 116, in
with genome.open_textfile(right_files[0]) as vcf:
File "/home/huangwt/Codes/somaticseq/genomic_file_handlers.py", line 224, in open_textfile
return gzip.open(file_name, 'rt')
File "/mnt/software/src/Python-3.2.3/Lib/gzip.py", line 46, in open
return GzipFile(filename, mode, compresslevel)
File "/mnt/software/src/Python-3.2.3/Lib/gzip.py", line 156, in __init__
raise IOError("Mode " + mode + " not supported")
IOError: Mode rt not supported
Changed:
File "/home/huangwt/Codes/somaticseq/genomic_file_handlers.py", line 224, in open_textfile
return gzip.open(file_name, 'rt') to return gzip.open(file_name, 'r')
2. Traceback (most recent call last): File "/home/huangwt/Codes/somaticseq/modify_VJSD.py", line 126, in
while line_i.startswith('#'):
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
Changed:
File "/home/huangwt/Codes/somaticseq/modify_VJSD.py", line 126, in
while line_i.startswith('#'): to while line_i.startswith(b'#'):
3. Traceback (most recent call last): File "/home/huangwt/Codes/somaticseq/modify_VJSD.py", line 128, in
if re.match(r'##fileformat=', line_i):
File "/mnt/software/src/Python-3.2.3/Lib/re.py", line 153, in match
return _compile(pattern, flags).match(string)
TypeError: can't use a string pattern on a bytes-like object
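The three errors above are linked. gzip.open only gained text mode ('rt') in Python 3.3, so it fails on 3.2.3. Switching to 'r' opens the file in binary mode, which makes every line a bytes object and breaks the later str-based startswith and re.match calls. One way to get text lines on both old and new Python 3 versions is to open in binary mode and wrap the handle in io.TextIOWrapper. The sketch below assumes UTF-8 input; open_textfile_compat is a hypothetical name, not the function in genomic_file_handlers.py.

```python
import gzip
import io
import os
import re
import tempfile


def open_textfile_compat(file_name, encoding='utf-8'):
    """Hypothetical helper: text-mode handle for plain or gzipped files."""
    if file_name.endswith('.gz'):
        # Binary mode is accepted by old and new Pythons; the wrapper
        # decodes bytes to str so callers can keep using str patterns.
        return io.TextIOWrapper(gzip.open(file_name, 'rb'), encoding=encoding)
    return io.open(file_name, 'r', encoding=encoding)


# Write a tiny made-up gzipped VCF to a temporary directory for the demo.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo.vcf.gz')
with gzip.open(demo_path, 'wb') as out:
    out.write(b'##fileformat=VCFv4.1\n#CHROM\tPOS\n1\t100\n')

with open_textfile_compat(demo_path) as vcf:
    lines = vcf.readlines()

# str-based checks like those in the tracebacks now work unchanged.
header_lines = [line for line in lines if line.startswith('#')]
has_fileformat = bool(re.match(r'##fileformat=', lines[0]))
```

With this approach the calling code does not need b'#' prefixes or bytes regex patterns. Upgrading to Python 3.3 or later, as was eventually done in this thread, avoids the problem altogether.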