dmnfarrell / snipgenie

command line and desktop tool for microbial variant calling
GNU General Public License v3.0

subprocess.CalledProcessError: Command 'bcftools filter -i "QUAL>=40 && FORMAT/DP>=30 && DP4>=4" -o filtered.vcf.gz -O z calls.vcf' returned non-zero exit status 255. #5

Closed jolespin closed 10 months ago

jolespin commented 11 months ago

I'm using the snipgenie version installed from the main branch of the GitHub repository on Dec 1, 2023.

Here's my log:

The following options were supplied
time:  01/12/2023 13:25:26
-------
input : []
manifest : veba_output/misc/reads_table.DENV4.csv
labelsep : _
labelindex : 0
reference : References/DENV4.fa
species : None
gb_file : None
threads : 1
overwrite : False
trim : False
unmapped : False
quality : 25
filters : QUAL>=40 && FORMAT/DP>=30 && DP4>=4
mask : None
custom_filters : False
platform : illumina
aligner : bwa
buildtree : False
bootstraps : 100
outdir : snipgenie_output/reads_based/DENV4
qc : False
dummy : False
test : False
version : False
omit_samples : []
get_stats : True
logfile : snipgenie_output/reads_based/DENV4/run.log

using manifest file for samples
57 samples were loaded:
----------------------
           sample                                          filename1                                          filename2  read_length
0   DENV3_167_S33  ./veba_output/preprocess/DENV3_167_S33/output/...  ./veba_output/preprocess/DENV3_167_S33/output/...          251
1   DENV3_168_S34  ./veba_output/preprocess/DENV3_168_S34/output/...  ./veba_output/preprocess/DENV3_168_S34/output/...          275
2   DENV3_169_S35  ./veba_output/preprocess/DENV3_169_S35/output/...  ./veba_output/preprocess/DENV3_169_S35/output/...          274
3   DENV3_170_S36  ./veba_output/preprocess/DENV3_170_S36/output/...  ./veba_output/preprocess/DENV3_170_S36/output/...          274
4   DENV3_171_S37  ./veba_output/preprocess/DENV3_171_S37/output/...  ./veba_output/preprocess/DENV3_171_S37/output/...          256
5   DENV3_172_S38  ./veba_output/preprocess/DENV3_172_S38/output/...  ./veba_output/preprocess/DENV3_172_S38/output/...          261
6   DENV3_173_S39  ./veba_output/preprocess/DENV3_173_S39/output/...  ./veba_output/preprocess/DENV3_173_S39/output/...          221
7   DENV3_174_S40  ./veba_output/preprocess/DENV3_174_S40/output/...  ./veba_output/preprocess/DENV3_174_S40/output/...          251
8   DENV3_175_S41  ./veba_output/preprocess/DENV3_175_S41/output/...  ./veba_output/preprocess/DENV3_175_S41/output/...          202
9   DENV3_176_S42  ./veba_output/preprocess/DENV3_176_S42/output/...  ./veba_output/preprocess/DENV3_176_S42/output/...          213
10  DENV3_177_S43  ./veba_output/preprocess/DENV3_177_S43/output/...  ./veba_output/preprocess/DENV3_177_S43/output/...          207
11  DENV3_178_S44  ./veba_output/preprocess/DENV3_178_S44/output/...  ./veba_output/preprocess/DENV3_178_S44/output/...          216
12  DENV3_179_S45  ./veba_output/preprocess/DENV3_179_S45/output/...  ./veba_output/preprocess/DENV3_179_S45/output/...          197
13  DENV3_180_S46  ./veba_output/preprocess/DENV3_180_S46/output/...  ./veba_output/preprocess/DENV3_180_S46/output/...          201
14  DENV3_181_S47  ./veba_output/preprocess/DENV3_181_S47/output/...  ./veba_output/preprocess/DENV3_181_S47/output/...          183
15  DENV3_182_S48  ./veba_output/preprocess/DENV3_182_S48/output/...  ./veba_output/preprocess/DENV3_182_S48/output/...          199
16  DENV3_183_S49  ./veba_output/preprocess/DENV3_183_S49/output/...  ./veba_output/preprocess/DENV3_183_S49/output/...          249
17  DENV3_184_S50  ./veba_output/preprocess/DENV3_184_S50/output/...  ./veba_output/preprocess/DENV3_184_S50/output/...          246
18  DENV3_185_S51  ./veba_output/preprocess/DENV3_185_S51/output/...  ./veba_output/preprocess/DENV3_185_S51/output/...          237
19  DENV3_186_S52  ./veba_output/preprocess/DENV3_186_S52/output/...  ./veba_output/preprocess/DENV3_186_S52/output/...          249
20  DENV3_187_S53  ./veba_output/preprocess/DENV3_187_S53/output/...  ./veba_output/preprocess/DENV3_187_S53/output/...          257
21  DENV3_188_S54  ./veba_output/preprocess/DENV3_188_S54/output/...  ./veba_output/preprocess/DENV3_188_S54/output/...          268
22  DENV3_189_S55  ./veba_output/preprocess/DENV3_189_S55/output/...  ./veba_output/preprocess/DENV3_189_S55/output/...          254
23  DENV3_190_S56  ./veba_output/preprocess/DENV3_190_S56/output/...  ./veba_output/preprocess/DENV3_190_S56/output/...          252
24  DENV3_191_S57  ./veba_output/preprocess/DENV3_191_S57/output/...  ./veba_output/preprocess/DENV3_191_S57/output/...          250
25    DENV3_45_S1  ./veba_output/preprocess/DENV3_45_S1/output/tr...  ./veba_output/preprocess/DENV3_45_S1/output/tr...          266
26    DENV3_46_S2  ./veba_output/preprocess/DENV3_46_S2/output/tr...  ./veba_output/preprocess/DENV3_46_S2/output/tr...          267
27    DENV3_47_S3  ./veba_output/preprocess/DENV3_47_S3/output/tr...  ./veba_output/preprocess/DENV3_47_S3/output/tr...          265
28    DENV3_48_S4  ./veba_output/preprocess/DENV3_48_S4/output/tr...  ./veba_output/preprocess/DENV3_48_S4/output/tr...          259
29    DENV3_49_S5  ./veba_output/preprocess/DENV3_49_S5/output/tr...  ./veba_output/preprocess/DENV3_49_S5/output/tr...          263
30    DENV3_50_S6  ./veba_output/preprocess/DENV3_50_S6/output/tr...  ./veba_output/preprocess/DENV3_50_S6/output/tr...          214
31    DENV3_51_S7  ./veba_output/preprocess/DENV3_51_S7/output/tr...  ./veba_output/preprocess/DENV3_51_S7/output/tr...          260
32    DENV3_52_S8  ./veba_output/preprocess/DENV3_52_S8/output/tr...  ./veba_output/preprocess/DENV3_52_S8/output/tr...          246
33    DENV3_53_S9  ./veba_output/preprocess/DENV3_53_S9/output/tr...  ./veba_output/preprocess/DENV3_53_S9/output/tr...          250
34   DENV3_54_S10  ./veba_output/preprocess/DENV3_54_S10/output/t...  ./veba_output/preprocess/DENV3_54_S10/output/t...          249
35   DENV3_55_S11  ./veba_output/preprocess/DENV3_55_S11/output/t...  ./veba_output/preprocess/DENV3_55_S11/output/t...          249
36   DENV3_56_S12  ./veba_output/preprocess/DENV3_56_S12/output/t...  ./veba_output/preprocess/DENV3_56_S12/output/t...          258
37   DENV3_57_S13  ./veba_output/preprocess/DENV3_57_S13/output/t...  ./veba_output/preprocess/DENV3_57_S13/output/t...          244
38   DENV3_58_S14  ./veba_output/preprocess/DENV3_58_S14/output/t...  ./veba_output/preprocess/DENV3_58_S14/output/t...          251
39   DENV3_59_S15  ./veba_output/preprocess/DENV3_59_S15/output/t...  ./veba_output/preprocess/DENV3_59_S15/output/t...          215
40   DENV3_60_S16  ./veba_output/preprocess/DENV3_60_S16/output/t...  ./veba_output/preprocess/DENV3_60_S16/output/t...          226
41   DENV3_61_S17  ./veba_output/preprocess/DENV3_61_S17/output/t...  ./veba_output/preprocess/DENV3_61_S17/output/t...          249
42   DENV3_62_S18  ./veba_output/preprocess/DENV3_62_S18/output/t...  ./veba_output/preprocess/DENV3_62_S18/output/t...          246
43   DENV3_63_S19  ./veba_output/preprocess/DENV3_63_S19/output/t...  ./veba_output/preprocess/DENV3_63_S19/output/t...          264
44   DENV3_64_S20  ./veba_output/preprocess/DENV3_64_S20/output/t...  ./veba_output/preprocess/DENV3_64_S20/output/t...          195
45   DENV3_65_S21  ./veba_output/preprocess/DENV3_65_S21/output/t...  ./veba_output/preprocess/DENV3_65_S21/output/t...          239
46   DENV3_66_S22  ./veba_output/preprocess/DENV3_66_S22/output/t...  ./veba_output/preprocess/DENV3_66_S22/output/t...          258
47   DENV3_67_S23  ./veba_output/preprocess/DENV3_67_S23/output/t...  ./veba_output/preprocess/DENV3_67_S23/output/t...          265
48   DENV3_68_S24  ./veba_output/preprocess/DENV3_68_S24/output/t...  ./veba_output/preprocess/DENV3_68_S24/output/t...          250
49   DENV3_69_S25  ./veba_output/preprocess/DENV3_69_S25/output/t...  ./veba_output/preprocess/DENV3_69_S25/output/t...          246
50   DENV3_70_S26  ./veba_output/preprocess/DENV3_70_S26/output/t...  ./veba_output/preprocess/DENV3_70_S26/output/t...          213
51   DENV3_71_S27  ./veba_output/preprocess/DENV3_71_S27/output/t...  ./veba_output/preprocess/DENV3_71_S27/output/t...          241
52   DENV3_72_S28  ./veba_output/preprocess/DENV3_72_S28/output/t...  ./veba_output/preprocess/DENV3_72_S28/output/t...          261
53   DENV3_73_S29  ./veba_output/preprocess/DENV3_73_S29/output/t...  ./veba_output/preprocess/DENV3_73_S29/output/t...          265
54   DENV3_74_S30  ./veba_output/preprocess/DENV3_74_S30/output/t...  ./veba_output/preprocess/DENV3_74_S30/output/t...          256
55   DENV3_75_S31  ./veba_output/preprocess/DENV3_75_S31/output/t...  ./veba_output/preprocess/DENV3_75_S31/output/t...          232
56   DENV3_76_S32  ./veba_output/preprocess/DENV3_76_S32/output/t...  ./veba_output/preprocess/DENV3_76_S32/output/t...          249

building index
indexing..
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.00 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index References/DENV4.fa
[main] Real time: 0.053 sec; CPU: 0.007 sec
bwa index References/DENV4.fa
aligning files
--------------
Using reference genome: References/DENV4.fa
57/57 samples already aligned

calling variants
----------------
snipgenie_output/reads_based/DENV4/raw.bcf already exists
calling variants..
bcftools call --ploidy 1 -m -v -o snipgenie_output/reads_based/DENV4/calls.vcf snipgenie_output/reads_based/DENV4/raw.bcf
1266 sites called as variants
bcftools reheader --samples snipgenie_output/reads_based/DENV4/samples.txt -o /tmp/calls.vcf snipgenie_output/reads_based/DENV4/calls.vcf
bcftools filter -i "QUAL>=40 && FORMAT/DP>=30 && DP4>=4" -o snipgenie_output/reads_based/DENV4/filtered.vcf.gz -O z snipgenie_output/reads_based/DENV4/calls.vcf
[E::bcf_hdr_add_sample_len] Duplicated sample name 'DENV3'
Failed to read from snipgenie_output/reads_based/DENV4/calls.vcf: could not parse header
Traceback (most recent call last):
  File "/expanse/projects/jcl110/anaconda3/envs/snippy_env/bin/snipgenie", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/expanse/projects/jcl110/anaconda3/envs/snippy_env/lib/python3.12/site-packages/snipgenie/app.py", line 1140, in main
    W.run()
  File "/expanse/projects/jcl110/anaconda3/envs/snippy_env/lib/python3.12/site-packages/snipgenie/app.py", line 989, in run
    self.vcf_file = variant_calling(bam_files, self.reference, self.outdir,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/expanse/projects/jcl110/anaconda3/envs/snippy_env/lib/python3.12/site-packages/snipgenie/app.py", line 561, in variant_calling
    tmp = subprocess.check_output(cmd,shell=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/expanse/projects/jcl110/anaconda3/envs/snippy_env/lib/python3.12/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/expanse/projects/jcl110/anaconda3/envs/snippy_env/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'bcftools filter -i "QUAL>=40 && FORMAT/DP>=30 && DP4>=4" -o snipgenie_output/reads_based/DENV4/filtered.vcf.gz -O z snipgenie_output/reads_based/DENV4/calls.vcf' returned non-zero exit status 255.

jolespin commented 11 months ago

Is snipgenie trying to parse the sample name?

jolespin commented 11 months ago

I tried this again using the original reads input (not manifest) and I got the same error.

dmnfarrell commented 11 months ago

The manifest file is used, but the VCF header seems to get messed up later, during the reheader step. That step is there so the sample names match (by default they are the filenames). It uses a file called samples.txt. What does it look like?

jolespin commented 11 months ago

(base) [jespinoz@login01 DENV4]$ cat samples.txt
DENV3

Is it trying to parse the sample ids?
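
A quick check along these lines can catch the problem before bcftools does. This is only a sketch: `check_sample_names` is a hypothetical helper, not part of snipgenie, and it assumes samples.txt should hold one unique name per sample (57 in the run above).

```python
def check_sample_names(lines, expected_count):
    """Hypothetical helper: return a list of problems found in a samples.txt-style
    sequence of lines (one sample name per line, blank lines ignored)."""
    names = [ln.strip() for ln in lines if ln.strip()]
    problems = []
    if len(names) != expected_count:
        problems.append(f"expected {expected_count} names, found {len(names)}")
    dupes = sorted({n for n in names if names.count(n) > 1})
    if dupes:
        problems.append("duplicated names: " + ", ".join(dupes))
    return problems


# The samples.txt shown above holds a single line, while 57 samples were loaded.
print(check_sample_names(["DENV3\n"], expected_count=57))
```

To run it against a real file, pass an open file handle instead of the list, for example `check_sample_names(open("samples.txt"), 57)`.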

dmnfarrell commented 10 months ago

Yes, when doing the reheader it was still trying to parse the sample names. It doesn't do that now. You could update and try it again.
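
To illustrate why the parsed names collide: with `labelsep` set to `_` and `labelindex` set to 0 (the values in the log), every name such as DENV3_167_S33 reduces to DENV3. The sketch below assumes the parsing is a plain split on the separator; `label_from_name` is a hypothetical stand-in, not snipgenie's actual code.

```python
from collections import Counter


def label_from_name(name, labelsep="_", labelindex=0):
    """Hypothetical stand-in for the label parsing: split the name on
    labelsep and keep the field at position labelindex."""
    return name.split(labelsep)[labelindex]


names = ["DENV3_167_S33", "DENV3_168_S34", "DENV3_45_S1"]
labels = [label_from_name(n) for n in names]

# Count how often each label occurs and keep only the repeated ones.
duplicates = {label: n for label, n in Counter(labels).items() if n > 1}

print(labels)
print(duplicates)
```

All three names map to the same label, which matches the "Duplicated sample name 'DENV3'" error that bcftools reports.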

jolespin commented 10 months ago

Can you confirm your tool can work with this dataset?

Here's the log:

The following options were supplied
time:  14/12/2023 10:44:02
-------
input : ['reads/DENV2']
manifest : None
labelsep : _
labelindex : 0
reference : References/DENV2.fa
species : None
gb_file : None
threads : 1
overwrite : False
trim : False
unmapped : False
quality : 25
filters : QUAL>=40 && FORMAT/DP>=30 && DP4>=4
mask : None
custom_filters : False
platform : illumina
aligner : bwa
buildtree : False
bootstraps : 100
outdir : snipgenie_output/reads_based/DENV2
qc : False
dummy : False
test : False
version : False
omit_samples : []
get_stats : True
logfile : snipgenie_output/reads_based/DENV2/run.log

there seem to be duplicates:
sample
DENV2    198
Name: count, dtype: int64
error in filename parsing, check labelsep and labelindex options
                 name  ... pair
41    DENV2_100_S91_1  ...    1
40    DENV2_100_S91_2  ...    2
3    DENV2_101_S103_1  ...    3
2    DENV2_101_S103_2  ...    4
157  DENV2_102_S115_1  ...    5
156  DENV2_102_S115_2  ...    6
140  DENV2_103_S127_1  ...    7
141  DENV2_103_S127_2  ...    8
158  DENV2_104_S139_1  ...    9
159  DENV2_104_S139_2  ...   10

[10 rows x 4 columns]

I've tried several different versions. If snipgenie can't accommodate this data, do you recommend another tool I can try besides snippy?

Here are the files I'm providing:

share_snipgenie
├── DENV2.fa
├── reads
│   └── veba_output
│       └── preprocess
│           ├── DENV2_100_S91
│           │   └── output
│           │       ├── cleaned_1.fastq.gz
│           │       └── cleaned_2.fastq.gz
│           ├── DENV2_101_S103
│           │   └── output
│           │       ├── cleaned_1.fastq.gz
│           │       └── cleaned_2.fastq.gz
│           └── DENV2_102_S115
│               └── output
│                   ├── cleaned_1.fastq.gz
│                   └── cleaned_2.fastq.gz
└── reads_table.DENV2.csv

10 directories, 8 files

https://drive.google.com/drive/folders/1wUFYi1UxY79jaok-a0dEbDiTPeGT46xb?usp=sharing

dmnfarrell commented 10 months ago

But you're not using the manifest file here? It is required in these cases; otherwise snipgenie tries to parse the file names directly. The fix I made was to avoid parsing at all when there is a manifest.
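
For a layout like the one shared above, where every sample has its own directory and the read files all have the same names, a manifest could be generated roughly as follows. This is a sketch under assumptions: the column names `sample`, `filename1` and `filename2` are taken from the table printed in the first log, and whether snipgenie expects exactly these headers should be checked against its documentation. Both function names are hypothetical.

```python
import csv
from pathlib import Path


def build_manifest_rows(preprocess_dir):
    """Collect one row per sample directory that contains both read files.
    The sample name is the directory name, so it stays unique."""
    rows = []
    for sample_dir in sorted(Path(preprocess_dir).iterdir()):
        r1 = sample_dir / "output" / "cleaned_1.fastq.gz"
        r2 = sample_dir / "output" / "cleaned_2.fastq.gz"
        if r1.is_file() and r2.is_file():
            rows.append({"sample": sample_dir.name,
                         "filename1": str(r1),
                         "filename2": str(r2)})
    return rows


def write_manifest(rows, out_csv):
    """Write the rows to a CSV file with a header line."""
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["sample", "filename1", "filename2"])
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_manifest(build_manifest_rows("reads/veba_output/preprocess"), "reads_table.DENV2.csv")` would write one row per sample directory.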

jolespin commented 10 months ago

Wow. Sorry about that! I forgot to change the -i to -M in the command. Running it now.

dmnfarrell commented 10 months ago

It seems to work for me with your data when I use the manifest file. The only issue is that the samples seem identical, so it can't find any informative SNPs.

[Image: Screenshot from 2023-12-14 19-39-37]

jolespin commented 10 months ago

What program are you using for visualizing the VCFs? I'm testing it out on the full set now.

jolespin commented 10 months ago

OK, the newest version worked great. Thanks for all your help with this. Greatly appreciated. I will certainly publish using this tool in the future now that I have it all dialed in for my workflow.

dmnfarrell commented 10 months ago

Thanks. I used IGV to look at the VCF and BAM files.