Open robertwhbaldwin opened 2 years ago
Hi Robert, I'm a bit confused about the series of events. It always helps if you list all the commands that you ran.
If the genotype.vcf.gz files don't exist for some samples, you'd need to return to that step and make sure that genotyping finishes for those samples. I'd also make sure that your bam index is up-to-date.
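For the index part of that advice, a minimal sketch of how stale indexes could be found before rerunning anything. It assumes the indexes sit next to the BAMs as `<file>.bam.bai` (the naming seen in the log in this thread); the function name `stale_bam_indexes` is made up for illustration.

```shell
# Sketch: print every BAM given as an argument whose .bai index is missing
# or older than the BAM itself. Assumes the index is named <file>.bam.bai.
stale_bam_indexes() {
  local bam
  for bam in "$@"; do
    # First test catches a missing index; -nt is true when the BAM was
    # modified more recently than its index.
    if [ ! -e "${bam}.bai" ] || [ "$bam" -nt "${bam}.bai" ]; then
      printf '%s\n' "$bam"
    fi
  done
}
```

Each path printed by something like `stale_bam_indexes /bams/*.bam` could then be re-indexed with `samtools index <file>.bam`.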
For the first step in population calling I used this:
smoove call --outdir results-smoove/ --name $sample --fasta $reference_fasta -p 1 --genotype /path/to/$sample.bam
Basically what you stated in the docs, except that I did not include an --exclude BED file.
As I said, only 13/25 of my samples got the genotype.vcf.gz file as output. From what I understand this is not because genotyping did not finish, but because it finished and there was nothing called. Instead of a vcf file I got an EOF warning message. There was a ticket on this forum about this and that was the conclusion. No calls. Seems odd to me that half my samples would have calls and the other half would not.
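To pin down exactly which samples are affected, a small sketch that lists the samples with no call-step output. It assumes the output is named `<name>-smoove.genotyped.vcf.gz` inside the `--outdir` directory (the pattern visible in the log in this thread); the function name and the `samples.txt` file (one sample name per line) are hypothetical.

```shell
# Sketch: read sample names on stdin (one per line) and print those that
# have no non-empty <outdir>/<sample>-smoove.genotyped.vcf.gz file.
missing_call_output() {
  local outdir="${1%/}"   # drop a trailing slash, if any
  local sample
  while IFS= read -r sample; do
    [ -n "$sample" ] || continue          # skip blank lines
    if [ ! -s "${outdir}/${sample}-smoove.genotyped.vcf.gz" ]; then
      printf '%s\n' "$sample"
    fi
  done
}
```

Usage would be along the lines of `missing_call_output results-smoove/ < samples.txt`.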
I believe the warning message looked like this:
[smoove] 2020/04/28 09:39:43 2020/04/28 09:39:43 EOF
[smoove] 2020/04/28 09:39:43 Failed to open -: unknown file type
panic: exit status 255
You should have calls unless you are doing targeted sequencing or have extremely low coverage.
Yes, there's a problem with the samples that got no VCF from the call step: a lot of overlapping reads, for example. I ran picard collect_wgs_metrics, and after excluding overlapping reads, PCR duplicates, reads with mapping quality < 20, etc., the mean coverage is 7-9X. It should be at least 20X, so a lot of reads were discarded by these filters. But would that explain why there's no VCF for these samples when I run smoove call? Even after the Picard filtering it's not extremely low coverage. But there's obviously a problem with the data and I don't know the extent of it yet.
it would help to see the full output of a job that failed on the call step. but yeah, sounds like a problem with your data. maybe the job ran out of memory or time.
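One way to make sure the full output of each job is kept is to wrap the command so that stdout, stderr, and the exit status all land in per-sample files. This is only a sketch; `run_logged` and the `logs` directory are made-up names, not part of smoove.

```shell
# Sketch: run a command, saving stdout and stderr to <logdir>/<name>.log
# and the exit status to <logdir>/<name>.status. Returns that exit status.
run_logged() {
  local logdir="$1"
  local name="$2"
  local status=0
  shift 2
  mkdir -p "$logdir"
  # Capture both streams; remember a non-zero exit instead of aborting.
  "$@" > "${logdir}/${name}.log" 2>&1 || status=$?
  printf '%s\n' "$status" > "${logdir}/${name}.status"
  return "$status"
}
```

With the call command from this thread that would look like `run_logged logs "$sample" smoove call --outdir results-smoove/ --name "$sample" --fasta "$reference_fasta" -p 1 --genotype "/path/to/$sample.bam"`. A status above 128 usually means the process was killed by a signal, which is what a scheduler enforcing a memory or time limit would typically cause.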
It turns out some of the BAM files were generated from R1 and R1 instead of the R1 and R2 files, so we got half the coverage. And obviously no SVs, since the data is not actually paired-end.
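A mix-up like this can sometimes be spotted from the BAM header. The sketch below assumes the aligner recorded its full command line in an `@PG` header line (bwa mem does this in its `CL:` field) and that the FASTQ names end in `.fastq`, `.fq`, or their `.gz` forms; the function name is made up. If the header was rewritten by later tools, this check will find nothing.

```shell
# Sketch: read a SAM/BAM text header on stdin and print any FASTQ path that
# appears more than once within a single @PG line (e.g. R1 for both mates).
duplicated_fastq_inputs() {
  grep '^@PG' | while IFS= read -r line; do
    # Split the line into one token per line, keep FASTQ-looking tokens,
    # and report tokens that occur at least twice.
    printf '%s\n' "$line" \
      | tr ' \t' '\n\n' \
      | grep -E '\.(fastq|fq)(\.gz)?$' \
      | sort | uniq -d
  done
}
```

Usage would be something like `samtools view -H /bams/sample.bam | duplicated_fastq_inputs`; any output is a FASTQ that was passed twice.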
Hi,
I'm doing population calling and made it to the genotype step. There are 25 samples with similar coverage (20X). The first handful of samples finished quickly (1 hr). This is the command I used:
smoove genotype -d -x -p 1 --name ${i}-joint --outdir ./ --fasta /assembly/GCF_014851395.1_ASM1485139v1_genomic.fa --vcf merged.sites.vcf.gz /bams/${i}.bam
But then I hit a sample that took a very long time (see log file below). After ~20 hrs the sample was still running so I just stopped it thinking that there must be a problem.
I then noticed that the sample that was taking a long time had no ...smoove.genotyped.vcf.gz file. In the earlier population calling step, which produces the ...smoove.genotyped.vcf.gz files, only 13/25 samples actually got these VCF files. For the rest I got the "EOF, can't read from standard input" warning, which was the topic of another ticket. So it seems to me that the long time the genotyping step is taking may be related to the fact that these samples had no ...smoove.genotyped.vcf.gz file. Can someone explain this? Is joint genotyping only meant for samples with the intermediate ...smoove.genotyped.vcf.gz files?
Thank You - Robert
2021/07/20 16:35:04 [W::hts_idx_load3] The index file is older than the data file: /bams/G0620_M02.bam.bai
[smoove] 2021/07/20 16:35:36 [smoove] 2021/07/20 16:35:36 starting with version 0.2.6
[smoove] 2021/07/20 16:35:36 [smoove] 2021/07/20 16:35:36 running duphold on 1 files in 16 processes
[smoove] 2021/07/20 16:35:36 [smoove] 2021/07/20 16:35:36 [W::hts_idx_load2] The index file is older than the data file: /bams/G0620_M02.bam.bai
[smoove] 2021/07/20 16:37:29 [smoove] 2021/07/20 16:37:29 [duphold] finished
[smoove] 2021/07/20 16:37:29 [smoove] 2021/07/20 16:37:29 finished duphold
[smoove] 2021/07/20 16:37:29 wrote sorted, indexed file to G0620_M02-joint-smoove.genotyped.vcf.gz
RHF05301
[smoove] 2021/07/20 16:37:29 starting with version 0.2.6
[smoove] 2021/07/20 16:37:29 writing sorted, indexed file to RHF05301-joint-smoove.genotyped.vcf.gz
[smoove] 2021/07/20 16:37:29 > gsort version 0.0.6
[smoove] 2021/07/20 16:37:29 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 16:54:24 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 17:12:19 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 17:29:36 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 17:47:50 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 18:04:28 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 18:22:17 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 18:39:21 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 18:55:59 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 19:14:42 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 19:30:15 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 19:48:43 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 20:06:24 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 20:22:22 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 20:40:15 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 20:56:57 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 21:13:31 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 21:33:23 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 21:48:45 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 22:06:37 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 22:23:04 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 22:40:23 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 22:59:03 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 23:14:21 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 23:33:33 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/20 23:49:42 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 00:07:04 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 00:24:53 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 00:41:29 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 01:00:39 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 01:21:06 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 01:37:23 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 01:53:16 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 02:09:04 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 02:25:21 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 02:41:30 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 02:57:28 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 03:13:23 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 03:29:13 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 03:45:08 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 04:00:52 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 04:16:32 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 04:32:14 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 04:47:49 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 05:03:39 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 05:19:19 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 05:35:15 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 05:50:55 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 06:06:37 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 06:22:41 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 06:38:33 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 06:54:05 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 07:09:55 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 07:25:34 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 07:41:21 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 07:56:49 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 08:12:47 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 08:28:31 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 08:44:18 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 09:00:12 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 09:16:03 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 09:32:20 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 09:48:09 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 10:03:54 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 10:19:42 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 10:35:54 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 10:51:35 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 11:07:18 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai
[smoove] 2021/07/21 11:23:00 [W::hts_idx_load3] The index file is older than the data file: /bams/RHF05301.bam.bai