Open dmasoodfda opened 2 years ago
Dear user, what is your genome coverage? Best, Valentina
The genome coverage varies from 80X to 100X across the samples. Is there an optimal genome coverage that we should be working with? And if so, what is it?
Thank you Daniall
Dear Daniall, your coverage should be enough to get genotypes. The reason for your error is that you use coordinates of SNPs in hg19:
makePileup = /home/daniall.masood/test/hg19_snp142.SingleDiNucl.1based.txt.gz
fastaFile = /home/daniall.masood/reference/GRCh38/GRCh38.d1.vd1.fa
SNPfile = /home/daniall.masood/test/hg19_snp142.SingleDiNucl.1based.txt.gz
Can you use hg38 instead?
Hi Valentina,
I have found hg38 VCF files instead and have put the path to them in makePileup and SNPfile. However, the pileup files returned are empty and I am still not getting the proper output. I am using samtools 1.11, so that condition is satisfied. Is there anything else that I am overlooking?
Thanks Daniall
[BAF]
makePileup = /home/daniall.masood/00-All.vcf
fastaFile = /home/daniall.masood/reference/GRCh38/GRCh38.d1.vd1.fa
SNPfile = /home/daniall.masood/00-All.vcf
this is my new config file ending
Dear Daniall, first, could you use just the Common SNPs file instead of All? It will run much faster (and maybe better). The All file includes a lot of very rare SNPs that will not be present in your donor anyway.
Could you share the complete output into the command line and the updated config file?
Hi Valentina,
I am still running into the same issue. I am using the common SNPs VCF file but still getting empty pileup files and an empty BAF text file. I am sharing the command line, the updated config, and some errors that I am seeing now. Please let me know if you have any insight into these.
Thank you Daniall Masood
command line: freec -conf WGS.txt
Config file:
[general]
chrLenFile = /home/daniall.masood/reference/GRCh38/GRCh38.d1.vd1.fa.fai
window=50000
breakPointThreshold = 0.04
sex=XX
maxThreads=40
chrFiles = /home/daniall.masood/GRCh38
outputDir = /home/daniall.masood/test/WGS_LOH_revised/
[sample]
mateFile = /home/daniall.masood/WGS/WGS_NV_T_1.bwa.dedup.bwa.dedup.bam
inputFormat = BAM
mateOrientation = 0
[control]
mateFile = /home/daniall.masood/WGS/WGS_NV_N_1.bwa.dedup.bwa.dedup.bam
inputFormat = BAM
mateOrientation = 0
[BAF]
makePileup = /home/daniall.masood/test/00-common_all.vcf
fastaFile = /home/daniall.masood/reference/GRCh38/GRCh38.d1.vd1.fa
SNPfile = /home/daniall.masood/test/00-common_all.vcf
Errors:
..failed to run segmentation on chr1..failed to run segmentation on chr2
..failed to run segmentation on chr3 ..failed to run segmentation on chr4 ..failed to run segmentation on chr5 ..failed to run segmentation on chr6 ..failed to run segmentation on chr7 ..failed to run segmentation on chr8 ..failed to run segmentation on chr9 ..failed to run segmentation on chr10 ..failed to run segmentation on chr11 ..failed to run segmentation on chr12 ..failed to run segmentation on chr13 ..failed to run segmentation on chr14 ..failed to run segmentation on chr15 ..failed to run segmentation on chr16 ..failed to run segmentation on chr17 ..failed to run segmentation on chr18 ..failed to run segmentation on chr19 ..failed to run segmentation on chr20 ..failed to run segmentation on chr21 ..failed to run segmentation on chr22 ..failed to run segmentation on chrX ..failed to run segmentation on chr..failed to run segmentation on chr..failed to run segmentation on chr4..failed to run segmentation on chr512 ..failed to run segmentation on chr..failed to run segmentation on chr7 ..failed to run segmentation on chr6
..failed to run segmentation on chr10 ..failed to run segmentation on chr..failed to run segmentation on chr..failed to run segmentation on chr12 ..failed to run segmentation on chr13 8 3 ..failed to run segmentation on chr 11 ..failed to run segmentation on chr14 ..failed to run segmentation on chr..failed to run segmentation on chr15 17 ..failed to run segmentation on chr18 16 ..failed to run segmentation on chr9 ..failed to run segmentation on chr19 ..failed to run segmentation on chr20 ..failed to run segmentation on chr21 ..failed to run segmentation on chr22 ..failed to run segmentation on chrX ..failed to run segmentation on chr1 ..failed to run segmentation on chr2 ..failed to run segmentation on chr3 ..failed to run segmentation on chr4 ..failed to run segmentation on chr5 ..failed to run segmentation on chr7 ..failed to run segmentation on chr8 ..failed to run segmentation on chr..failed to run segmentation on chr9 ..failed to run segmentation on chr11 ..failed to run segmentation on chr1210 ..failed to run segmentation on chr13
..failed to run segmentation on chr14 ..failed to run segmentation on chr..failed to run segmentation on chr16 15 ..failed to run segmentation on chr17 ..failed to run segmentation on chr6 ..failed to run segmentation on chr18 ..failed to run segmentation on chr19 ..failed to run segmentation on chr20 ..failed to run segmentation on chr21 ..failed to run segmentation on chr22 ..failed to run segmentation on chrX ..failed to run segmentation on chr..failed to run segmentation on chr..failed to run segmentation on chr..failed to run segmentation on chr..failed to run segmentation on chr..failed to run segmentation on chr..failed to run segmentation on chr9 4..failed to run segmentation on chr11 1 ..failed to run segmentation on chr2..failed to run segmentation on chr7
..failed to run segmentation on chr..failed to run segmentation on chr13 10..failed to run segmentation on chr614
..failed to run segmentation on chr15 3 8 5
..failed to run segmentation on chr16 ..failed to run segmentation on chr12 ..failed to run segmentation on chr18 ..failed to run segmentation on chr17 ..failed to run segmentation on chr19 ..failed to run segmentation on chr20 ..failed to run segmentation on chr21 ..failed to run segmentation on chr22 ..failed to run segmentation on chrX
Hi Valentina,
The issues are still persisting and we need to start running and detecting the LOH in our samples soon.
Thank you Daniall
Dear Daniall, it looks like you have serious problems (given the errors you get). Can you run the following config file and send me the complete output into the command line?
/home/daniall.masood/GRCh38/ includes files named chr1.fa or similar, right?
[general]
chrLenFile = /home/daniall.masood/reference/GRCh38/GRCh38.d1.vd1.fa.fai
#window=50000
#breakPointThreshold = 0.04
sex=XX
maxThreads=40
#ploidy = 2
chrFiles = /home/daniall.masood/GRCh38/
#telocentromeric = 50000
# I suggest you use the mappability file
gemMappabilityFile = /home/daniall.masood/out100m2_hg38.gem
outputDir = /home/daniall.masood/test/WGS_LOH_revised/
#coefficientOfVariation = 0.062
#breakPointThreshold = -.002;
#window = 50000
#chrFiles = /home/daniall.masood/GRCh38
#outputDir = test
#degree=3
#intercept = 0
[sample]
mateFile = /home/daniall.masood/WGS/WGS_NV_T_1.bwa.dedup.bwa.dedup.bam
inputFormat = BAM
mateOrientation = 0
#mateCopyNumberFile = HCC1143.arachne_sample.cpn
[control]
mateFile = /home/daniall.masood/WGS/WGS_NV_N_1.bwa.dedup.bwa.dedup.bam
inputFormat = BAM
mateOrientation = 0
#mateCopyNumberFile = HCC1143_BL.arachne_control.cpn
[BAF]
makePileup = /home/daniall.masood/test/00-common_all.vcf
fastaFile = /home/daniall.masood/reference/GRCh38/GRCh38.d1.vd1.fa
SNPfile = /home/daniall.masood/test/00-common_all.vcf
minimalCoveragePerPosition=7
Hi Valentina,
You are correct, /home/daniall.masood/GRCh38/ does include files like chr1.fa.
I still am able to get CNV calls just not the germline/somatic column in the file.
I will try running this right away, thank you so much. I will get back to you asap about the results I get.
Thanks Daniall
Sorry one last thing. It gives a warning right away that I need a window or coefficient of variation specified? I was wondering if I should still specify those or continue to try and run it without them as you specified above.
Thanks Daniall
When it gives a warning it continues with default parameters, so it should be OK.
This is my output from running the config above. I also get some extra messages in my console output:
..failed to run segmentation on chr..failed to run segmentation on chr2..failed to run segmentation on chr..failed to run segmentation on chr3..failed to run segmentation on chr5 1 4..failed to run segmentation on chr8 ..failed to run segmentation on chr6
..failed to run segmentation on chr7 ..failed to run segmentation on chr9 ..failed to run segmentation on chr10 ..failed to run segmentation on chr11 ..failed to run segmentation on chr14 ..failed to run segmentation on chr15 ..failed to run segmentation on chr13 ..failed to run segmentation on chr12 ..failed to run segmentation on chr16 ..failed to run segmentation on chr17 ..failed to run segmentation on chr18 ..failed to run segmentation on chr19 ..failed to run segmentation on chr20 ..failed to run segmentation on chr21 ..failed to run segmentation on chr22 ..failed to run segmentation on chrX ..failed to run segmentation on chr1 ..failed to run segmentation on chr2 ..failed to run segmentation on chr3 ..failed to run segmentation on chr4 ..failed to run segmentation on chr..failed to run segmentation on chr85 ..failed to run segmentation on chr ..failed to run segmentation on chr7 9 ..failed to run segmentation on chr11..failed to run segmentation on chr..failed to run segmentation on chr10 6 ..failed to run segmentation on chr15 ..failed to run segmentation on chr13 ..failed to run segmentation on chr14 ..failed to run segmentation on chr16
..failed to run segmentation on chr17 ..failed to run segmentation on chr12 ..failed to run segmentation on chr18 ..failed to run segmentation on chr19 ..failed to run segmentation on chr20 ..failed to run segmentation on chr21 ..failed to run segmentation on chr22 ..failed to run segmentation on chrX ..failed to run segmentation on chr1 ..failed to run segmentation on chr..failed to run segmentation on chr3 ..failed to run segmentation on chr4 ..failed to run segmentation on chr6 2 ..failed to run segmentation on chr5 ..failed to run segmentation on chr7..failed to run segmentation on chr21 ..failed to run segmentation on chr12 ..failed to run segmentation on chr18 ..failed to run segmentation on chr13 ..failed to run segmentation on chr19
..failed to run segmentation on chr10 ..failed to run segmentation on chr15 ..failed to run segmentation on chr20 ..failed to run segmentation on chr11 ..failed to run segmentation on chr14 ..failed to run segmentation on chr8 ..failed to run segmentation on chr16 ..failed to run segmentation on chr9 ..failed to run segmentation on chr17 ..failed to run segmentation on chr22 ..failed to run segmentation on chrX
Do you have the 'chr' prefix in your BAM files? Do you see a file with "_NewCaptureRegions" created?
Yes, the prefix is chr in the BAM files, and that file is created while Control-FREEC is running but is then removed on its own when Control-FREEC completes.
This is already good news. And this temporary file is not empty, I guess?
Do you also see another temporary file created "_SNPinNewCaptureRegions.bed"? Is it empty?
Or rather "_SNPinNewCaptureRegions.vcf"
Hi Valentina,
Sorry for such a late reply, and thank you for being so attentive and helpful with this issue. The first file you mentioned appears and is not empty, but the second file "SNPinNewCaptureRegions.bed" remains empty, and the pileup files created afterwards appear empty as well.
Thank you Daniall
I also got the segmentation errors again.
Also this warning as well: "***** WARNING: File /home/daniall.masood/test/00-common_all.vcf has inconsistent naming convention for record: 1 10177 rs367896724 A AC . . RS=367896724;RSPOS=10177;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020005170026000200;GENEINFO=DDX11L1:100287102;WGT=1;VC=DIV;R5;ASP;VLD;G5A;G5;KGPhase3;CAF=0.5747,0.4253;COMMON=1;TOPMED=0.76728147298674821,0.23271852701325178"
I am not sure whether this is affecting things as well.
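The warning above quotes a record whose chromosome is written as `1`, while the BAM files are reported to use the `chr` prefix. A minimal Python sketch for comparing the two naming conventions is given here. The function names are hypothetical, and it only assumes the standard layouts: the sequence name is the first tab-separated column of a .fai index and of each VCF record.

```python
def chrom_names_from_fai(fai_text):
    """Return the sequence names listed in a .fai index (first column)."""
    return {line.split("\t")[0] for line in fai_text.splitlines() if line.strip()}


def chrom_names_from_vcf(vcf_text):
    """Return the CHROM values found on the non-header lines of a VCF."""
    names = set()
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        names.add(line.split("\t")[0])
    return names


def naming_mismatch(fai_text, vcf_text):
    """Return VCF chromosome names that are absent from the reference index."""
    return sorted(chrom_names_from_vcf(vcf_text) - chrom_names_from_fai(fai_text))
```

Reading the real `.fa.fai` and VCF files as text and passing them to `naming_mismatch` would list every VCF chromosome name that the reference does not know about; an empty list means the names agree.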
Would it be possible that you could do some troubleshooting with the bam files we are using? They are publicly available. You could use a WES run of the tumor and matched normal of one of the samples since I got the same issue with WES samples as well.
https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/data/
here are the bam files that you can use. They are matched by site prefixes and numbers as well. I used WES_NV_n_1 and WES_NV_T_1 as the matched pair before and got the same errors as WGS runs.
Thank you Daniall
FREEC calls
string command = pathToBedtools_ +" intersect -a " + makeminipileup + " -b " + bedFileWithRegionsOfInterest + " > " + intersected;
According to your config, this should result in
bedtools intersect -a /home/daniall.masood/test/00-common_all.vcf -b YOUR_PATH_TO_SNPinNewCaptureRegions.bed > YOUR_PATH_TO_SNPinNewCaptureRegions.vcf
Could you try this out, please?
Hi Valentina,
I am not sure I completely understand what I must do here. Is this a line of code that I should change somewhere? And if so where exactly is it? Sorry for the confusion on my end I just want to make sure I do it right.
Thank you Daniall
Dear Daniall, I wanted to ask you to run this command in the command line to see whether it is the problem of bedtools (called by FREEC) that the resulting file is empty. Could you do it? Or you can share these two files with me. Best wishes Valentina
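To illustrate what this check is looking for, here is a simplified pure-Python stand-in for the intersection step. It is not bedtools itself, and the function name and tuple layout are hypothetical. It assumes BED regions are 0-based and half-open while VCF positions are 1-based, and it shows that a chromosome-name mismatch alone is enough to produce an empty result.

```python
def intersect(vcf_records, bed_regions):
    """Keep (chrom, pos) VCF records that fall inside any BED region.

    vcf_records: iterable of (chrom, pos) with 1-based pos.
    bed_regions: iterable of (chrom, start, end), 0-based and half-open.
    """
    kept = []
    for chrom, pos in vcf_records:
        for bed_chrom, start, end in bed_regions:
            # A 1-based position p covers the 0-based interval [p - 1, p).
            if chrom == bed_chrom and start < pos <= end:
                kept.append((chrom, pos))
                break
    return kept
```

With a region named `chr1` and a SNP record named `1`, the function returns an empty list even though the coordinates overlap; once both use `chr1`, the record is kept.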
Hi Valentina,
"YOUR_PATH_TO_SNPinNewCaptureRegions.bed" I am unsure what to put into here. This file does not exist and only shows up temporarily when ControlFreec is running. Once the run completes this file is not available anymore.
https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/data/WGS/ The files are present in here. I am using WGS_NV_T_1 and WGS_NV_N_1 as the two bam files to test this run.
Thank you Daniall
Thank you, Daniall! I hope to get back to you soon!
Hi Valentina,
Thank you so much for working on this for me! I hope you are able to get results; they would be very helpful for the research we are doing. You are helping us a great deal, and my PI/mentor and I really appreciate it.
Thanks Daniall
Hi Valentina,
Are there any updates for this? We need to run LOH calls very soon for the paper we are publishing.
Thank you Daniall
Dear Daniall, thank you for reminding me. I have started the file download. It takes some time so I hope to have a look at the issue tomorrow.
The first thing that I will try is to run it with vcf that has a "chr" prefix: https://cloud.inf.ethz.ch/s/jdXgQWBxn2s4oQf
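For readers who want to produce such a file themselves, a rough Python sketch of adding the prefix is given here. The function names are hypothetical and this is not necessarily how the shared file was made. It leaves header lines untouched, does not rewrite any `##contig` header entries, and does not rename a mitochondrial `MT` record to `chrM`.

```python
import gzip


def add_chr_prefix(lines):
    """Yield VCF lines with 'chr' prepended to the CHROM column of each record."""
    for line in lines:
        if line.startswith("#") or line.startswith("chr") or not line.strip():
            yield line  # header, already prefixed, or blank: keep as is
        else:
            yield "chr" + line


def convert(src_path, dst_path):
    """Read a gzipped VCF and write a gzipped copy with prefixed chromosomes."""
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        dst.writelines(add_chr_prefix(src))
```

Because header lines pass through unchanged, the output still starts with the original VCF header.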
Sounds great! Thank you again. I will try that as well as I need a solution by Thursday/Friday this week.
Thank you Daniall
I started it this morning, and I can already see that with this new .vcf (with chr prefixes) the minipileup is not empty. So I guess you can also try it. Here is my config (I set ploidy=2 and contaminationAdjustment = TRUE):
[general]
chrLenFile = /home/boeva/NAS_public/data/annotations/Human/hg38/hg38.22XY.fa.fai
sex=XX
maxThreads=4
#can try to comment
ploidy = 2
contaminationAdjustment = TRUE
chrFiles = /home/boeva/NAS_public/data/annotations/Human/hg38/chromosomes/
# I suggest you use the mappability file
gemMappabilityFile = /home/boeva/NAS_public/data/annotations/Human/hg38/GEM_mappability/out100m2_hg38.gem
outputDir = /home/boeva/NAS_public/tmp/debugFREEC/
[sample]
mateFile = /home/boeva/NAS_public/tmp/debugFREEC/WGS_NV_T_1.bwa.dedup.bam
inputFormat = BAM
mateOrientation = 0
#mateCopyNumberFile =
[control]
mateFile = /home/boeva/NAS_public/tmp/debugFREEC/WGS_NV_N_1.bwa.dedup.bam
inputFormat = BAM
mateOrientation = 0
#mateCopyNumberFile =
[BAF]
#instead of 00-common_all.vcf.gz
makePileup = /home/boeva/NAS_public/data/annotations/Human/hg38/dbSNP/00-common_all_with_CHR.vcf.gz
fastaFile = /home/boeva/NAS_public/data/annotations/Human/hg38/hg38.fa
SNPfile = /home/boeva/NAS_public/data/annotations/Human/hg38/dbSNP/00-common_all_with_CHR.vcf.gz
minimalCoveragePerPosition=7
Hi Daniall, our servers went into maintenance before FREEC could finish the calculations. But I think with this config and this .vcf file things should work. What I still need to do (when the servers are back) to speed up the whole process is to filter the common SNP file to keep only SNPs with a population allele frequency above 1%. This will greatly decrease the processing time.
Hi Valentina,
Thank you so much. Our server is down as well. Hopefully you get some results tomorrow and I can also try it out as well.
Thank you Daniall Masood
Hi Daniall, I have also created a dbSNP151-hg38 file with filtered SNPs (freq >0.05) to speed up the creation of the pileup files. You can find this VCF file here: dbSNP151.hg38-commonSNP_minFreq5Perc_with_CHR.vcf.gz
Hi Valentina,
Thank you! Please let me know as soon as you get any type of result that would be great.
Daniall, for some reason there is a problem with the smaller VCF that I have just shared. The big one works fine but takes a lot of time (https://cloud.inf.ethz.ch/s/jdXgQWBxn2s4oQf) and I get nice results (ploidy is apparently 3 for this cancer sample and not 2).
I will see now what is wrong with the small VCF.
UPD: found the error. VCF needs a header, which I accidentally removed. The new version with the header: https://cloud.inf.ethz.ch/s/idTaGpZdnS9To5c
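A hedged Python sketch of this kind of filtering, written so that the header is preserved, is given here. It is not the script used to build the shared file. It assumes allele frequencies are stored in the dbSNP `CAF` INFO tag with the reference allele first, as in the record quoted earlier in this thread, and it keeps a record when any alternate allele exceeds the threshold; the function names and the exact rule are assumptions.

```python
def alt_freqs(info):
    """Return alternate-allele frequencies from a dbSNP-style CAF INFO tag."""
    for field in info.split(";"):
        if field.startswith("CAF="):
            values = field[4:].split(",")[1:]  # first value is the reference allele
            return [float(v) for v in values if v not in (".", "")]
    return []  # no CAF tag present


def filter_vcf(lines, min_freq=0.05):
    """Yield all header lines plus records with an alternate allele above min_freq."""
    for line in lines:
        if line.startswith("#"):
            yield line  # the header must be kept in the output
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 8 and any(f > min_freq for f in alt_freqs(cols[7])):
            yield line
```

Records without a `CAF` tag are dropped by this sketch, which may or may not be the desired behaviour for a given dbSNP release.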
Hi Valentina,
Yes, we found out recently that the ploidy is actually 2.9, since it is a hyperploid genome. Could you share the results you have so I can see what they look like? My server is still down, so I still cannot run it. But again, thank you so much for all of this help. I really appreciate it.
Once I can run it myself I will come back to you and let you know how it goes.
Dear Daniall, FREEC v11.6 worked for me on this sample (with ploidy 3). It took 8 hours with 4 threads:
Here is the config I used:
[general]
chrLenFile = /home/boeva/NAS_public/data/annotations/Human/hg38/hg38.22XY.fa.fai
sex=XX
maxThreads=4
#can try to comment
ploidy = 3
contaminationAdjustment = TRUE
chrFiles = /home/boeva/NAS_public/data/annotations/Human/hg38/chromosomes/
# I suggest you use the mappability file
gemMappabilityFile = /home/boeva/NAS_public/data/annotations/Human/hg38/GEM_mappability/out100m2_hg38.gem
outputDir = /home/boeva/NAS_public/tmp/debugFREEC/
[sample]
mateFile = /home/boeva/NAS_public/tmp/debugFREEC/WGS_NV_T_1.bwa.dedup.bam
inputFormat = BAM
mateOrientation = 0
#mateCopyNumberFile =
[control]
mateFile = /home/boeva/NAS_public/tmp/debugFREEC/WGS_NV_N_1.bwa.dedup.bam
inputFormat = BAM
mateOrientation = 0
#mateCopyNumberFile =
[BAF]
makePileup = /home/boeva/NAS_public/data/annotations/Human/hg38/dbSNP/dbSNP151.hg38-commonSNP_minFreq5Perc_with_CHR.vcf.gz
fastaFile = /home/boeva/NAS_public/data/annotations/Human/hg38/hg38.fa
SNPfile = /home/boeva/NAS_public/data/annotations/Human/hg38/dbSNP/dbSNP151.hg38-commonSNP_minFreq5Perc_with_CHR.vcf.gz
minimalCoveragePerPosition=7
Hi Valentina,
Thank you so much for all of the help. It works perfectly and we are getting the neutral LOH calls that we were looking for.
However I am running into trouble for the FFPE samples that we have. For some reason, there are errors that are showing up that did not show up before when I ran ControlFreec without LOH calling on the same samples. So it runs fine without LOH calling but with the LOH parameters in I get this error.
Here is my config file for these runs. FFG.txt
These are the FFG samples: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/data/FFG/ It would be great if you could test them soon and let me know where the issue is. You can use this pair as a test/troubleshoot: FFG_IL_T_24h.bwa.dedup.bam and FFG_IL_N_24h.bwa.dedup.bam
Thank you so much, Daniall
Hi Valentina,
Could you please look at this issue as soon as possible? We need it to be resolved quickly.
Thank you Daniall
Dear Daniall, are you looking for small LOH regions, and is this why you use a control file? In general, on WGS, you can run FREEC without a control sample and you will get nice predictions for large LOH (as there is no large germline LOH anyway).
I cannot debug your case today as I am sick at home. However, you can try to change the first lines of your config file to:
...
chrLenFile = /home/daniall.masood/reference/GRCh38/GRCh38.d1.vd1.fa.fai
#window=50000
sex=XX
maxThreads=240
...
to make sure it is not a problem linked to window size or segmentation.
Or you can run FREEC with window and breakPointThreshold commented and without the control files.
Hi Valentina,
Thank you for your reply. We want to remain consistent with how we are calling LOH regions and for all of our samples we are using a matched normal bam file as our control. We are trying to call as many LOH as we can with the same parameters used for all of our files.
I will try what you have suggested just to check what the issue is, but whenever possible if you can debug it further we would really appreciate it thank you.
Thanks Daniall
Hi Valentina,
We figured out what was going wrong with this dataset. The bam files were not sorted and had to be sorted before running. We have gotten LOH calls to work on all of the datasets and have them running perfectly with no issues.
Thank you for all your help and dedication to this issue.
Daniall
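For anyone hitting the same problem, a small Python sketch for reading the declared sort order from a BAM header is given here. It assumes the header text has already been obtained, for example with `samtools view -H`, and it relies on the `SO` tag of the `@HD` line defined by the SAM format; the function name is hypothetical. The tag only reports what the file declares, so a missing or stale tag cannot confirm the actual record order.

```python
def declared_sort_order(header_text):
    """Return the SO value from the @HD line of a SAM/BAM header, or 'unknown'."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"  # no @HD line or no SO tag
```

A return value other than `coordinate` suggests sorting the file (for example with `samtools sort`) before running the analysis.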
Hi,
I have been able to use Control-FREEC successfully to call somatic CNVs from matched normal/tumor files. However, I have also been trying to make LOH calls and have not succeeded in getting the last two columns of the result file (which would give the status and percentage). I will paste the config file that I use below; if you have any suggestions, that would be great. Thank you.
[general]
chrLenFile = /home/daniall.masood/reference/GRCh38/GRCh38.d1.vd1.fa.fai
window=50000
breakPointThreshold = 0.04
sex=XX
maxThreads=40
ploidy = 2
chrFiles = /home/daniall.masood/GRCh38
telocentromeric = 50000
gemMappabilityFile = /home/daniall.masood/out100m2_hg38.gem
outputDir = /home/daniall.masood/test/WGS_LOH/
coefficientOfVariation = 0.062
breakPointThreshold = -.002;
window = 50000
chrFiles = /home/daniall.masood/GRCh38
outputDir = test
degree=3
intercept = 0
[sample]
mateFile = /home/daniall.masood/WGS/WGS_NV_T_1.bwa.dedup.bwa.dedup.bam
inputFormat = BAM
mateOrientation = 0
mateCopyNumberFile = HCC1143.arachne_sample.cpn
[control]
mateFile = /home/daniall.masood/WGS/WGS_NV_N_1.bwa.dedup.bwa.dedup.bam
inputFormat = BAM
mateOrientation = 0
mateCopyNumberFile = HCC1143_BL.arachne_control.cpn
[BAF]
makePileup = /home/daniall.masood/test/hg19_snp142.SingleDiNucl.1based.txt.gz
fastaFile = /home/daniall.masood/reference/GRCh38/GRCh38.d1.vd1.fa
SNPfile = /home/daniall.masood/test/hg19_snp142.SingleDiNucl.1based.txt.gz
result file (example):
1 50000 100000 1 loss A -1
1 100000 350000 3 gain - -1
1 500000 550000 3 gain - -1
1 2700000 2800000 3 gain - -1
1 4250000 5700000 3 gain - -1
1 9350000 9400000 1 loss A -1
1 9500000 16750000 3 gain - -1
1 16800000 36650000 3 gain - -1
1 39000000 40750000 3 gain - -1
1 40750000 40800000 4 gain - -1
1 40800000 40950000 5 gain - -1
1 40950000 51550000 3 gain - -1
1 51550000 51700000 4 gain - -1
1 51700000 52100000 3 gain - -1
1 58100000 58550000 1 loss A -1
1 69600000 72300000 1 loss A -1
1 72300000 72350000 0 loss - -1
1 72350000 76550000 1 loss A -1
1 84450000 89600000 5 gain - -1
1 89600000 94800000 4 gain - -1
1 94800000 94850000 3 gain - -1
1 94850000 96000000 4 gain - -1
1 117650000 117700000 3 gain - -1
1 117700000 118350000 4 gain - -1
1 118350000 120000000 5 gain - -1
1 120000000 120150000 4 gain