cgab-ncc / FIREVAT

FInding REliable Variants without ArTifacts
MIT License

different reference genome #13

Open wnddl111 opened 1 year ago

wnddl111 commented 1 year ago

Hello,

My reference genome is Ensembl v41 hg38, but yours is UCSC hg38.

So I renamed the chromosomes in my VCF file to match the UCSC naming (e.g. GL383518.1 -> chr1_GL383518v1_alt):

chr1_GL383518v1_alt | 182,439 | GL383518.1
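The renaming described above can be sketched in base R. This is only an illustration, not part of FIREVAT: `rename_vcf_contigs`, `map_names`, and the toy VCF lines are hypothetical, and the mapping vector has to be filled in from a real Ensembl-to-UCSC name table. The sketch renames both the CHROM column and the `##contig` header lines, since leaving either one in the old naming can still cause a mismatch.

```r
# Hypothetical helper: translate names found in `name.map`, leave others as-is.
map_names <- function(x, name.map) {
  hit <- x %in% names(name.map)
  x[hit] <- unname(name.map[x[hit]])
  x
}

# Hypothetical helper: rename contigs in VCF text lines.
# `name.map` is a named character vector: names = old IDs, values = new IDs.
rename_vcf_contigs <- function(vcf.lines, name.map) {
  # 1) "##contig=<ID=...>" header lines
  is.contig <- startsWith(vcf.lines, "##contig=<ID=")
  old.id <- sub("^##contig=<ID=([^,>]+).*$", "\\1", vcf.lines[is.contig])
  rest <- sub("^##contig=<ID=[^,>]+", "", vcf.lines[is.contig])
  vcf.lines[is.contig] <- paste0("##contig=<ID=", map_names(old.id, name.map), rest)

  # 2) Data lines: CHROM is the text before the first tab
  is.data <- !startsWith(vcf.lines, "#")
  chrom <- sub("\t.*$", "", vcf.lines[is.data])
  tail.part <- sub("^[^\t]*", "", vcf.lines[is.data])
  vcf.lines[is.data] <- paste0(map_names(chrom, name.map), tail.part)

  vcf.lines
}

# Toy input (made-up variants; only the GL383518.1 mapping is from this thread)
name.map <- c("1" = "chr1", "GL383518.1" = "chr1_GL383518v1_alt")
vcf.lines <- c(
  "##fileformat=VCFv4.2",
  "##contig=<ID=GL383518.1,length=182439>",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t12345\t.\tA\tG\t.\tPASS\t.",
  "GL383518.1\t100\t.\tC\tT\t.\tPASS\t."
)
renamed <- rename_vcf_contigs(vcf.lines, name.map)
```

For a real file, the lines could be read with `readLines()` and written back with `writeLines()`; compressed VCFs would need to be decompressed first.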

But I don't know how to solve this error.

My code:

results <- RunFIREVAT(vcf.file = sample.vcf.file,
                      vcf.file.genome = 'hg38', # for mouse variants: 'mm10'
                      config.file = mutect2.config.file,
                      df.ref.mut.sigs = GetPCAWGMutSigs(),
                      target.mut.sigs = GetPCAWGMutSigsNames(),
                      sequencing.artifact.mut.sigs = PCAWG.All.Sequencing.Artifact.Signatures,
                      output.dir = output.dir,
                      objective.fn = Default.Obj.Fn,
                      num.cores = 2,
                      ga.pop.size = 100,
                      ga.max.iter = 5,
                      ga.run = 5,
                      perform.strand.bias.analysis = TRUE,
                      ref.forward.strand.var = "TumorDPRefForward",
                      ref.reverse.strand.var = "TumorDPRefReverse",
                      alt.forward.strand.var = "TumorDPAltForward",
                      alt.reverse.strand.var = "TumorDPAltReverse",
                      annotate = FALSE)

Error:

Error in .getOneSeqFromBSgenomeMultipleSequences(x, names[i], start[i], :
  sequence chr not found
In addition: Warning messages:
1: In file.remove(paste0(output.dir, existing.firevat.optimization.log.tsv.file)) :
  cannot remove file 'C:/Users/User/Desktop/', reason 'Permission denied'
2: In scan(text = x, what = "character", quiet = TRUE, sep = split.char) :
  EOF within quoted string
3: In scan(text = x, what = "character", quiet = TRUE, sep = split.char) :
  EOF within quoted string
4: In rbind(c(ID = "FAIL", Description = "Fail the site if all alleles fail but for different reasons." :
  number of columns of result is not a multiple of vector length (arg 1)
5: In rbind(c(ID = "AD", Number = "R", Type = "Integer", Description = "Allelic depths for the ref and alt alleles in the order listed" :
  number of columns of result is not a multiple of vector length (arg 1)
6: In rbind(c(ID = "AS_FilterStatus", Number = "A", Type = "String", :
  number of columns of result is not a multiple of vector length (arg 2)

khb7840 commented 1 year ago

If the vcf & BSgenome chromosome names match, try running RunFIREVAT with check.chromosome.name = FALSE
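Before turning the check off, it may help to confirm that every chromosome name in the VCF really exists in the reference. The sketch below is a base R illustration under assumptions: `find_unmatched_contigs` is a hypothetical helper, and `ref.seqnames` is a hand-written stand-in for the reference sequence names (in a real session these would typically come from the BSgenome object, for example via `seqnames()`).

```r
# Hypothetical helper: list VCF chromosome names that are missing from the reference.
find_unmatched_contigs <- function(vcf.lines, ref.seqnames) {
  data.lines <- vcf.lines[!startsWith(vcf.lines, "#")]
  # CHROM is the text before the first tab on each data line
  chrom <- unique(sub("\t.*$", "", data.lines))
  setdiff(chrom, ref.seqnames)
}

# Toy reference names (assumption: a small subset written by hand)
ref.seqnames <- c("chr1", "chr2", "chr1_GL383518v1_alt")

# Toy VCF lines (made-up variants)
vcf.before <- c(
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr1\t12345\t.\tA\tG\t.\tPASS\t.",
  "GL383518.1\t100\t.\tC\tT\t.\tPASS\t."
)
vcf.after <- sub("^GL383518\\.1\t", "chr1_GL383518v1_alt\t", vcf.before)

unmatched.before <- find_unmatched_contigs(vcf.before, ref.seqnames)
unmatched.after <- find_unmatched_contigs(vcf.after, ref.seqnames)
```

An empty result means all names match; any name that is returned still needs to be renamed or removed from the VCF.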

wnddl111 commented 1 year ago

@khb7840 Thank you very much for your reply. I ran FIREVAT again after matching the chromosome names, but a new error occurred in the strand bias analysis step.

[1] "INFO [2023-04-26 10:38:42] Step 02-4. Filter VCF based on optmized filter parameters."
[1] "INFO [2023-04-26 10:38:42] Before applying filter: 6662 rows in VCF object"
[1] "INFO [2023-04-26 10:38:42] After applying filter: "
[1] "INFO [2023-04-26 10:38:42] 6532 rows in vcf.data.filtered VCF object"
[1] "INFO [2023-04-26 10:38:42] 130 rows in vcf.data.artifact VCF object"
[1] "INFO [2023-04-26 10:38:42] Step 03. Additional analysis."
[1] "INFO [2023-04-26 10:38:42] Step 03-1. Perform strand bias analysis [firevat_strand_bias::PerformStrandBiasAnalysis]"
Error in fisher.test(test.mat) :
  All entries in "x" must be nonnegative and finite
In addition: Warning messages:
1: In file.remove(paste0(output.dir, existing.firevat.optimization.log.tsv.file)) :
  cannot remove file 'C:/Users/User/Desktop/firevat', reason 'Permission denied'
2: In scan(text = x, what = "character", quiet = TRUE, sep = split.char) :
  EOF within quoted string
3: In scan(text = x, what = "character", quiet = TRUE, sep = split.char) :
  EOF within quoted string
4: In rbind(c(ID = "FAIL", Description = "Fail the site if all alleles fail but for different reasons." :
  number of columns of result is not a multiple of vector length (arg 1)
5: In rbind(c(ID = "AD", Number = "R", Type = "Integer", Description = "Allelic depths for the ref and alt alleles in the order listed" :
  number of columns of result is not a multiple of vector length (arg 1)
6: In rbind(c(ID = "AS_FilterStatus", Number = "A", Type = "String", :
  number of columns of result is not a multiple of vector length (arg 2)

khb7840 commented 1 year ago

Strand bias analysis requires strand-specific read counts supporting each allele in the input. If they are present in the input, they should be read correctly through the config file. Otherwise, you can skip strand bias analysis by passing perform.strand.bias.analysis = FALSE and filter.by.strand.bias.analysis = FALSE to RunFIREVAT.
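To see why missing strand counts lead to the `fisher.test` error above, here is a minimal base R sketch. It is not FIREVAT's internal code: `is_valid_strand_table` and `strand_bias_p` are hypothetical names, and the counts are made up. `fisher.test` rejects any table containing `NA` or negative entries, which is what happens when the forward/reverse read counts cannot be parsed from the VCF.

```r
# Hypothetical check: are the four strand counts usable for a 2x2 Fisher test?
is_valid_strand_table <- function(ref.fwd, ref.rev, alt.fwd, alt.rev) {
  counts <- c(ref.fwd, ref.rev, alt.fwd, alt.rev)
  length(counts) == 4 &&
    is.numeric(counts) &&
    all(is.finite(counts)) &&
    all(counts >= 0)
}

# Hypothetical wrapper: return NA instead of stopping when counts are unusable.
strand_bias_p <- function(ref.fwd, ref.rev, alt.fwd, alt.rev) {
  if (!is_valid_strand_table(ref.fwd, ref.rev, alt.fwd, alt.rev)) {
    return(NA_real_)
  }
  # Rows: ref / alt allele; columns: forward / reverse strand
  test.mat <- matrix(c(ref.fwd, ref.rev, alt.fwd, alt.rev), nrow = 2, byrow = TRUE)
  fisher.test(test.mat)$p.value
}

p.ok <- strand_bias_p(10, 12, 9, 11)       # complete counts: a p-value
p.missing <- strand_bias_p(10, NA, 3, 4)   # a missing count: NA, no error
p.negative <- strand_bias_p(10, -1, 3, 4)  # a negative count: NA, no error
```

If many variants come back as `NA` in a check like this, the strand count fields named in the config file probably do not match what the VCF actually contains.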

wnddl111 commented 1 year ago

Thank you very much for your reply. I ran FIREVAT again after setting perform.strand.bias.analysis = FALSE, but a new error occurred in the plotting section.

[1] "INFO [2023-04-27 01:04:36] Step 05. Generate FIREVAT report"
[1] "INFO [2023-04-27 01:04:36] * Started generating FIREVAT report"
[1] "INFO [2023-04-27 01:04:36] ** Started plotting optimization iterations"
Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
  polygon edge not found

Thank you so much for your quick reply!!!

khb7840 commented 1 year ago

I'm not sure, but it may be a font-related problem (Stack Overflow link).