Closed jh2663 closed 4 years ago
UPDATE: When I give a try with hg38 version, I find it seems working with following Warning message only:
Warning message: In add_ID_cats_ToVCF(vcf = vcf, ref_fasta = ref_fasta, palimpdir_man = palimpdir, : Indel category extraction with PCAWG7-data-preparation-version-1.5 python script is finished (if there are error messages above it has not been successful)
So it should be 'hg19' version issue?
thanks.
UPDATE2:
OK. I've figured out what happened. Actually my data was based on 'hs37d5' genome version not precisely 'hg19'. So I downloaded 'BSgenome.Hsapiens.1000genomes.hs37d5' and gave a try this:
================================================================= mut_data_vcf<-annotate_VCF(vcf = mut_data_selected.ordered, genome_build = "hg19", ref_genome = ref_genome, ref_fasta = ref_fasta, palimpdir= palimpdir, add_strand_and_SBS_cats = FALSE, add_ID_cats=TRUE, add_DBS_cats=FALSE)
[1] Runnng PCAWG7-data-preparation-version-1.5 in python to extract Indel categories.. Warning message: In add_ID_cats_ToVCF(vcf = vcf, ref_fasta = ref_fasta, palimpdir_man = palimpdir, : Indel category extraction with PCAWG7-data-preparation-version-1.5 python script is finished (if there are error messages above it has not been successful)
================================================================= Now it seems worked. Actually, 'hg19' and 'hs38d5' differ in mitochondrial DNA coordinates but the option 'genome_build' allows only either 'hg19' or 'hg38'. This is a bit confusing.
Please let me know if anything else should be considered.
best,
UPDATE3:
sorry for long posting. I got another error using with hs37d5 version. Now SBS annotation is not working possibly because hg19 version has 'chr' tag but not in hs37d5.
================================================================= [1] Adding gene, strand and SBS category annotations .. Error in .getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA, width, : sequence chr1 not found In addition: Warning message: In .Seqinfo.mergexy(x, y) :
Error in .getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA, width, : sequence chr1 not found
===================================================================
Finally, I was able to resolve all of above issues simply by using 'BSgenome.Hsapiens.UCSC.hg19' as 'ref_genome' but 'hs37d5.fasta' as 'ref_fasta'. Could you fix this genome version related issue in the next update?
best,
Hi,
Thank you very much for pointing out this issue, we'll look into it and fix it for the next update!
best wishes. Benedict
Hi, just another issue with handling indel categories.
I have a FASTA file locally (out_file) and run this command to get ID categories:
palimpdir<-'/Users/hongj/Library/R/3.6/library/Palimpsest/' mut_data_vcf<-annotate_VCF(vcf = mut_data_selected.ordered, genome_build = "hg19", ref_genome = ref_genome, ref_fasta = out_file, palimpdir= palimpdir, add_ID_cats=TRUE, add_DBS_cats=FALSE)
SBS annotation works perfectly but not for ID. It shows following errors:
================================================================== [1] Runnng PCAWG7-data-preparation-version-1.5 in python to extract Indel categories.. ('failed while annotating', '/Users/hongj/Library/R/3.6/library/Palimpsest/Temporary//python_vcf_indel.simple') Traceback (most recent call last): File "/Users/hongj/Library/R/3.6/library/Palimpsest/exec/make_spectra_indels.py", line 123, in
file_features_plus, file_counts_plus = lookups_indels.get_indel_class_counts(filename, input_type, '+', variant_filters=filters)
File "/Users/hongj/Library/R/3.6/library/Palimpsest/exec/lookups_indels.py", line 757, in get_indel_class_counts
annotate_simple_indel(input_filename, annotated_cache_file)
File "/Users/hongj/Library/R/3.6/library/Palimpsest/exec/lookups_indels.py", line 328, in annotate_simple_indel
LHS_seq = get_reference_seq(ref_genome, test_chromosome, start_pos-5*del_length, start_pos-1)
File "/Users/hongj/Library/R/3.6/library/Palimpsest/exec/lookups_indels.py", line 85, in get_reference_seq
sequence = _grch37_reference.get_sequence(chromosome, start, end)
AttributeError: 'NoneType' object has no attribute 'get_sequence'
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
In addition: Warning message:
In add_ID_cats_ToVCF(vcf = vcf, ref_fasta = ref_fasta, palimpdir_man = palimpdir, :
Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input
================================================================== Any idea on this?
best,