FunGeST / Palimpsest

An R package for studying mutational signatures and structural variant signatures along clonal evolution in cancer.

annotate_VCF error: extract Indel categories for hg38 #29

Closed iS4i4S closed 4 years ago

iS4i4S commented 4 years ago

Hi, I'm getting an error when calling the function below with hg38. It seems to fail while looking for GRCh37, which is not loaded because the function is called for hg38:

vcf_PCNSL_palim_v2 <- annotate_VCF(vcf = maf_PCNSL_noFlags2_p,
                                   genome_build = "hg38",
                                   ref_genome = BSgenome.Hsapiens.UCSC.hg38,
                                   ref_fasta = "/media/isaias.hernandez/Seagate1/ISAIAS/IcGEX_WES_8DLBCL+8PCNSL/GATK/Reference/GRCh38.d1.vd1.fa",
                                   palimpdir = "/home/isaias.hernandez/Palimpsest",
                                   add_ID_cats = T)

('failed while annotating', '/home/isaias.hernandez/Palimpsest/Temporary//python_vcf_indel.simple')
Traceback (most recent call last):
  File "/home/isaias.hernandez/Palimpsest/exec/make_spectra_indels.py", line 123, in <module>
    file_features_plus, file_counts_plus = lookups_indels.get_indel_class_counts(filename, input_type, '+', variant_filters=filters)
  File "/home/isaias.hernandez/Palimpsest/exec/lookups_indels.py", line 757, in get_indel_class_counts
    annotate_simple_indel(input_filename, annotated_cache_file)
  File "/home/isaias.hernandez/Palimpsest/exec/lookups_indels.py", line 328, in annotate_simple_indel
    LHS_seq = get_reference_seq(ref_genome, test_chromosome, start_pos-5*del_length, start_pos-1)
  File "/home/isaias.hernandez/Palimpsest/exec/lookups_indels.py", line 85, in get_reference_seq
    sequence = _grch37_reference.get_sequence(chromosome, start, end)
AttributeError: 'NoneType' object has no attribute 'get_sequence'
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  no lines available in input
In addition: Warning messages:
1: In palimpsest_addMutationContextToVR(vr = vr, ref = ref_genome, :
  References do not match in 14 cases
2: In palimpsest_addMutationContextToVR(vr = vr, ref = ref_genome, :
  References do not match in 14 cases
3: In add_ID_cats_ToVCF(vcf = vcf, ref_fasta = ref_fasta, palimpdir_man = palimpdir) :
  Indel category extraction with PCAWG7-data-preparation-version-1.5 python script is finished (if there are error messages above it has not been successful)

FunGeST commented 4 years ago

Hi,

Thanks for pointing this out. We found the error (GRCh37 is hard-coded in the Python script) and we will push the corrected function in the next few days. In the meantime, you can still analyze SBS and DBS signatures by setting the 'add_ID_cats' argument to FALSE. Thank you for your patience.

Eric
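
For readers who hit the same traceback, here is a minimal sketch of the failure mode described above. It is not the code from lookups_indels.py: the `FakeReference` class, the `_references` table, and all function names are hypothetical and only illustrate how a hard-coded genome build produces `'NoneType' object has no attribute 'get_sequence'` when only another build has been loaded, and how looking the reference up by build avoids it.

```python
# Hypothetical sketch; none of these names are taken from Palimpsest or the PCAWG7 scripts.

class FakeReference:
    """Stand-in for a loaded reference genome (for example, one backed by a FASTA file)."""

    def __init__(self, sequences):
        self._sequences = sequences  # chromosome name -> sequence string

    def get_sequence(self, chromosome, start, end):
        # Coordinates are treated as 1-based and inclusive in this sketch.
        return self._sequences[chromosome][start - 1:end]


# One slot per genome build; a slot stays None until that build is loaded.
_references = {"GRCh37": None, "GRCh38": None}


def load_reference(build, sequences):
    _references[build] = FakeReference(sequences)


def get_reference_seq_hardcoded(chromosome, start, end):
    # Failure mode: always reads the GRCh37 slot, whatever build the caller loaded.
    return _references["GRCh37"].get_sequence(chromosome, start, end)


def get_reference_seq(build, chromosome, start, end):
    # Look the reference up by the requested build and fail with a clear message.
    reference = _references.get(build)
    if reference is None:
        raise ValueError(f"reference genome {build!r} has not been loaded")
    return reference.get_sequence(chromosome, start, end)


# Only the hg38/GRCh38 reference is loaded, as in the call reported above.
load_reference("GRCh38", {"chr1": "ACGTACGTAC"})
```

With only GRCh38 loaded, `get_reference_seq_hardcoded("chr1", 2, 4)` raises the same kind of `AttributeError` as in the report, while `get_reference_seq("GRCh38", "chr1", 2, 4)` returns the requested bases.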

FunGeST commented 4 years ago

Hi,

We've removed the GRCh37 hard-coding, so the function should work for you now. Please let us know if it doesn't, or if you encounter any other issues!

Benedict

iS4i4S commented 4 years ago

Hi, thanks for fixing it; it works perfectly now. I will let you know if I find anything else.