Closed Pharmacogenetecist closed 2 years ago
There seems to be something wrong with a line in your VCF where the CHROM column is HLA-DQA1*05:03. Could you please share your file or an example?
What do you see when you type bgzip -h
in your computing environment?
HLA-DQA1*05:03
The file is extremely large to be able to share, I will try subset where the error is coming from.
For bgzip -h bgzip -h
Version: 1.9 Usage: bgzip [OPTIONS] [FILE] ... Options: -b, --offset INT decompress at virtual file pointer (0-based uncompressed offset) -c, --stdout write on standard output, keep original files unchanged -d, --decompress decompress -f, --force overwrite files without asking -h, --help give this help -i, --index compress and create BGZF index -I, --index-name FILE name of BGZF index file [file.gz.gzi] -l, --compress-level INT Compression level to use when compressing; 0 to 9, or -1 for default [-1] -r, --reindex (re)index compressed file -g, --rebgzip use an index file to bgzip a file -s, --size INT decompress INT bytes (uncompressed size) -@, --threads INT number of compression threads to use [1] -t, --test test integrity of compressed file
@Pharmacogenetecist Just a single line will do, but assuming the error is accurate, it looks like you're providing HLA-DQA1*05:03
as a value for the chromosome value in the VCF, which is invalid...
HLA-DQA1*05:03
Searching for "HLA-DQA" in the vcf I get the following - attached.
I am guessing this may be a result of the DRAGEN pipeline we use which assesses HLA. I was hoping using the pre-processing tool would "fix" this, but seems like I have some issues there as well.
Appreciate your help.
We definitely want to track down the problem you're having with the preprocessor, and we'll be getting back to you on that ASAP.
However, even the preprocessor won't be able to work with this.
I'm not familiar enough with DRAGEN to suggest anything concrete, but I would hazard a guess that there is some configuration somewhere then will get it to provide proper chromosomal locations.
One of the requirements for PharmCAT is that the VCF is based on GRCh38. Without a common understanding of what your data represents, there's no way for PharmCAT to make any calls.
We definitely want to track down the problem you're having with the preprocessor, and we'll be getting back to you on that ASAP.
However, even the preprocessor won't be able to work with this.
I'm not familiar enough with DRAGEN to suggest anything concrete, but I would hazard a guess that there is some configuration somewhere then will get it to provide proper chromosomal locations.
One of the requirements for PharmCAT is that the VCF is based on GRCh38. Without a common understanding of what your data represents, there's no way for PharmCAT to make any calls.
Thank you, I appreciate the help.
I re-tried a different sample as the above was a gvcf file. With a standard vcf file I get the following error.
java -jar "/user/PharmCAT-2.0.0/pharmcat-2.0.0-all.jar" -vcf "/user/DATA/test38.vcf" -o "/home/user/DATA/" java.lang.IllegalArgumentException: CHROM column "HLA-DRB1*03:01:01:01" contains whitespace or colons
at org.pharmgkb.parser.vcf.model.VcfPosition.
I suppose if I tried to subset my VCF files for chr 1 - 22, x, y and MT, I could see if that helps.
I tried the new version of the pre-processor this morning and get the following: Please use bcftools 1.16 or higher bcftools --version bcftools 1.9 Using htslib 1.9
@Pharmacogenetecist bcftools 1.9 is a much lower version (seven versions older) than 1.16. You can download the latest tools here http://www.htslib.org/download/ or load an up-to-date bcftools in your computing environment if it's already installed.
@Pharmacogenetecist bcftools 1.9 is a much lower version (seven versions older) than 1.16. You can download the latest tools here http://www.htslib.org/download/ or load an up-to-date bcftools in your computing environment if it's already installed.
Thank you, I guess I need to go back to grade school to learn my numbers ;)
I am getting an issue on updating bcftools to 1.16 bcftools: error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory
I tried making a new conda environment and installing - bcftools=1.16 and - htslib=1.16 but when I run bcftools --version, I get the above error
RESOLVED error by installing - https://anaconda.org/conda-forge/gsl
Will test pre-processor and report back.
Update, I tried 3 different gzipped vcf files.
errors:
I tool the original vcf file 1 and bgzipped it instead of gzip - this time I get an error similar to vcf files 2 and 3 below: Normalizing VCF [E::faidx_adjust_position] The sequence "chrX" was not found faidx_fetch_seq failed at chrX:154531869
vcf file 2 Preparing the reference PGx VCF [E::faidx_adjust_position] The sequence "chrX" was not found faidx_fetch_seq failed at chrX:154532046
vcf file 3 Normalizing VCF [E::faidx_adjust_position] The sequence "chrX" was not found faidx_fetch_seq failed at chrX:154532293
I think we're going to need more information here. Can you provide the VCF headers and the lines on chrX:154532293
?
I just released a v2.0.2. Could you please re-try with this?
If you've already donwloaded the reference fasta files, you'll need to delete them (look for reference.fna.*
). We needed to fix something in them and you'll need to download the updated versions (the VCF preprocessor does this automatically).
@markwoon @BinglanLi I really appreciate all your help. Headers etc, for some reason I can't ssh into our server today, I will test this out and get back to you as soon as I can.
I just released a v2.0.2. Could you please re-try with this?
If you've already donwloaded the reference fasta files, you'll need to delete them (look for
reference.fna.*
). We needed to fix something in them and you'll need to download the updated versions (the VCF preprocessor does this automatically).
Heads:
VCF file 1
For VCF file 2:
I removed 2.0.1 and got the 2.0.2. For both files I still get the same chrX error using 2.0.2
FYI - the test VCF file I am using (VCF2) is from here: https://ftp-trace.ncbi.nih.gov/giab/ftp/release/NA12878_HG001/latest/GRCh38/
Just to confirm, did you remove the previously downloaded reference.fna.*
before rerunning the VCF preprocessor v2.0.2? I replicated the error that you had. The warning went away after I removed the previously downloaded reference genome sequence and used the new VCF preprocessor.
Just to confirm, did you remove the previously downloaded
reference.fna.*
before rerunning the VCF preprocessor v2.0.2? I replicated the error that you had. The warning went away after I removed the previously downloaded reference genome sequence and used the new VCF preprocessor.
I just retried it after deleting the reference file and get the following error.
Traceback (most recent call last):
File "/Pharmcatpreprocessor2.0.2/PharmCAT_VCF_Preprocess.py", line 245, in
We utilize the Illumina Dragen pipeline for processing our WGS. I have noticed the following error when running PHARMCAT 2.0 and prior versions.
java.lang.IllegalArgumentException: CHROM column "HLA-DQA1*05:03" contains whitespace or colons at org.pharmgkb.parser.vcf.model.VcfPosition.(VcfPosition.java:66)
at org.pharmgkb.parser.vcf.VcfParser.parseNextLine(VcfParser.java:238)
at org.pharmgkb.parser.vcf.VcfParser.parse(VcfParser.java:127)
at org.pharmgkb.pharmcat.haplotype.VcfReader.read(VcfReader.java:138)
at org.pharmgkb.pharmcat.haplotype.VcfReader.(VcfReader.java:65)
at org.pharmgkb.pharmcat.haplotype.NamedAlleleMatcher.buildVcfReader(NamedAlleleMatcher.java:161)
at org.pharmgkb.pharmcat.haplotype.NamedAlleleMatcher.call(NamedAlleleMatcher.java:212)
at org.pharmgkb.pharmcat.PharmCAT.execute(PharmCAT.java:322)
at org.pharmgkb.pharmcat.PharmCAT.main(PharmCAT.java:194)
I was hoping you could explain this error and if there was an easy fix my end.
I tried to preprocess the VCF, followed the instruction to get all the pre-requistes, but get the following error;
python3 PharmCAT_VCF_Preprocess.py -vcf /pharmcatDATAin/NA12878.gvcf.gz -o /pharmcatDATAin/PHARMCAT_Preprocessed/ Traceback (most recent call last): File PharmCAT_VCF_Preprocess.py, line 227, in
run(args)
File PharmCAT_VCF_Preprocess.py, line 44, in run
tool_version = re.search(r'Version: (\d.\d+)*', version_message).group(1)
AttributeError: 'NoneType' object has no attribute 'group'