Closed davidyuyuan closed 3 months ago
Hello,
I’m currently on FTO, so I won’t be able to look in depth for a little while. However, a couple of things come to mind immediately:
Matt
Hi Matt,
Thank you for the prompt response.
Here are the answers to both of your questions:
According to ENA, HG00436 in the G1K dataset was sequenced on Illumina NovaSeq 6000: https://www.ebi.ac.uk/ena/browser/view/ERR3241673. It is not PacBio HiFi data.
The CRAM was aligned to GRCh38. It is one of the “90 Han Chinese high coverage genomes”, a subset of "30x GRCh38". You can find its metadata under https://www.internationalgenome.org/data-portal/sample.
Kind regards,
David Yu Yuan
Yep, that makes sense. StarPhase only supports HiFi datasets, so Illumina short reads will not work.
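For anyone who lands here with the same symptom, one quick sanity check before running StarPhase is to look at the platform recorded in the alignment header. The sketch below is only an illustration: the helper name `rg_platforms` is made up for this example, it assumes `samtools` is installed, and it assumes the file's `@RG` header lines carry a `PL` tag (the SAM specification defines the tag, but not every file sets it). It prints the unique `PL` values found in header text read from stdin.

```shell
# Hypothetical helper: print the unique platform (PL) values declared
# in @RG header lines read from stdin. SAM header fields are tab-separated.
rg_platforms() {
  awk -F'\t' '/^@RG/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^PL:/) print substr($i, 4)   # drop the "PL:" prefix
  }' | sort -u
}

# Usage (assumes samtools is on PATH; the file name is a placeholder):
#   samtools view -H HG00436.final.cram | rg_platforms
```

Per the SAM specification, PacBio data is normally tagged `PACBIO` and Illumina data `ILLUMINA`. A header that reports only `ILLUMINA` would be consistent with the ENA record above, but an empty result just means the tag is absent, not that the platform is unknown.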
I used
/opt/conda/bin/pbstarphase diplotype -v --normalize-d6-only --bam "${output_dir}/${file}" -d "${CPIC_JSON_DB}" -t 4 -o "${output_dir}/${file}.json" --pharmcat-tsv "${output_dir}/pharmcat.tsv" -r "${ref_genome}"
to call the CYP2D6 diplotype
in a CRAM from G1K (ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR324/ERR3241673/HG00436.final.cram). On visual inspection in the EnsEMBL genome browser, the CRAM has even 30x to 47x coverage across the gene region. The utility reported that no reads were detected. Here is a snippet of the log message:
I'd appreciate it if you could take a look at this. Please let me know if I used the utility incorrectly.