Open anopperl opened 7 years ago
@anopperl Could you post the code you are trying to run that caused the error shown above? Might help in coming up with a solution. Thanks.
There is already a genes.refGene in the directory pyhgvs/data of this repository. It is simply old but working.
"genes.refGene" is not directly available at UCSC. However, you should be able to created one with latest tables from UCSC database. Refer #26 for an example for hg19. For hg38, you simply need to replace hg19 by hg38 in the example command. (Or download the refGene.txt.gz from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz)
I used the following command to get the "genes.refGene" file, run the Example usage, still still have the above error 。 mysql -ugenomep --password=password -hgenome-mysql.cse.ucsc.edu -ACD hg19 -BNe "SELECT r.bin,CONCAT(r.name, '.', g.version) AS name,r.chrom,r.strand,r.txStart,r.txEnd,r.cdsStart,r.cdsEnd,r.exonCount,r.exonStarts,r.exonEnds,r.score,r.name2,r.cdsStartStat,r.cdsEndStat,r.exonFrames FROM hg19.refGene r, hgFixed.gbCdnaInfo g WHERE r.name = g.acc" > genes.refGene
This is a dupe, see other issue for scripts: https://github.com/counsyl/hgvs/issues/26#issuecomment-523477167
I've made a Python package that provides ~800k transcripts (both RefSeq and Ensembl) for PyHGVS
You can either download a JSON.gz file, or use a REST service. To use it:
from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory, RESTPyHGVSTranscriptFactory
factory = RESTPyHGVSTranscriptFactory()
# factory = JSONPyHGVSTranscriptFactory(["./cdot-0.2.1.refseq.grch38.json.gz"]) # Uses local JSON file
pyhgvs.parse_hgvs_name(hgvs_c, genome, get_transcript=factory.get_transcript_grch37)
I've made a Python package that provides ~800k transcripts (both RefSeq and Ensembl) for PyHGVS
You can either download a JSON.gz file, or use a REST service. To use it:
from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory, RESTPyHGVSTranscriptFactory factory = RESTPyHGVSTranscriptFactory() # factory = JSONPyHGVSTranscriptFactory(["./cdot-0.2.1.refseq.grch38.json.gz"]) # Uses local JSON file pyhgvs.parse_hgvs_name(hgvs_c, genome, get_transcript=factory.get_transcript_grch37)
Hi,@davmlaw, I want to get HGVS cdot from REF/ALT using pyhgvs(not parse hgvs string), how could I use the cdot package to help me? Could you show me some example of scripts? Thanks
First, install cdot:
python3 -m pip install cdot
Then get the cdot data:
wget https://github.com/SACGF/cdot/releases/download/v0.2.12/cdot-0.2.12.refseq.grch37_grch38.json.gz
This is based off the pyhgvs README - basically we need to load cdot data files to populate the pyhgvs "transcript" record (they wrote their own get_transcript() method in the README)
import pyhgvs as hgvs
from pysam.libcfaidx import FastaFile
from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory
factory = JSONPyHGVSTranscriptFactory(["./cdot-0.2.12.refseq.grch37_grch38.json.gz"])
transcript = factory.get_transcript_grch37("NM_000352.3") # change to 38 if you want that build
chrom, offset, ref, alt = ('chr11', 17496508, 'T', 'C')
genome = FastaFile("/data/annotation/fasta/GCF_000001405.25_GRCh37.p13_genomic.fna.gz")
hgvs_name = hgvs.format_hgvs_name(
chrom, offset, ref, alt, genome, transcript)
@davmlaw Thank you very much! And I wonder is there any way to get all the relevant transcripts based on chrom and offset?
The easiest way to do that is probably using bedtools on a GTF then pulling out transcripts from the cut down GTF
Another way would be to use another Python library for doing genomic type stuff that I wrote called pyreference untested code would be:
my_region_iv = GenomicInterval("chr1", 100, 10000)
for iv, transcript_set in reference.genomic_transcripts[my_region_iv]
pass
@davmlaw Hi, your work is great! I used your package and pyhgvs to generate HGVS name. Then, I used another package: hgvs to parse the variant, but it raised HGVSParserError:raise HGVSParseError("{s}: char {exc.position}: {reason}".format( hgvs.exceptions.HGVSParseError: NM_001286350.2(DACT2):c.681_692dup12: char 35: expected EOF
. Why hgvs can not parse the result which generated by pyhgvs? How could I solve it?
Hi, HGVS is very broad spec and each library only parses a subset of it.
I think the issue is the 12 at the end (this is redundant, and you could work it out from the difference between positions) you need to regenerate them without the numbers or strip them off
how to create or find "genes.refGene" file for hg19, hg38. i have got "genes.refGene" file from USSC but these are not working for my case
error shows :
Traceback (most recent call last): File "first_py.py", line 38, in
hgvs_name, genome, get_transcript=get_transcript)
File "build/bdist.linux-x86_64/egg/pyhgvs/init.py", line 1356, in parse_hgvs_name
ValueError: transcript is required