bigbio / py-pgatk

Python tools for proteogenomics analysis toolkit
Apache License 2.0

Alt ORFs from protein sequences #3

Closed husensofteng closed 5 years ago

husensofteng commented 5 years ago

Extract an alternative open reading frame for a given transcript.

Input:
- string: Transcript ID
- file: Canonical proteins FASTA
- file: GTF
- file: Genome DNA FASTA

Output:
- str: Protein sequence containing the translated transcript sequence
- str: record ID
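To make the request concrete, here is a minimal sketch of the idea in plain Python. It is not the pypgatk implementation. It assumes the transcript's nucleotide sequence has already been pulled from the genome FASTA using the GTF coordinates, scans only the three forward frames, and treats any ATG-initiated ORF whose translation differs from the canonical protein as an alternative ORF. The function names and the `altorf_<transcript>_<start>` record ID format are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate seq codon by codon until the first stop codon or the end.

    Codons that are not in the table (for example, containing N) become 'X'.
    """
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def alt_orfs(transcript_id, transcript_seq, canonical_protein):
    """Return (record_id, protein) pairs for ORFs that differ from the canonical protein.

    Assumption: an ORF starts at ATG and runs to the first in-frame stop,
    or to the end of the transcript if no stop is found.
    """
    seq = transcript_seq.upper()
    records = []
    for frame in range(3):
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] == "ATG":
                protein = translate(seq[pos:])
                if protein != canonical_protein:
                    # Hypothetical ID format: transcript ID plus 0-based start offset.
                    records.append((f"altorf_{transcript_id}_{pos}", protein))
                # Jump past this ORF and its stop codon, staying in frame.
                pos += 3 * (len(protein) + 1)
            else:
                pos += 3
    return records


if __name__ == "__main__":
    # Toy transcript: the canonical ORF "MA" is in frame 0, an alternative ORF in frame 2.
    print(alt_orfs("T1", "ATGGCATGAAATTTTAGC", "MA"))
    # → [('altorf_T1_5', 'MKF')]
```

A fuller version would also need to handle the reverse strand, splice exons according to the GTF, and apply a minimum ORF length, none of which is attempted here.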

husensofteng commented 5 years ago

Done. Example command:

    python pypgatk_cli.py dnaseq-to-proteindb --config_file config/ensembl_config.yaml --input_fasta testdata/test.fa --output_proteindb testdata/proteindb_from_altORFs_DNAseq.fa --include_biotypes altORFs --skip_including_all_cds