genevol-usp / hlaseqlib

utility functions for the HLA RNAseq project

generate complete genomic HLAs from IMGT #1

Open boyangzhao opened 3 years ago

boyangzhao commented 3 years ago

Hi - I stumbled upon your tool HLApers and the associated library hlaseqlib. I find the function hla_read_alignment quite interesting since, as you know, the IMGT database does not contain complete sequences for every allele. Can this function be used to process genomic sequences (the _gen files)? I see that it mentions cds, and I'm not sure how hard-coded it is toward cds sequences. I do see that hla_read_alignment can process either nuc or gen files.

VitorAguiar commented 3 years ago

Hi, yes, the hla_read_alignment function should process the gen files. The "cds" term is a poor naming choice left over from when I only used the nuc files. Thanks!

boyangzhao commented 3 years ago

Thanks! In addition to hla_read_alignment, which processes the gen files, do the function hla_compile_index and the script make_index_files.R also work on gen files? I presume these are tailored to nuc files? I'm also interested in generating a complete nuc fasta by filling in missing sequence from the closest alleles at the genomic DNA level.

VitorAguiar commented 3 years ago

Correct, those are intended to be used with nuc files. In principle, you could tweak the code of hla_compile_index so that it reads gen files with hla_read_alignment, but that was never tested, and I cannot foresee the problems that might come up. If you want to use gen files to compute distances across alleles, you will probably need to model large insertions and deletions in the introns better.
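
For readers following this thread, the fill-in idea discussed above can be sketched roughly as follows. This is not hlaseqlib code, and it is in Python rather than R. The function names, the toy sequences, and the conventions that `*` marks unknown positions and `.` marks alignment gaps are assumptions made for illustration only. The sketch picks the closest complete allele by counting mismatches over positions known in the target, then copies the donor's bases into the unknown positions.

```python
UNKNOWN = "*"  # assumed marker for positions that were not sequenced


def mismatches_over_known(a, b, unknown=UNKNOWN):
    """Count mismatching columns, skipping columns unknown in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != unknown and y != unknown and x != y
    )


def fill_from_closest(target, alleles, unknown=UNKNOWN):
    """Fill unknown positions of `target` from the closest complete allele.

    `alleles` maps allele names to aligned sequences of equal length.
    Returns (donor_name, filled_sequence).
    """
    seq = alleles[target]
    # Only fully known sequences are allowed to act as donors.
    donors = {
        name: s for name, s in alleles.items()
        if name != target and unknown not in s
    }
    if not donors:
        raise ValueError("no complete allele available as donor")
    # Ties are broken by allele name so the result is deterministic.
    closest = min(
        donors,
        key=lambda n: (mismatches_over_known(seq, donors[n], unknown), n),
    )
    donor = donors[closest]
    filled = "".join(d if t == unknown else t for t, d in zip(seq, donor))
    return closest, filled


# Toy aligned sequences (hypothetical, not real IMGT data).
alleles = {
    "A*01:01": "ATGC.GTTA",
    "A*02:01": "ATGCAGTCA",
    "A*03:01": "**GC.GTT*",
}
donor_name, filled_seq = fill_from_closest("A*03:01", alleles)
print(donor_name, filled_seq)
```

Note that this per-column count treats every gap column as one mismatch, so a single long intronic insertion or deletion would dominate the distance. That is the limitation raised in the comment above, and a real approach on gen files would likely need to score each indel event rather than each gap column.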
