HallLab / pandas-genomics

Pandas ExtensionDtypes for dealing with genomics data
BSD 3-Clause "New" or "Revised" License

Variant ID considerations #16

Closed by jrm5100 3 years ago

jrm5100 commented 3 years ago

The canonical way to identify variants should be consistent, either:

Choice 1 would render the variant ID useless when it exists in a dataframe. Choice 2 would require more careful validation of dataframes to avoid duplicate IDs.
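
As a rough illustration of the kind of duplicate-ID validation mentioned above, here is a minimal sketch in plain Python. The helper names `find_duplicate_ids` and `validate_unique_ids` and the sample IDs are hypothetical and not part of pandas-genomics; the sketch assumes the variant IDs are available as any iterable of strings (for example, a dataframe's column labels).

```python
from collections import Counter


def find_duplicate_ids(variant_ids):
    """Return the variant IDs that occur more than once, in first-seen order."""
    counts = Counter(variant_ids)
    return [vid for vid, n in counts.items() if n > 1]


def validate_unique_ids(variant_ids):
    """Raise ValueError if any variant ID is repeated (hypothetical helper)."""
    duplicates = find_duplicate_ids(variant_ids)
    if duplicates:
        raise ValueError(f"Duplicate variant IDs: {duplicates}")


# Made-up IDs standing in for the labels of genotype columns
ids = ["rs123", "rs456", "rs123"]
print(find_duplicate_ids(ids))
```

A check like this could run whenever genotype columns are combined into one dataframe, so that clashes surface early rather than after encoding.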

Either choice requires carefully considering how to name encoded genotype results.

jrm5100 commented 3 years ago

Some updates: