biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Data science tool idea #912

Open cthoyt opened 1 year ago

cthoyt commented 1 year ago

Given a dataframe, identify what its columns are:

  1. look for an index column that ascends numerically, starting at either 0 or 1
  2. look for column names that can be attributed to Bioregistry prefixes (with a minimal amount of preprocessing, like removesuffix("_id"), etc.) and potentially check the column's values, or a sample of them, against the prefix's regular expression
  3. automate mapping?
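
A rough sketch of steps 1 and 2, just to make the idea concrete. Everything here is an assumption for illustration: the function names are hypothetical, the `PATTERNS` dict is a hand-written stand-in for a real Bioregistry prefix/pattern lookup (the regular expressions are not copied from the registry), and the table is a plain dict of column name to list of values rather than a real dataframe.

```python
import random
import re

# Hypothetical stand-in for a Bioregistry lookup: prefix -> identifier pattern.
# These patterns are illustrative assumptions, not the registry's own.
PATTERNS = {
    "chebi": re.compile(r"\d+"),
    "uniprot": re.compile(r"[A-Z0-9]{6,10}"),
    "hgnc": re.compile(r"\d+"),
}


def is_index_column(values):
    """Return True if values are integers counting up by 1 from 0 or 1."""
    values = list(values)
    if not values:
        return False
    if any(isinstance(v, bool) or not isinstance(v, int) for v in values):
        return False
    return values[0] in (0, 1) and all(
        b - a == 1 for a, b in zip(values, values[1:])
    )


def guess_prefix(column_name):
    """Lightly normalize a column name and look it up among known prefixes."""
    name = column_name.strip().lower()
    for suffix in ("_id", "_ids"):
        name = name.removesuffix(suffix)  # requires Python 3.9+
    return name if name in PATTERNS else None


def sampled_match_rate(values, pattern, k=100, seed=0):
    """Fraction of up to k randomly sampled values that fully match pattern."""
    values = [str(v) for v in values]
    if not values:
        return 0.0
    sample = random.Random(seed).sample(values, min(k, len(values)))
    return sum(bool(pattern.fullmatch(v)) for v in sample) / len(sample)


def annotate_columns(table, threshold=0.9):
    """Map each column name to "index", a guessed prefix, or None."""
    result = {}
    for name, values in table.items():
        if is_index_column(values):
            result[name] = "index"
            continue
        prefix = guess_prefix(name)
        if prefix and sampled_match_rate(values, PATTERNS[prefix]) >= threshold:
            result[name] = prefix
        else:
            result[name] = None
    return result


table = {
    "Unnamed: 0": [0, 1, 2],
    "chebi_id": ["15377", "16236", "17234"],
    "uniprot_id": ["P04637", "P38398", "Q9Y243"],
    "hgnc_id": ["TP53", "BRCA1", "AKT3"],  # name matches, values do not
}
print(annotate_columns(table))
```

With a pandas dataframe, something like `{col: df[col].tolist() for col in df.columns}` should produce the same shape of input, since `tolist()` returns plain Python values. The `hgnc_id` column shows why the sampled check matters: the column name suggests a prefix, but the values are gene symbols rather than numeric identifiers, so the guess is rejected.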