look for an index column whose values ascend numerically (e.g., consecutive integers), starting at either 0 or 1
look for column names that can be mapped to Bioregistry prefixes (with only minimal preprocessing, like removesuffix("_id"), etc.) and potentially confirm the match by checking the column's values against the prefix's regular expression (or checking only a sample of the values)
given a dataframe - identify what each column is
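
a minimal sketch of the checks above, under some loud assumptions: columns are passed as a plain dict of name -> list of values (with pandas this could be built from something like df[col].tolist(), so that values are plain Python ints and strings), and EXAMPLE_PATTERNS is a hypothetical stand-in for a real Bioregistry lookup. The function names and the regexes below are made up for illustration and are not taken from the Bioregistry.

```python
import random
import re
from typing import Optional, Sequence

# Hypothetical stand-in for a Bioregistry lookup: prefix -> local ID pattern.
# These regexes are illustrative assumptions, not copied from the Bioregistry.
EXAMPLE_PATTERNS = {
    "chebi": re.compile(r"^\d+$"),
    "hgnc": re.compile(r"^\d+$"),
    "uniprot": re.compile(r"^[A-Z0-9]{6,10}$"),
}


def is_index_column(values: Sequence) -> bool:
    """Return True if values are integers counting up by 1 from 0 or 1."""
    if not values:
        return False
    # Reject non-integers; bool is excluded because it subclasses int.
    if any(isinstance(v, bool) or not isinstance(v, int) for v in values):
        return False
    start = values[0]
    if start not in (0, 1):
        return False
    return all(v == start + i for i, v in enumerate(values))


def guess_prefix(column_name: str) -> Optional[str]:
    """Map a column name to a known prefix after light preprocessing."""
    name = column_name.strip().lower().removesuffix("_id")  # Python 3.9+
    return name if name in EXAMPLE_PATTERNS else None


def matches_pattern(
    prefix: str, values: Sequence, sample_size: int = 100, seed: int = 0
) -> bool:
    """Check all values, or a random sample of them, against the pattern."""
    pattern = EXAMPLE_PATTERNS[prefix]
    values = list(values)
    if not values:
        return False
    if len(values) > sample_size:
        values = random.Random(seed).sample(values, sample_size)
    return all(pattern.match(str(v)) for v in values)


def classify_columns(table: dict) -> dict:
    """Label each column as 'index', a prefix, or None (unknown)."""
    labels = {}
    for name, values in table.items():
        if is_index_column(values):
            labels[name] = "index"
            continue
        prefix = guess_prefix(name)
        if prefix is not None and matches_pattern(prefix, values):
            labels[name] = prefix
        else:
            labels[name] = None
    return labels
```

the name match alone is cheap but weak, which is why the sketch only accepts a prefix when the (sampled) values also fit its pattern. The index check runs first, so a column of consecutive integers is labeled as an index even if its name looks like a prefix; whether that ordering is right is an open design choice.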