Indices not matching original data.

@mnijhuis-dnb Thank you so much for this library. I'm very new to Python and this is one of my first projects. Your library was extremely clear, useful and easy to understand and follow. It's great work so I just wanted to mention that first.

I've spent the better part of a month building a database/matching process. I'm attempting to match a list of names from a database table called company_directory against names in an Excel file (which are imported via a custom method). Everything seems to be working correctly, and the matched names appear to be the right ones. However, the index returned for each match never lines up with the original data, and I can't find any consistency in how it differs (i.e. it's not off by the same amount in every instance).

I'm not sure if this is a known error or something I'm doing wrong on my end, but this is the absolute last piece of the puzzle for me, so if I can figure this out, it'll essentially complete my project. Any assistance would go a long way. I'd also be happy to pay hourly to set up a screenshare and walk through it if that's preferred, as I don't want to take advantage of anyone's time.

Thank you so much!!

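from name_matching.name_matcher import NameMatcher

# DatabaseUpdater and data_frame_from_xlsx_range are custom helpers defined elsewhere in the project


def match_names_to_db(fileloc, user, pw, host, db):
    # pull the master names (id, company_name) from the database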
    db_pull = DatabaseUpdater(user=user, pw=pw, host=host, db=db)
    database_names = db_pull.fetch_columns_from_table(table_name='company_directory', column_names=['id', 'company_name'])
    database_names.set_index('id', inplace=True)
    # get names to be matched from Excel file
    tracker_names = data_frame_from_xlsx_range(fileloc, 'tracker_names_to_match')
    tracker_names_unchanged = tracker_names.copy(deep=True)

    # initialize and run name matcher
    matcher = NameMatcher(top_n=50, lowercase=True, punctuations=True, remove_ascii=True, legal_suffixes=True,
                          common_words=True, number_of_matches=5)

    matcher.set_distance_metrics(['overlap',
                                  'weighted_jaccard',
                                  'ratcliff_obershelp',
                                  'fuzzy_wuzzy_token_sort',
                                  'editex',
                                  'discounted_levenshtein'])

    matcher.load_and_process_master_data('company_name', database_names, transform=True)
    matches = matcher.match_names(to_be_matched=tracker_names, column_matching='Tracker_Name')

    # write the matches returned by NameMatcher to Excel
    matches.to_excel('test_with_db_pull1.xlsx')
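
One way to narrow down the mismatch is to check whether the returned index is a row position in the master data or a label from the `id` index that `set_index('id')` creates. The sketch below uses only pandas and made-up data. The column names `match_name` and `match_index` and the helper `resolve` are assumptions for illustration, not confirmed output or API of the library; substitute whatever columns the real `matches` frame contains.

```python
import pandas as pd

# Hypothetical stand-in for the company_directory table, indexed by 'id'.
master = pd.DataFrame(
    {"id": [101, 205, 309],
     "company_name": ["Acme Corp", "Globex BV", "Initech Ltd"]}
).set_index("id")

# Hypothetical matcher output: a matched name plus the index reported for it.
matches = pd.DataFrame(
    {"match_name": ["Globex BV", "Initech Ltd"], "match_index": [1, 2]}
)


def resolve(master, idx):
    """Look up idx both as a row position and as an index label."""
    names = master["company_name"]
    by_position = names.iloc[idx] if 0 <= idx < len(names) else None
    by_label = names.loc[idx] if idx in names.index else None
    return by_position, by_label


# Whichever lookup reproduces match_name tells you how to read the index.
for row in matches.itertuples(index=False):
    by_position, by_label = resolve(master, row.match_index)
    print(row.match_name, "| by position:", by_position, "| by label:", by_label)
```

If the positional lookup reproduces the matched name while the label lookup does not, the reported index is a row number, and because database ids are rarely contiguous the difference between the two would vary from row to row. In that case `master.index[idx]` would recover the original `id`.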

DeNederlandscheBank / name_matching
