daffidwilde / matching

A package for solving matching games
https://daffidwilde.github.io/matching/
MIT License
149 stars 42 forks source link

HR matching with no hospital preferences #167

Closed agordon closed 11 months ago

agordon commented 11 months ago

Hello,

I have a use-case for HR-matching (students and elective courses, instead of residents and hospitals, respectively), except that the "hospitals" (=courses in my case) do not have any preference for one student over another.

Using https://daffidwilde.github.io/matching/docs/tutorials/hospital_resident.html as an basis for my question:

If provide an empty hospital_preferences dictionary (where each hospital key is associated with an empty list = no preferred residents) - the code fails (with ValueError: 199 is not in list , i.e. the code unconditionally assumed resident ID 199 is in the preference list of the hospital).

So instead, Would it be correct to say, that if I artificially provide identical preference list for all hospitals (i.e. all residents, in exactly the same order, for all hospitals) - than effectively it negates any hospital preference for specific students?

Code-wise (compared to the HR example in the docs), I replaced the following line:

hospital_preferences = read_yaml_to_dict("hospitals.yml")

with:

all_residents = list(resident_preferences.keys())
hospital_preferences = dict([ (k, all_residents) for k in hospital_capacities.keys() ])

So every hospital has the exact same (artificial) preference list containing all residents.

It seems to me, from cursory testing, that this is valid - but I would welcome any feedback.

Thanks! gordon

daffidwilde commented 11 months ago

Hi @agordon 👋🏼

Your intuition is sort of there, but the choice of hospital preferences will make a difference. The optimality of a solution is dependent on the entire game, including the other side's preferences.

If you run the code below, you'll see that after only a few iterations more than one stable matching is found:

import itertools
import urllib
import yaml

from matching.games import HospitalResident

def read_yaml_to_dict(where, filename):
    """Read in the YAML data from the URL."""

    url = "/".join((where, filename))
    with urllib.request.urlopen(url) as response:
        dictionary = yaml.safe_load(response.read())

    return dictionary

def matching_from_permutation(
    resident_preferences, hospital_capacities, permutation
):
    """Run a game using `permutation` as all hospital preferences."""

    hospital_preferences = {
        hospital: permutation for hospital in hospital_capacities
    }
    game = HospitalResident.create_from_dictionaries(
        resident_preferences,
        hospital_preferences,
        hospital_capacities,
        clean=True,
    )

    return game.solve(optimal="resident")

base_url = "https://zenodo.org/record/3688091/files"
resident_preferences = read_yaml_to_dict(base_url, "residents.yml")
hospital_capacities = read_yaml_to_dict(base_url, "capacities.yml")

matching_hashes = set()
residents = list(resident_preferences.keys())
permutations = enumerate(itertools.permutations(residents))
while len(set_of_matchings) <= 1:

    i, permutation = next(permutations)
    matching = matching_from_permutation(
        resident_preferences, hospital_capacities, permutation
    )

    set_of_matchings.add(
        tuple(
            {
                h.name: tuple(r.name for r in matches)
                for h, matches in matching.items()
            }.items()
        )
    )
    print(i)

Now, that might not matter much for you since you still have the best possible stable matching for your students given the arbitrary preferences for courses. However, it is not right to say that their choices have been negated by making them all the same.

The approach you are suggesting is often used in instances of the student-project allocation problem, where all students are ranked the same among supervisors. I've implemented this for a bioscience department before where they had a ranking for the student cohort based on their average grade and engagement with the course. Perhaps you could derive a similar ranking so that the choice is not arbitrary (but not without consequence.)


The other option is to not use matching for this problem. What you are describing is a one-sided matching between unequal sets of agents. Your problem could be reframed as an instance of the house allocation problem, for which the top trading cycle algorithm produces stable matchings. We do not currently implement this algorithm - matchingR is a great R package that does though.

You would almost certainly have to expand student preferences according to the course capacity and use duplicates for courses to handle capacities >1, i.e.:

>>> prefs = {"a": [0, 1, 2], "b": [0, 1], "c": [1, 2]}
>>> capacities = {0: 1, 1: 2, 2: 1}
>>> preferences = {
...     s: list(itertools.chain(*[[f"{c}_{i}" for i in range(capacities[c])] for c in cs]))
...     for s, cs in prefs.items()
... }
>>> preferences
{'a': ['0_0', '1_0', '1_1', '2_0'],
 'b': ['0_0', '1_0', '1_1'],
 'c': ['1_0', '1_1', '2_0']}

There is also the capacitated house allocation problem, but I'm not sure if a deferred acceptance algorithm would produce stable matchings in that formulation. A quick scan online shows a lot of heuristic approaches...


I hope this helps you. I'm going to close this issue now. Good luck!