douglasrizzo / catsim

Computerized Adaptive Testing Simulator
GNU Lesser General Public License v3.0
124 stars 35 forks source link

Updated index which causes out of bounds error #15

Closed Randy-Ram closed 5 years ago

Randy-Ram commented 5 years ago

Hey Douglas,

I went back and checked on this today and noticed this piece of the code was throwing an error in Issue #14

[index for index in valid_indexes if items[index, 4] < self._r_max]

The code is referencing index 4 in a 4 column item_bank ndarray. When I changed the above to the below, it works as intended.

[index for index in valid_indexes if items[index, 3] < self._r_max]

Note, I just looked at the MaxInfoSelector class alone, I'm assuming this '4' index is used elsewhere in the file as well.

coveralls commented 5 years ago

Coverage Status

Coverage decreased (-0.3%) to 84.815% when pulling 03f20700f0af479cd0fd9d067f733ae3b91b9d32 on HyperManTT:maxinfo_selector_fix into 2b5f2b95cee68171d22e6056bf4add14ec6db435 on douglasrizzo:master.

douglasrizzo commented 5 years ago

Hey @HyperManTT. If I remember correctly, the items matrix stores items discrimination, difficulty, guessing and upper asymptote (parameters a, b, c and d of the 4PL IRT model) in columns 0, 1, 2 and 3, respectively.

Column 4 is used to store items exposure rates. It should be created by catsim at some point at the start of a simulation, with all values equal to 0. In this case it seems this column is not being created.

The line you mentioned in the PR and issue #14 is trying to filter out items with high exposure rates. For example, in order to avoid too many examinees from viewing the same item, one can limit the percentage of tests an item appears by setting the r_max parameter.

In any case... If I remember correctly, column with index 3 should contains the items' upper asymptote parameter and column 4 (which is absent) would be the right one to mention in that line. We just have to find out where column 4 is...

douglasrizzo commented 5 years ago

I could be wrong, however, and index 3 is actually what we need, but I'll have to get back to you. I am a little busy right now, unfortunately.

Randy-Ram commented 5 years ago

No problem @douglasrizzo , I checked around and even printed out the results from generate_item_bank and only saw up to column 3. Check it out if you get time, I'll play around with it in the meantime.

douglasrizzo commented 5 years ago

I have started implementing a possible fix. It will be up in the next few days.

Randy-Ram commented 5 years ago

Oh great, thanks Douglas. You'd know better than me but I had identified the MaxInfoSelector and the ClusterSelector as needing the 4th column and inserted it, the same way the simulator does it.

douglasrizzo commented 5 years ago

Yes, if you pass a fifth column with values between 0 and 1, representing the items exposure rates, the two selectors should behave as expected.

douglasrizzo commented 5 years ago

I've made the two selectors issue a warning, stating that the necessary column is missing. They treat the absent column as if items were never used before (exposure rates = 0). If the column is present, they'll take the exposure rates in consideration.