department-of-veterans-affairs / gibct-data-service

Allows stakeholders to build a valid GIBCT database from various data sources and apis.
Other
9 stars 5 forks source link

Rolled up complaints for UoP not matching up #77

Closed saintsoup52 closed 8 years ago

saintsoup52 commented 8 years ago

The total rolled up complaints for University of Phoenix is different between my spreadsheet and the digital service.

My database has 480 rolled up complaint, the digital service has 477 complaints.

I thought it may have to do with approved / non-approved schools but that doesn't seem to be the issue.

I put the complaints side by side and I can't figure out why.

mphprogrammer commented 8 years ago

Looking at the complaints file for UofP. The OPEIDs are only 7 characters long and are missing a leading "0" - when I make an ope6 from a opeid I right pad the id with 0s, I'll have to left pad them to add the missing 0 at the front. I am not sure yet if this is the cause, but it may explain it - and its a problem anyway.

mphprogrammer commented 8 years ago

Also - 10 of the closed and non-invalid complaints for UofP have no opeid.

mphprogrammer commented 8 years ago

Is that right?

mphprogrammer commented 8 years ago

I was using that column to maintain load-order independence so that the VA folks using the DS need not worry about the order in which they load CSVs. Using the crosswalk for this meaning I can no longer do that for complaints. I will have to move complaint logic around for this.

saintsoup52 commented 8 years ago

Sorry... this is a critical piece.

We have no way to update the OPE IDs in the complaint database so the longer we go the worse that list will get.

saintsoup52 commented 8 years ago

I just tested the updated data service... we are still getting different answers, but now I know why.

If the complaint does not have a facility code associated with it then it should be ignored.

Right now you are first looking for a facility code and if that is not present you are using the OPE code in the complaint file. While that is an ingenious solution, there is no way for us to keep those OPE IDs current. This is needs to be fixed in the complaint system, not the comparison tool.

I am going to point out to EDU that complaints should all be associated with a facility code because this is a larger problem overall (169 closed, valid complaints with no facility code).

Almost there... good work!!

mphprogrammer commented 8 years ago

Thanks Patrick - I'm nulling out the ope ids as I read them in so when a complaint is created in the complaints table, there will be no ope id when first saved.

Right before a new complaint is saved in the complaints table, I update the complaint categories (financial, quality, etc) for that complaint iff the status is closed and not invalid (method :ok_to_sum?). If the regex matches one of your category keywords, the corresponding "match" attribute is set to 1 (otherwise it remains as 0).

After all the complaints are loaded in, I sum the complaints by facility code (but only for those complaints with facility codes) and update of the facility code sums for the associated complaints. Lastly, when the data_csv is built (after CSV files have been uploaded) I then get and save the proper OPE id from the crosswalk using the complaint facility_code. Since null facility_codes have no crosswalk, no OPE id is saved for those complaints and hence should not be included in the ope roll ups. I hope this works ....

mphprogrammer commented 8 years ago

I am now getting 476 (I used the UoP online) with 237 complains for that facility_code. Also, the distribution of complaints in different categories are different. Could you send me the complaints file you use (if it isn't Comparison Tool-Complaint Data 20160330.csv)

mphprogrammer commented 8 years ago

I just did a hand count of the UoP online for the complaints file I have in excel - there are 237 complaints for UoP Online which matches what my local DS says, but not what the GIBCT in production says (239). I don't think I have the same complaints file that you do. I am going to do a roll up manually to see what I get.

mphprogrammer commented 8 years ago

By hand I've counted 474 for UoP rolled up by ope6. I have 476 and you have 480?

mphprogrammer commented 8 years ago

Including no responses in "reason" its now 475 rolled up for UoP ope6

mphprogrammer commented 8 years ago

Got it - I didn't check off UoP Philly in the filter. The DS now matches the complaints counts by fac code and ope6 I did by hand using the complaint csv that I have. Are our complaints csvs different?

saintsoup52 commented 8 years ago

The latest complaint file is uploaded to the data service and for UoP the rolled up complaint total should be 493.

Remember the filters when you check the numbers against the raw complaint file:

When I filter all schools with the name "University of Phoenix" and apply the filters above I get 493 just like I have in the latest db output.

As an fyi... the name for UoP Online changed in WEAMS so you can't just search for the name, you have to search for the facility code.

saintsoup52 commented 8 years ago

We are now off by 3 with the UoP roll up... this is just because you are using the OPE ID in the complaint file when there is no facility code, which we shouldn't be doing.

mphprogrammer commented 8 years ago

Patrick - This is what I'm doing:

  1. When I load the Complaint CSV I'm NULL-ing out all opes just before I save the complaint.
  2. When the complaint is saved, I check to see that the status is closed and the reason is not invalid. If that is the case I examine the issues for keywords like 'financial', 'quality', ... , setting an associated attribute to 1 if the keyword is present and 0 if not. If the complaint is not closed or invalid, I leave those attributes to their default value 0.
  3. After all complaints have been ingested and saved I call a method to sum the complaints by facility code. This is a snippet of that - it's a query that sums all complaints for non-null facility codes and puts the sums in complaints_financial_by_fac_code, complaints_quality_by_fac_code, and so on. It produces complaint sums by facility code for every institution having a facility code.

    results = Complaint.select(:facility_code).select(select_strings.join(", ")) .where.not(facility_code: nil).group(:facility_code)

  4. When the data_csv is built, each complaint is assigned an ope from the crosswalk based on its facility code. If there is no facility code, then there is no ope (they were all removed in step 1). Following that I rollup the complaint counts for non-null ope6s grouped by ope6.

I'm nulling out all opes, then setting the complaints in categories (financial, quality, ...) to 0s and 1s for those complaints that are not invalid and closed, then summing by facility code for non-null facility codes, assigning the ope from the crosswalk, and computing the ope sums for non-null opes.

Could you look this over and see if and where the logic flaws are?

mphprogrammer commented 8 years ago

Now this is getting interesting - our sums differ by campus (facility code) as well: the UoP-online has 237 via the DS and 239 in production. I can't explain this - null facility_codes would not enter into this at all.

mphprogrammer commented 8 years ago

@saintsoup52 @rickleegit Sorry to drone on like this, but I think the issue isn't null fac codes, but we are getting different sums for the different categories for each campus:

screen shot 2016-05-27 at 11 50 25 am

saintsoup52 commented 8 years ago

I updated the complaint file in the DS.. someone reverted it back to an April file. maybe me... I hope not.

mphprogrammer commented 8 years ago

So are we good? I've done these computations by hand, and the nil facility_codes don't seem to be entering into this (c.f., step 3 above). I'm just so confused at this point LOL.

rickleegit commented 8 years ago

This certainly looks like a versioning issue. The numbers are too close to be programatic calculation errors, practically speaking.

On Fri, May 27, 2016 at 12:06 PM, Marc Harbatkin notifications@github.com wrote:

So are we good? I've done these computations by hand, and the nil facility_codes don't seem to be entering into this (c.f., step 3 above). I'm just so confused at this point LOL.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/department-of-veterans-affairs/gibct-data-service/issues/77#issuecomment-222186881, or mute the thread https://github.com/notifications/unsubscribe/AKn3Dgt__xP7gm0gn0j4WofcoUPR7NQ8ks5qFxZpgaJpZM4IkhLW .

rickleegit commented 8 years ago

This was a versioning issue. PEBKAC.