CredentialEngine / Schema-Development

Development of the vocabularies for the CTI models

Add properties for student debt alongside wages to https://credreg.net/qdata/terms/DataProfile and/or https://credreg.net/qdata/terms/DataSetProfile and / or https://credreg.net/ctdl/terms/AggregateDataProfile #879

Open jeannekitchens opened 1 year ago

jeannekitchens commented 1 year ago

This is an Equity Council use case to publish the debt of graduates analyzed alongside wages.

Here are some globally used student debt/aggregate outcome statistics:

  1. Average debt at graduation - the average amount of student loan debt held by students upon graduating from an institution or program. This is often calculated separately for undergraduate and graduate programs.
  2. Loan default rates - the percentage of borrowers who are unable to repay their student loans within a certain timeframe and enter default status.
  3. Loan repayment rates - the percentage of borrowers who have made progress in repaying their student loans, usually measured by the percentage of borrowers who have made payments on their loans for a certain number of months after entering repayment.
  4. Total student loan debt - the total amount of outstanding student loan debt held by borrowers in a certain country or region.

Sources Include:

  1. In the United States, the National Center for Education Statistics (NCES) and the Federal Student Aid office within the U.S. Department of Education collect and publish data on student debt and other higher education outcomes. https://nces.ed.gov/
  2. States/Provinces. See Texas definitions below.
  3. The Organization for Economic Cooperation and Development (OECD) collects data on student debt and other higher education outcomes for its member countries and publishes reports and analyses of the data. https://www.oecd.org/education/
  4. The United Nations Educational, Scientific and Cultural Organization (UNESCO) also collects and publishes data on higher education outcomes, including student debt, for countries around the world. https://data.uis.unesco.org/
  5. The World Bank also collects and publishes data on higher education outcomes and student debt for countries around the world. https://datatopics.worldbank.org/education/

United States College Scorecard Glossary of Outcome Data https://collegescorecard.ed.gov/data/glossary/

State of Texas Definitions for Student Loan/Debt Statistics:

Field | Short description | Definition and source
-- | -- | --
Loan_Amount | Average loan amount at graduation. | Average loan amount is calculated by averaging each student’s loan debt, accumulated at all Texas institutions up to the time of receiving an applicable degree, based on the student’s highest degree earned. Only students with debt are included. Each student's loan debt includes all loans reported in the THECB financial aid database (FADS) report by any institution for that student in the last 15 years. Parent loans are excluded. Data are available for programs with at least 5 graduates with loans. Source: THECB CBM001, CBM009, Financial Aid Database System (FADS)
Loan_Percent | Percent of graduates with loans. | The percentage of graduates who accumulated student loan debt at the time of award as reported on the THECB FADS reports by any institution in the last 15 years. Parent loans are excluded. Source: THECB FADS Report
Loan_to_First_Wage_Ratio | Loan as a percentage of first year wages. | The median of individual student loan debt as a percentage of first year wages. Individual must have student loan debt at time of award and wages in first year following award. Each student's loan debt includes all student loans reported in the THECB financial aid database (FADS) report by any institution for that student in the last 15 years. Parent loans are excluded. First year wages are based on wage data reported to the Texas Workforce Commission (TWC). Source: THECB CBM009, FADS Report, TWC UI wage records

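
As a rough illustration of how the three Texas measures above differ from one another, here is a minimal Python sketch. The record layout (`loan_debt`, `first_year_wage`) and the sample figures are invented for illustration, and the THECB rules on the 15-year window, parent loans, and the 5-graduate minimum are not modeled.

```python
from statistics import mean, median

# Hypothetical graduate records; field names and amounts are made up.
# loan_debt is assumed to exclude parent loans already.
# first_year_wage is None when no wage was reported.
graduates = [
    {"loan_debt": 20000, "first_year_wage": 40000},
    {"loan_debt": 10000, "first_year_wage": 50000},
    {"loan_debt": 0, "first_year_wage": 45000},
    {"loan_debt": 30000, "first_year_wage": None},
]


def loan_amount(records):
    """Average loan debt, counting only graduates who have debt."""
    debts = [r["loan_debt"] for r in records if r["loan_debt"] > 0]
    return mean(debts)


def loan_percent(records):
    """Percentage of all graduates who have loan debt."""
    with_debt = sum(1 for r in records if r["loan_debt"] > 0)
    return 100 * with_debt / len(records)


def loan_to_first_wage_ratio(records):
    """Median of each graduate's debt as a percentage of first-year wages.

    Only graduates with both debt and a reported wage are included.
    """
    ratios = [
        100 * r["loan_debt"] / r["first_year_wage"]
        for r in records
        if r["loan_debt"] > 0 and r["first_year_wage"]
    ]
    return median(ratios)
```

Note that each measure uses a different population (graduates with debt, all graduates, graduates with both debt and wages), which is part of why the descriptions matter when comparing figures across publishers.
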
siuc-nate commented 1 year ago

I don't think we need specific properties for every niche of data that someone can publish unless they are common to enough distinct publishers to be worthwhile (otherwise we will end up merging everyone's bespoke schemas together and not have any interoperable data).

Instead, for cases like this, we should ensure there is a property that allows some degree of flexibility/customization in what it describes. It's hard to tell whether something exactly like that already exists in QData (perhaps qdata:subjectsInSet or qdata:subjectValue?), but if not, then something explicitly intended for "other" data like the above would be the way to go. Then there is at least some degree of interoperability (everyone using the same custom property) without blowing up the size of the schema. We can then monitor that for potential candidates for specific properties.

Something like this perhaps:

URI: qdata:dataPoint
Label: Data Point
Description: A piece of information for which there is no more specific property.
Domain: qdata:DataProfile
Range: schema:QuantitativeValue

Usage in the above example:

{
  "@type": "qdata:DataProfile",
  "qdata:dataPoint": [
    {
      "@type": "schema:QuantitativeValue",
      "schema:description": { "en": "Average loan amount at graduation." },
      "schema:value": 12345,
      "schema:unitText": [ "res:ce-for-USD" ]
    },
    {
      "@type": "schema:QuantitativeValue",
      "schema:description": { "en": "Percent of graduates with loans." },
      "schema:percentage": 0.5
    },
    {
      "@type": "schema:QuantitativeValue",
      "schema:description": { "en": "Loan as a percentage of first year wages." },
      "schema:percentage": 0.25
    }
  ]
}

Part of solving this may involve solving #875 since that will have implications for whether or not schema:currency needs to be part of schema:QuantitativeValue.

philbarker commented 1 year ago

I think debt is important, not a niche case. Especially globally.

It is kind of a consequence of fees vs. funding options, and the funding options depend on student circumstances in varied ways. For example, an English student of Computer Science at HW is going to graduate with a much higher debt than a Scots student, because the Scottish government pays the tuition fees directly to the university.

Is that the sort of thing that subjectType deals with?

BTW, there is no schema:percentage property.

siuc-nate commented 1 year ago

My point wasn't that debt in general is niche; it is that these specific ways of measuring it are ones we haven't seen from other publishers yet (as far as I am aware), and we don't want to go adding every institution's specific ways of measuring things into QData if nobody else uses those same measures. We should incorporate the ones that are used by multiple institutions in order to facilitate sharing. A generic property can handle things specific to a single institution (i.e., "niche").

If we see those specific measures from other institutions, then they may be candidates for new properties, but I think we should look around first to see what other institutions use for similar notions. We don't want to end up with 17 very subtly different debt-related properties when 2 or 3 would do (and facilitate better interoperability) in the long run.

philbarker commented 1 year ago

@siuc-nate OK, got you. Let's start with debt as generic and then think about how to describe whose debt is being measured, how and when. I think there are common ways of defining the how and when (though maybe that's my European view of how these things are done). I would look to something like OECD for those.

siuc-nate commented 1 year ago

Per our 7/10/2023 meeting:

Example data: https://sandbox.credentialengine.org/finder/credential/50787

Example descriptions (currently done via qdata:holdersInSet):

philbarker commented 11 months ago

Text descriptions will do for a first pass; they'll give an indication to prospective students. Go for it.

It won't be sufficient for serious analytics; for that you need identifiers for common methodologies for measuring all the variables -- but that would probably need a task group to sort out, if we need to and feel able to.

BTW @siuc-nate the example data above shows "schema:percentage": 0.5; that should be "qdata:percentage": 50
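
To make that correction concrete, here is a small Python sketch that builds the data point with the value on a 0-100 scale. The helper name `percent_data_point` is hypothetical, and the 0-100 scale for `qdata:percentage` is taken from the comment above rather than checked against the QData term definitions.

```python
import json


def percent_data_point(description, fraction):
    """Build a QuantitativeValue-style data point from a 0-1 fraction.

    Assumes qdata:percentage is expressed on a 0-100 scale.
    """
    return {
        "@type": "schema:QuantitativeValue",
        "schema:description": {"en": description},
        "qdata:percentage": round(fraction * 100, 2),
    }


point = percent_data_point("Percent of graduates with loans.", 0.5)
print(json.dumps(point, indent=2))
```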

siuc-nate commented 11 months ago

@jeannekitchens @philbarker Can you both confirm that the plan, then, is to move forward with the proposal from this post?

URI: qdata:dataPoint
Label: Data Point
Description: A piece of information for which there is no more specific property.
Domain: qdata:DataProfile
Range: schema:QuantitativeValue

philbarker commented 11 months ago

Confirmed.

jeannekitchens commented 11 months ago

There needs to be a property or concept for student debt or student loan debt. With the data set profile, users (e.g., state governments) can include a description based on the way they calculate student loan debt, but publishers and consumers need to know that the data they're publishing or getting is about student loan debt. If the final proposal is only to add Data Point, I don't believe this addresses the use case.

siuc-nate commented 11 months ago

Do we have enough data from enough states to know how they measure it? See the example above: measured as a loan in absolute dollars, measured as debt (i.e., minus any grants/payments/etc.), measured as a percentage of wages, and whether there is a time component involved (e.g., first year's wages, debt over 10 years, etc.). There is a lot of potential variation, and if we want data that speaks a common language that all states can share/compare among each other, we'll need to come up with what the common denominator should be.

There is also a similar question for how this is handled internationally.

jeannekitchens commented 11 months ago

There can be multiple types of calculations for student debt; the main point is to know the data is about student debt. They use the description to describe how they do the calculation with the data set profile, and they also do this with the data set. We don't need to document the calculations, rather the type of data.

siuc-nate commented 11 months ago

Would this be something specific to student loans, or debt in general? Also, are there any issues with using the term "student" as opposed to "learner" or one of the other more generic terms like that? I guess what I'm asking is, how broadly/narrowly should we scope such a property?

jeannekitchens commented 11 months ago

It's only relevant to students accumulating debt to cover costs for learning, and the debt is via loans.

philbarker commented 11 months ago

This is where putting all data in an AggregateDataProfile fails. The clearest solution is to have a StudentDebtProfile class so that rdf:type can do its job of telling data consumers what type of thing the data is about no matter what property points to it (e.g. it could be referenced from outside the registry using a non-CTDL property). Next best would be a dataSetType property pointing to concepts such as StudentDebtProfile.
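
A rough sketch of what either option would mean for a data consumer, in Python. `StudentDebtProfile` and `dataSetType` are the names floated in this comment, not existing terms; the `ex:` prefix and the sample records are made up for illustration.

```python
def as_list(value):
    """Normalize a JSON-LD value that may be a single string or a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def is_student_debt(node):
    """Return True if a JSON-LD node can be identified as student debt data."""
    # Option 1: a dedicated class, found via the node's type.
    if "ex:StudentDebtProfile" in as_list(node.get("@type")):
        return True
    # Option 2: a generic profile tagged with a concept via a type property.
    return "ex:StudentDebtProfile" in as_list(node.get("ex:dataSetType"))


# Hypothetical records for illustration only.
typed = {"@type": "ex:StudentDebtProfile"}
tagged = {
    "@type": "qdata:DataSetProfile",
    "ex:dataSetType": ["ex:StudentDebtProfile"],
}
untyped = {
    "@type": "qdata:DataProfile",
    "schema:description": {"en": "Average loan amount at graduation."},
}
```

With only a free-text description, as in `untyped`, a consumer has no reliable way to tell that the data is about student debt.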

siuc-nate commented 11 months ago

For context, Phil, we're looking at adding a property to DataProfile with a range of either QuantitativeValue or MonetaryAmount, so that would work the same way as all the other properties on DataProfile in QData. AggregateDataProfile isn't in play here and the contextual information is provided by QData classes higher up the hierarchy.

philbarker commented 11 months ago

@siuc-nate sorry, sometimes it's hard to keep track of these long-running issues, especially times like now when several are in play at the same time. I got misled by the issue title :-}