NeighborhoodInfoDC / RegHsg

Regional Housing Framework
1 stars 1 forks source link

Create tables of subsidized unit counts #39

Closed ptatian closed 5 years ago

ptatian commented 5 years ago

Create tables to help determine preservation targets. Tables should show for COG region

ptatian commented 5 years ago

Hi @Woliver01. I've created an issue for the next part of the regional housing framework analysis. I think it's best if we take this in steps.

Step 1. Create property and unit counts for individual programs.

The RegHsg.Natlpres_activeandinc_prop data set has a number of variables with counts of assisted units. We need to create some new variables that have the counts for only subsidies that are active.

A number of programs have multiple possible counts of units. We need to combine these into single set of counts. So, create a temporary data set that has these new variables in it.

s8_all_assistedunits = sum( s8_1_assistedunits, s8_2_assistedunits );
s202_all_assistedunits = sum( s202_1_assistedunits, s202_2_assistedunits );
...

Repeat as necessary for all programs with multiple vars like the above (which I think is all of the programs).

In addition, we need variables that count projects that have active subsidy units. So, for every program in the file, create a dummy variable that =1 if the number of all assisted units is > 0, and = 0 otherwise.

Please create a program that does the above and commit to the new branch. Then make a pull request so that I can take a look. (We still have more to do but we can add to the pull request as we go along.)

Thanks! Let me know if you have any questions.

ptatian commented 5 years ago

Hi @Woliver01. How are we doing on this task? Please let me know if you have any questions.

Woliver01 commented 5 years ago

Hi @ptatian I reserved some time today to wrap up what I've done and submit the program by the end of the day.

ptatian commented 5 years ago

Thanks, @Woliver01. I will have the next step for you when this first one is completed.

ptatian commented 5 years ago

@Woliver01 Here's the next step, for when you've made the changes I noted in the pull request.

Step 2. Create categories for subsidy combinations in a property

Add a new variable, ProgCat, to the temporary data set you created in step 1 that categorizes the projects based on subsidy combinations. ProgCat will have values from 1 to 10 based on the subsidies present.

ProgCat=

  1. Public housing
  2. Section 8 only
  3. Section 8 and HUD mortgage (FHA or S236) only
  4. Section 8 and other subsidy combinations
  5. LIHTC only
  6. LIHTC and other subsidies
  7. HOME only
  8. RHS only
  9. S202/811 only
  10. All other subsidy combinations

There are a number of ways you can code this. I would suggest using the 0/1 dummy variables you created earlier along with if/then/else statements.

if ph_all_activeunits then ProgCat = 1;

else if s8_all_activeunits and not( fha_all_activeunits or home_all_activeunits or 
lihtc_all_activeunits or rhs515_all_activeunits or rhs538_all_activeunits or 
s202_all_activeunits or s236_all_activeunits ) 
then ProgCat = 2;

else if s8_all_activeunits and ( fha_all_activeunits or s236_all_activeunits ) and 
not( home_all_activeunits or lihtc_all_activeunits or rhs515_all_activeunits or 
rhs538_all_activeunits or s202_all_activeunits ) 
then ProgCat = 3;

else if s8_all_activeunits then ProgCat = 4;

else if lihtc_all_activeunits and not( fha_all_activeunits or home_all_activeunits or 
rhs515_all_activeunits or rhs538_all_activeunits or s202_all_activeunits or 
s236_all_activeunits ) 
then ProgCat = 5;

etc.

A few things about the above. First, notice that you don't have to write ph_all_activeunits=1 or lihtc_all_activeunits=0. SAS treats the value 0 or missing as being false and any non-zero number as true in logical expressions.

Second, note that for the "only" categories, we have to systematically rule out the presence of all the other subsidies. That's what the not( ... or ... or ... ) expression does.

Third, including else in the else if statements is important because SAS will only evaluate those conditions if all of the previous ones were not true. That's why you don't need to keep checking for public housing (ph_all_activeunits) because you've already gotten all of the public housing units in the first statement.

Third, I omitted the state subsidies from the above, because it appears none of the projects have those subsidies recorded.

Doing these kinds of statements can be tricky, because you've got to make sure you have all of the possibilities covered and that you didn't put something in the wrong place. So, be sure to look at the data set created by the code to convince yourself that projects are being put in the right categories. For instance, look at few projects in each ProgCat category and make sure they seem like they are ending up in the right place.

In addition, please add to the program a Proc Format to create a format that labels the above codes for ProgCat and assign that format to the ProgCat variable.

Thanks, Will. If the above doesn't make sense, please ask me and I can try to explain it better!

Woliver01 commented 5 years ago

@ptatian thanks, this seems pretty straightforward but I'll definitely reach out if anything is unclear.

Woliver01 commented 5 years ago

Log path: L:\Libraries\RegHsg\Prog\Subsidized_unit_counts.log

ptatian commented 5 years ago

@Woliver01 We aren't done with this issue yet, so no need to include the LOG path. We will do that at the end when we are ready to close.

To recap, you should now be working on Step 2, which I posted in the comments here on Friday.

Woliver01 commented 5 years ago

@ptatian Step 2 is nearly done, I'll commit the program when I'm done with meetings.

Woliver01 commented 5 years ago

@ptatian For the sake of time, I can start writing the program for Step 3 if you have an idea of how you want the categorization of the expiration dates to look. I can write it in a separate program and then drop it in once we resolve Step 2.

ptatian commented 5 years ago

Hi @Woliver01 I can look at this now.

ptatian commented 5 years ago

@Woliver01 Good that you checked things and identified that there was a problem. The cause is that the variable names you are using here don't match the names of the 0/1 variables you created in the first data step. One tip off is that you have several uninitialized variable messages in the LOG.

NOTE: Variable home_all_activeunits is uninitialized.
NOTE: Variable fha_all_activeunits is uninitialized.
NOTE: Variable lihtc_all_activeunits is uninitialized.
NOTE: Variable rhs515_all_activeunits is uninitialized.
NOTE: Variable rhs538_all_activeunits is uninitialized.
NOTE: Variable s202_all_activeunits is uninitialized.
NOTE: Variable s236_all_activeunits is uninitialized.
NOTE: Variable s8_all_activeunits is uninitialized.
NOTE: Variable rhs538 is uninitialized.

Uninitialized variable messages don't appear as errors or warnings but they usually indicate a problem. In this case, you used home_all_activeunits, fha_all_activeunits, etc. but the variables you created are actually home_activeunits, fha_activeunits, etc. So, simply rename one set of variables or the other (it doesn't really matter which) and that will fix the problem.

ptatian commented 5 years ago

@Woliver01 Once you have made the corrections above, you can go on to step 3.

Step 3. Create project combined assisted unit counts and expiration dates

This is based on code that I have used previously for the DC Preservation Catalog. For projects that have multiple subsidies, we will create a min/max range. The minimum assumes the largest possible overlap among subsidized units, while the maximum assumes as little overlap as possible. In other words, the bottom of the range is whatever subsidy has the most assisted units, while the top of the range is determined by adding up all of the assisted units until you reach the total units in the project.

min_assistedunits = max( s8_all_assistedunits, s202_all_assistedunits, ... );
max_assistedunits = min( sum( s8_all_assistedunits, s202_all_assistedunits, ... ), TotalUnits );

(Fill in the remaining variable names where I have ... above.)

To display this in a table, we are going to take the midpoint of each of those ranges, and then calculate the "margin of error" for that estimate, which is just the difference between the maximum and the midpoint.

mid_assistedunits = min( round( mean( min_assistedunits, max_assistedunits ), 1 ), max_assistedunits );

if mid_assistedunits ~= max_assistedunits then moe_assistedunits = max_assistedunits - mid_assistedunits;

label
  min_assistedunits = 'Minimum possible assisted units in project'
  max_assistedunits = 'Maximum possible assisted units in project'
  mid_assistedunits = 'Midpoint of project assisted unit estimate in project'
  moe_assistedunits = 'Margin of error for assisted unit estimate in project';

I use the if statement above because I want the margin of error to be missing if the minimum and maximum are the same.

In addition, we want to create two variables that have the earliest and latest subsidy expiration dates, respectively, across all subsidies for each project. I will let you see if you can figure out how to do that. But, if it takes you more than 20 minutes to figure it out, then please ask for help.

ptatian commented 5 years ago

@Woliver01 Sorry for the delay in getting back to you. The last step is to create summary tables on numbers of units that need to be preserved. Since it is easier to do this with Proc Tabulate, and I know you are not yet familiar with it, I've written some code to do that. First, we need a format for grouping the years.

*** Create summary tables ***;

proc format;
  value yearrng
    2015-2020 = '2015 - 2020'
    2021-2025 = '2021 - 2025'
    2026-2030 = '2026 - 2030'
    2031-2035 = '2031 - 2035';
run;

The first table will look at the number of projects and assisted units by assistance categories.

options missing=' ';

ods csvall  body="&_dcdata_default_path\RegHsg\Prog\Subsidized_unit_counts_unique.csv";

title3 "Project and assisted unit unique counts";

proc tabulate data=Work.SubsidyExpirationDates  format=comma10. noseps missing;
  class ProgCat / preloadfmt order=data;
  var mid_assistedunits moe_assistedunits;
  table
    /** Rows **/
    all='Total' ProgCat=' ',
    /** Columns **/
    n='Projects'
    sum='Assisted Units' * ( mid_assistedunits='Est.'  moe_assistedunits='+/-' )
    ;
  format ProgCat ProgCat.;
run;

ods csvall close;

A few things on the above code, so that you can start getting familiar with Proc Tabulate. options missing=' ' suppresses the . for missing values in the output and replaces it with a blank space. The ods csvall statement directs the output to a CSV file so that it can be imported into Excel. The title3 statement puts a title on the 3rd line of the output.

In Proc Tabulate itself, the class statement lists variables that are used to define the table's rows and columns and the var statement has the analysis variables. The table statement is where we lay out the table. The row and column specifications are separated by a comma (,), within which we list the variables that make up each dimension. The all keyword creates a total. n and sum are statistic keywords. The operator is used to cross different class variables (like creating crosstabs) or to link analysis variables with a statistic. In the above, `sum='Assisted Units' ( mid_assistedunits='Est.' moe_assistedunits='+/-' )means to take the sum of *mid_assistedunits* and *moe_assistedunits* and display them in separate columns.='Assisted Units',='Est.'`, etc. are labels for those table entries.

Next, we want to look at units by expiration date within the 2015-2035 window.

ods csvall  body="&_dcdata_default_path\RegHsg\Prog\Subsidized_unit_counts_expire.csv";

title3 "Projects and assisted units with subsidies  expiring 2015 - 2035";
footnote1 "LIHTC expiration includes 15-year compliance and 30-year subsidy end dates.";

proc tabulate data=Work.SubsidyExpirationDates format=comma10. noseps missing;
  where not missing(earliest_expirationdate15);
  class ProgCat / preloadfmt order=data;
  class earliest_expirationdate15;
  var mid_assistedunits moe_assistedunits;
  table
    /** Rows **/
    all='Total' ProgCat=' ',
    /** Columns **/
    n='Projects'    
    sum='Assisted Units By Subsidy Expiration Year' * (  all='Total' earliest_expirationdate15=' ' ) 
      * (  mid_assistedunits='Est.' moe_assistedunits='+/-' )
    ;
  format ProgCat ProgCat. earliest_expirationdate15 yearrng.;
run;

ods csvall close;

Lastly, we want to produce separate tables for each subsidy individually. For section 8,

ods csvall body="&_dcdata_default_path\RegHsg\Prog\Subsidized_unit_counts_s8.csv";

title3 "Section 8 projects and assisted units with subsidies expiring 2015 - 2035";
footnote1;

proc tabulate data=Work.SubsidyExpirationDates format=comma10. noseps missing;
  where not missing(s8_endyr);
  class s8_endyr;
  var s8_all_assistedunits;
  table 
    /** Rows **/
    all='Total' s8_endyr=' ',
    /** Columns **/
    n='Projects'    
    sum='Assisted Units' * s8_all_assistedunits=' ' 
    ;
  format s8_endyr yearrng.;
run;

ods csvall close;

Please repeat the above to create tables for the other subsidies that have expiration dates.

The syntax for Proc Tabulate can take some time to master but it's worth the effort. I found it best to start with simple tables and then add to them, rather than trying to create a complicated table off the bat. You can also look at the table specifications above, compare to the output, and try to see how they relate.

Thanks, Will.

Woliver01 commented 5 years ago

@ptatian I've updated and about to commit the program. Do I need to create a table for the LIHTC15yr variable?

ptatian commented 5 years ago

Hi @Woliver01. Yes, please do separate tables for both the LIHTC 15yr and endyr variables. Please label them appropriately in the title. Thanks.

Woliver01 commented 5 years ago

Final SAS log: L:\Libraries\RegHsg\Prog\Subsidized_unit_counts.log