NeighborhoodInfoDC / RegHsg

Regional Housing Framework
1 stars 1 forks source link

Updated Subsidized unit count Step 2 #52

Closed Woliver01 closed 5 years ago

Woliver01 commented 5 years ago

@ptatian the viewtable doesn't look quite right, each of the properties have the same exact values. Not sure if this is something that gets fixed in the next step or not but @maggiesu0725 also took a look and wasn't sure why the table was showing up the way that it is. Please review.

Woliver01 commented 5 years ago

@ptatian I fixed the variable names, and added in a COG region format. The variables still each have the same values (they all say Active Subsidies) and they each are coded as "1". I'm not sure if I need to add another step in order to give each property its own values or if they're supposed to look like that. Please review.

ptatian commented 5 years ago

@Woliver01 Thanks. I will take a look.

ptatian commented 5 years ago

@Woliver01 It's possible to see what is going on if you look at a few observations.


                               PH_1_           PH_2_
                            Assisted        Assisted       PH_all_
 Obs      TotalUnits           Units           Units    assistedunits     PH_activeunits

   1              18               .               .          18         Active subsidies
   2             195             195               .         195         Active subsidies
   3              83               .               .          83         Active subsidies
   4              17               .               .          17         Active subsidies
   5              23               .               .          23         Active subsidies

I had to temporarily comment out the drop statement in the data step. Then I used a Proc Print to generate the output above.

proc print data=allassistedunits (obs=10);
  var totalunits ph_1_assistedunits ph_2_assistedunits ph_all_assistedunits ph_activeunits;
run;

As you can see, every project is getting assigned a positive number of PH units. This is an issue of the missing values in the PH_1_AssistedUnits and PH_2_AssistedUnits variables. When using min(), max(), sum(), etc. functions, SAS ignores any missing values. As a result, for obs #1, PH_all_assistedunits=min(sum(PH_1_AssistedUnits, PH_2_AssistedUnits),TotalUnits); ends up assigning TotalUnits as the value for PH_all_assistedunits because the PH vars are both missing.

The simple fix is to insert a 0 in the sum function, which forces it to evaluate to 0 if the other values are missing.

PH_all_assistedunits=min(sum(PH_1_AssistedUnits, PH_2_AssistedUnits, 0),TotalUnits);

Try that and see if it solves the problem. Thanks!

Woliver01 commented 5 years ago

@ptatian that fixed it, thanks! I've started on Step 3, I'll commit it and tag you once I'm done.

Woliver01 commented 5 years ago

@ptatian I tried out a couple methods but this one worked I think. Please review.

ptatian commented 5 years ago

Thanks, @Woliver01. You got it right! Also glad you added the formats to the date variables. However, you don't need to add informats in this case. Those are only needed when we are importing data from a text file into the SAS data set, which we aren't doing here.

The previous problem we had with ProgCat was because we didn't look closely enough at the data. I've added some Proc Means and Proc Freq's that will allow us to check things as we go along. (I also put some blank lines between the statements in the data step because the code was hard to read.) I've committed those changes and you should sync them with your branch before continuing.

I'd like to make a couple of adjustments to the ProgCat var. Looking at the expiration dates revealed another thing to fix because the public housing ended up with expiration dates, which it normally should not. That is because there are some other subsidies mixed in with some of the public housing. So, we need to add a new category to ProgCat (=2) for "Public housing and other subsidies." The first category should be relabeled as "Public housing only" and the other categories should renumbered to move them each down one slot.

In addition, I'd like add a new category for "HUD insured mortgage only." These are projects that have FHA or S236 only and nothing else. This category can come at the end right before the "All other subsidy combinations."

Would you please make those changes? Thanks, Will. Let me know if you have any questions.

Woliver01 commented 5 years ago

@ptatian Thanks, Peter. Makes sense, I'll work on this now.

Woliver01 commented 5 years ago

@ptatian I committed the changes. Please review.

Woliver01 commented 5 years ago

@ptatian looking through the log again, should I move the proc freq step to after the data step?

ptatian commented 5 years ago

@Woliver01 No, that was my fault. I put the wrong data set name. I fixed it.

ptatian commented 5 years ago

@Woliver01 We are getting very close but we need to adjust how we are doing the expiration dates. For this analysis, we are only concerned about expiration dates between 2015 and 2035. Also, for LIHTC, its more complicated because we have to look at 15 yr and 30 yr dates, which are both key preservation risk points. Finally, we need a combined expiration for each subsidy type.

First, somewhere near the top of the program (such as after the %DCData_lib() line), please add the following lines.

** Year range for preservation targets **;
%let Startyr = 2015;
%let Endyr = 2035;

These are macro variables that we will use to define the date ranges. I'm doing it this way so that, if we need to change the years later, it will be easy to do since it is just in one place.

Then please modify the Work.SubsidyExpirationDates data step as follows.

You can keep all of the current code, but we want to add the LIHTC 15-year dates. The default end date is 30 years, so we can get to 15 by moving the date back using the intnx() function.

  if LIHTC_1_enddate > 0 then LIHTC_1_15date = intnx( 'year', LIHTC_1_enddate, -15, 'same' );
  if LIHTC_2_enddate > 0 then LIHTC_2_15date = intnx( 'year', LIHTC_2_enddate, -15, 'same' );

Next, we need to add calculations for combined ending years for every subsidy.

  if &Startyr <= year( S8_1_enddate ) <= &Endyr then s8_endyr = year( S8_1_enddate );
  if &Startyr <= year( S8_2_enddate ) <= &Endyr then s8_endyr = min( year( S8_2_enddate ), s8_endyr );

Repeat for all other subsidies. What we end up with is the earliest expiration year that is between 2015 and 2035. Years inside and outside that range will be ignored.

When it comes to LIHTC, however, you need to use both the enddate and the 15yr variables.

  if &Startyr <= year( LIHTC_1_enddate ) <= &Endyr then LIHTC_endyr = year( LIHTC_1_enddate );
  if &Startyr <= year( LIHTC_2_enddate ) <= &Endyr then LIHTC_endyr = min( year( LIHTC_2_enddate ), LIHTC_endyr );

  if &Startyr <= year( LIHTC_1_15date ) <= &Endyr then LIHTC_15yr = year( LIHTC_1_15date );
  if &Startyr <= year( LIHTC_2_15date ) <= &Endyr then LIHTC_15yr = min( year( LIHTC_2_15date ), LIHTC_15yryr );

Finally, calculate an earliest expiration year and a latest expiration year, like you did with the dates, using the _endyr and _15yr (for LIHTC) variables you created above.

Let me know if you have any questions.

Woliver01 commented 5 years ago

@ptatian I'm doing it now, for the last part do you mean just using the new variables or using the new variables and the original s8_1_enddate variables? Also should I use this new calculation instead of the previous one without the 15yr?

ptatian commented 5 years ago

@Woliver01

Finally, calculate an earliest expiration year and a latest expiration year, like you did with the dates, using the _endyr and _15yr (for LIHTC) variables you created above.

I meant that these are two new variables (min and max) based just on the variables you just created that have the years, not the dates. I want both the subsidy end years and the LIHTC "15-year" end years included in the calculation of the new variables.

Please let me know if that's still not clear.

Woliver01 commented 5 years ago

@ptatian Ok, just committed the program. Please review.

ptatian commented 5 years ago

Thanks, @Woliver01. This looks great. We are very close. I have to leave now but I can pick this back up with you in the morning.

Woliver01 commented 5 years ago

@ptatian Just added tables. Please review.

ptatian commented 5 years ago

@Woliver01 This all looks good! Unfortunately, there's some kind of conflict that needs to resolved before we can merge but I cannot see what it is. I'll have to take a look at it in the morning. Thanks!

ptatian commented 5 years ago

Thanks, @Woliver01. I was able to resolve the issue with the merge conflict. Sent a separate email to Leah to get her feedback on what we have. Depending on what she says, we may need to do more work or we may be done. We'll have to see.

Woliver01 commented 5 years ago

@ptatian thanks for letting me know. I'll keep an eye out for updates.

ptatian commented 5 years ago

Hi @Woliver01. Sorry for leaving you hanging on this. I checked with @lhendey, who asked if we can do an additional breakdown of public housing by the age of the building. Is that information in the national database?

Woliver01 commented 5 years ago

Hi @ptatian the closest variables NHPD has to this are the earliest and latest construction date for an individual building at a property. Is this what you mean?

ptatian commented 5 years ago

@Woliver01 Yes, that would be good information to use. I would use the latest construction date. Can you please look at adding another table to what we have that would focus on public housing properties and units by the year built?

Woliver01 commented 5 years ago

@ptatian sure, what is the timeline for this? I'm going to be pretty booked the next couple of days but I could get it to you by the end of the week.

ptatian commented 5 years ago

End of the week would be great. Let me know if you have any questions once you get into it. Thanks, @Woliver01.

Woliver01 commented 5 years ago

Hi @ptatian working on this now. Would the code look similar to the code for the SubsidyExpirationDates dataset?

ptatian commented 5 years ago

@Woliver01 Yes, that is a good place to start.

Woliver01 commented 5 years ago

@ptatian So I've made the dataset, I believe, with a new variable for PH Construction Dates. When you say table, do you mean to use proc tabulate to create a csv? And if so, what variables should be there and how should they be configured?

ptatian commented 5 years ago

Hi @Woliver01. Yes, we should use Proc Tabulate. I know that you aren't terribly familiar with Proc Tabulate you can copy and modify code for a table we've already done.

ods csvall  body="&_dcdata_default_path\RegHsg\Prog\Subsidized_unit_counts_expire.csv";

title3 "Projects and assisted units with subsidies  expiring 2015 - 2035";
footnote1 "LIHTC expiration includes 15-year compliance and 30-year subsidy end dates.";

proc tabulate data=Work.SubsidyExpirationDates format=comma10. noseps missing;
  where not missing(earliest_expirationdate15);
  class ProgCat / preloadfmt order=data;
  class earliest_expirationdate15;
  var mid_assistedunits moe_assistedunits;
  table
    /** Rows **/
    all='Total' ProgCat=' ',
    /** Columns **/
    n='Projects'    
    sum='Assisted Units By Subsidy Expiration Year' * (  all='Total' earliest_expirationdate15=' ' ) 
      * (  mid_assistedunits='Est.' moe_assistedunits='+/-' )
    ;
  format ProgCat ProgCat. earliest_expirationdate15 yearrng.;
run;

ods csvall close;

You should update the title and you can delete the footnote, since it doesn't apply to the public housing data.

In the Proc Tabulate, the two class variables define the table rows and columns. What is different for this table from the one we want to create? First, instead of defining columns using earliest_expirationdate15, we want to use the variable you are creating for when public housing was built. So, replace earliest_expirationdate15 with the new variable throughout. And edit the label after sum= so that it describes the new variable.

Second, you probably want to create a different format to apply to the public housing year built variable than yearrng.. You should create a new format and put it on the format statement.

Third, we only want to include public housing projects in this table, so you should change the where statement so that it selects only public housing observations.

Finally, you want to give the output file a different name in the ods csvall statement.

I hope that makes sense. Give it a try and see if it works. Let me know if you have other questions.

Woliver01 commented 5 years ago

@ptatian I'm realizing now that you didn't specify how you wanted the new variable to be formatted so I just did before 2000 and after, for the construction dates. I can fix these in the code so let me know.

ptatian commented 5 years ago

@Woliver01 How are the dates distributed by year? Let's come up with a breakdown of 4-5 year ranges that are evenly spread out.

Woliver01 commented 5 years ago

@ptatian most of them fall between 1955 and 1980. I could do 1930-1950, 1951-1970, 1971-1990, 1991-present, so basically 20yr splits.

ptatian commented 5 years ago

@Woliver01 Sounds good! 👌

Woliver01 commented 5 years ago

@ptatian I just pushed the 20 year splits. Please review.

ptatian commented 5 years ago

Thank you, @Woliver01. I'll look at this now.

Woliver01 commented 5 years ago

@ptatian Great, I just made those changes.

ptatian commented 5 years ago

Sorry, @Woliver01. I meant delete the total only from the row spec. We still want to have the total column.

sum='Public housing latest construction dates' * (  all='Total' timecount=' ' )

Looking at this again now, is there a reason you only created the construction date when ProgCat=1? ProcCat=2 also has public housing. Can we include those projects as well?


I'm realizing now I should have asked for a breakdown by jurisdiction, so can we please add a table for that? It can be the 2nd table in the series and would be similar to the "Projects and assisted units with subsidies expiring 2015 - 2035" table but substitute the jurisdiction for the expiration date in the table.

You'll have to create a jurisdiction variable for this. We have a macro, %ucounty_jurisdiction, that will create that variable from the county, but the macro is coded to use a variable called ucounty so you'll have to create that variable first. You can insert these lines after the lines where you create the COGregion var.

** Add jurisdiction var **;
ucounty = CountyCode;
%ucounty_jurisdiction;

Then, you will need to add a format specification for this new variable to the formats you are creating.

    value Jurisdiction
        1= "DC"
        2= "Charles County"
        3= "Frederick County "
        4="Montgomery County"
        5="Prince Georges "
        6="Arlington"
        7="Fairfax, Fairfax city and Falls Church"
        8="Loudoun"
        9="Prince William, Manassas and Manassas Park"
        10="Alexandria"
        ;

Finally, apply the jurisdiction. format to the jurisdiction variable in the Proc Tabulate for the new table.

Note that I pulled in the macro from another branch so make sure you pull all changes from GitHub before you run the program.

Thanks, Will. Please let me know if you have any questions.

Woliver01 commented 5 years ago

@ptatian Thanks, Peter. I'll address these changes first thing tomorrow morning.

lhendey commented 5 years ago

@Woliver01 - any luck with these? @ptatian

Woliver01 commented 5 years ago

@ptatian I'm having some trouble getting the macro to run. I keep getting this warning: WARNING: Apparent invocation of macro UCOUNTY_JURISDICTION not resolved.

I'm meant to place the code here correct?: data Work.Allassistedunits; set RegHsg.Natlpres_activeandinc_prop; if CountyCode in ("11001", "24017", "24021", "24031", "24033", "51013", "51059", "51107", "51153", "51510", "51600", "51610", "51683", "51685") then COGregion =1; else COGregion=0; format COGregion COG. ; Add jurisdiction var ; ucounty = CountyCode; %ucounty_jurisdiction;

Woliver01 commented 5 years ago

@lhendey I've added the public housing construction dates and their splits. Now we're adding the jurisdiction breakdown, which should be the last step. @ptatian

ptatian commented 5 years ago

@Woliver01 Did you pull the latest commits in GitHub? The %ucounty_jurisdiction macro won't exist until you do. Check that the D:\DCData\Libraries\RegHsg\Macros\ucounty_jurisdiction.sas file is present on your desktop.

If you are working in interactive SAS, you'll need to exit SAS completely and restart after you pull the macro definition.

Woliver01 commented 5 years ago

@ptatian I pulled the latest commit yesterday before I left, the macro program was/is there. I can try restarting the program again.

ptatian commented 5 years ago

@Woliver01 Yes, please try that and let me know if it doesn't work.

Woliver01 commented 5 years ago

@ptatian It's still not working, I tried making it a separate data step and that still did not work.

ptatian commented 5 years ago

@Woliver01 Can you commit and push what you've done?

ptatian commented 5 years ago

@Woliver01 Oh, wait. You have %DCData_lib( RegHsg, local=n ) at the start of the program. You should delete the , local=n part. That's the problem. Actually, you should almost never have , local=n specified in %DCData_lib(). Just put the library name and nothing else. (What that option does is limit SAS only to look on L:. But, the macro only exists on D: currently so that's why it can't find it.)

Once you do that, you will again need to exit the SAS session and restart. But, it should work now.

I'm going to be shutting down my computer, so if you do get a CSV by the end of the day, please email it to Leah and me. Thanks!

Woliver01 commented 5 years ago

@ptatian Thanks Peter, that worked. I'll email you and @lhendey the csv.

Please still review the code.

ptatian commented 5 years ago

Thank, @Woliver01. Changes look good. If you can just change the label here in the new jurisdiction table, that would be perfect.

sum='Assisted Units By Subsidy Expiration Year'
Woliver01 commented 5 years ago

@ptatian Thanks, I fixed it.

ptatian commented 5 years ago

Thanks, @Woliver01.