PugetSoundClinic-PIT / ProjectTracking

Apache License 2.0

Create datasets for EAGER survey (CISE Pilot) #105

Closed: nniiicc closed this issue 1 year ago

nniiicc commented 1 year ago

Directorate: CISE

For each area (e.g. OAC), create two datasets:

  1. OAC_NoSoftware - where we predict that software was not created
  2. OAC_Software - where we predict that software was created
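A minimal sketch of how one combined award table could be split into the two datasets, assuming each row is a dict read from a CSV (e.g. with csv.DictReader) and that a single prediction column decides the split. The label value "software-predicted" is a hypothetical placeholder; the real prediction values may differ. Every metadata column is passed through unchanged, so both outputs keep whatever fields the input had.

```python
import csv
from typing import Iterable

# Hypothetical label: the real values in the prediction column may differ.
SOFTWARE_LABEL = "software-predicted"


def split_by_prediction(
    rows: Iterable[dict[str, str]],
    column: str = "prediction_from_abstract",
    software_label: str = SOFTWARE_LABEL,
) -> tuple[list[dict[str, str]], list[dict[str, str]]]:
    """Partition award rows into (software, no_software) lists.

    Rows are kept whole, so all metadata columns survive. Any row whose
    prediction is not exactly `software_label` (including an empty value)
    lands in the no-software list.
    """
    software: list[dict[str, str]] = []
    no_software: list[dict[str, str]] = []
    for row in rows:
        if row.get(column) == software_label:
            software.append(row)
        else:
            no_software.append(row)
    return software, no_software


def write_rows(path: str, rows: list[dict[str, str]], fieldnames: list[str]) -> None:
    """Write rows to a CSV file using the given column order."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Passing the input reader's fieldnames to write_rows keeps the column order of the original table in both output files.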

The datasets should include all of the metadata we can retrieve (e.g. PI name, institution, award amount).

evamaxfield commented 1 year ago

CISE-software.csv: https://drive.google.com/file/d/1AICf9hlMZR_foerr5wdkuDvxLIk4pPvn/view?usp=share_link

CISE-no_software.csv: https://drive.google.com/file/d/1OsioDq6DFzr0AeI95nbjcQ-ZkwAuy2C5/view?usp=share_link

Each row has all of the metadata available from the NSF award search API: https://www.research.gov/common/webapi/awardapisearch-v1.htm

Each row also includes the prediction_from_abstract and prediction_from_outcomes columns. If prediction_from_outcomes is None or empty, that award does not have a project outcomes report yet.
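As a small illustration, the awards that still lack a project outcomes report could be listed by checking prediction_from_outcomes for a missing value. This sketch assumes the award identifier lives in an "id" column and that a missing prediction may show up as an empty string, the literal text "None", or a real None, depending on how the CSV was written; both points are assumptions, not confirmed by the files.

```python
from typing import Iterable, Optional

# Assumption: these are the forms a missing outcomes prediction can take.
MISSING_VALUES = {None, "", "None"}


def has_outcomes_prediction(row: dict[str, Optional[str]]) -> bool:
    """True if the award has a non-empty prediction_from_outcomes value."""
    value = row.get("prediction_from_outcomes")
    if isinstance(value, str):
        value = value.strip()  # treat whitespace-only cells as empty
    return value not in MISSING_VALUES


def awards_without_outcomes(rows: Iterable[dict[str, Optional[str]]]) -> list[str]:
    """Return ids of awards whose project outcomes report is not available yet."""
    return [row["id"] for row in rows if not has_outcomes_prediction(row)]
```

For these awards only prediction_from_abstract is usable.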

Note: I don't know how to extract your specific OAC, CCF, CNS, and IIS subsets. My guess is that the relevant field is either dunsNumber or parentDunsNumber, but I don't know how to map your target offices to those numbers. If you know the mapping, I can filter the data down further.
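Once the mapping is known, the filtering step itself is simple. This is a hypothetical sketch: OFFICE_IDS and filter_by_office are illustrative names, the identifier sets are deliberately left empty because the real values are unknown, and the default column parentDunsNumber is only the guess from the note above (it may turn out to be dunsNumber or another field entirely).

```python
from typing import Iterable, Mapping

# Hypothetical mapping from target office to identifier values.
# The real values are unknown, so each set must be filled in before use.
OFFICE_IDS: dict[str, set[str]] = {
    "OAC": set(),
    "CCF": set(),
    "CNS": set(),
    "IIS": set(),
}


def filter_by_office(
    rows: Iterable[Mapping[str, str]],
    office: str,
    column: str = "parentDunsNumber",
    office_ids: Mapping[str, set[str]] = OFFICE_IDS,
) -> list[Mapping[str, str]]:
    """Keep only rows whose `column` value is one of the ids for `office`."""
    wanted = office_ids[office]
    # Missing or None values become "", which never matches a real id.
    return [row for row in rows if (row.get(column) or "").strip() in wanted]
```

Because the column name is a parameter, switching between dunsNumber and parentDunsNumber needs no code change.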

The larger dataset (https://github.com/si2-urssi/eager#the-soft-search-inferred-dataset) also contains all of these awards.