EleutherAI / the-pile


NIH Abstract text for awarded grants #34

Closed thoppe closed 3 years ago

thoppe commented 3 years ago

The NIH (National Institutes of Health) provides a record of all the abstracts of publicly funded grants on ExPORTER. There are two main URLs:

https://exporter.nih.gov/ExPORTER_Catalog.aspx?sid=0&index=1
https://exporter.nih.gov/CRISP_Catalog.aspx?sid=0&index=1

The latter contains some overlapping legacy data. The text needs some minimal preprocessing but is otherwise in good shape. Example:

DESCRIPTION (provided by applicant): Promising results from prophylactic HPV vaccine trials support using these vaccines in cervical cancer prevention programs in the-future. Since vaccine coverage rarely if ever reaches 100%, population-level effectiveness of a prophylactic vaccine designed to prevent a sexually transmitted infection, such as an HPV vaccine, depends not only on the efficacy of the vaccine, but also on the incidence and duration of infection in both men and women. Although much has been learned about the epidemiology of human papillomavirus (HPV) infections in women, little is known about the incidence, determinants, and natural history of HPV infections in men. Research in men has been hampered, in part, by an inability to obtain adequate genital samples for HPV DNA testing. As discussed in this proposal, we developed a sensitive and acceptable method for sample collection and now propose to use this method in a prospective natural history study with the following aims. Among young men, (1) determine the incidence of infection with any type of HPV, oncogenic HPV, specific HPV types including HPV 16 and HPV6/11, and HPV 16 variants; (2) define risk 'factors for incident HPV infection, including lifetime and recent number of sex partners, circumcision status, condom use, frequency of vaginal intercourse, and courtship behavior; and (3) describe the natural history of HPV infection in men as measured by duration and levels of HPV DNA, HPV type-specific seroconversion, duration of antibodies, and development of genital warts. Our long-term goal is development of cost-effective approaches to the prevention of HPV-related cancers.

It's not the largest dataset (an estimated 2 GB compressed?), but it's easy to get and the text is high quality.
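For a sense of what "minimal preprocessing" could mean for an abstract like the one above, here is a rough sketch. It is an assumption for illustration, not the code that ended up in the repo: the function name `clean_abstract` and the choice of steps (dropping the leading "DESCRIPTION (provided by applicant):" label and collapsing whitespace) are hypothetical, inferred only from the example text.

```python
import re

# Hypothetical pattern: the boilerplate label seen at the start of the
# example abstract. Other abstracts may use different labels.
_PREFIX = re.compile(
    r"^\s*DESCRIPTION\s*\(provided by applicant\)\s*:\s*",
    re.IGNORECASE,
)


def clean_abstract(text: str) -> str:
    """Drop the leading boilerplate label and collapse runs of whitespace."""
    text = _PREFIX.sub("", text)
    # Replace any run of spaces, tabs, or newlines with a single space.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    raw = (
        "DESCRIPTION (provided by applicant):  Promising results from\n"
        "prophylactic HPV vaccine trials support using these vaccines."
    )
    print(clean_abstract(raw))
```

Stray characters inside the text (such as "the-future" or "risk 'factors" in the example) are left alone here, since fixing them reliably would need more than a regex.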

thoppe commented 3 years ago

This is done, writing up now. Please add me as the assignee for it :)

thoppe commented 3 years ago

Code is complete, and is currently up at https://github.com/thoppe/The-Pile-NIH-ExPORTER . Working on importing it into the main repo now.

StellaAthena commented 3 years ago

> This is done, writing up now. Please add me as the assignee for it :)

So you're able to close issues but not assign yourself to them? Can you change the label or where it is in the Kanban?

thoppe commented 3 years ago

It doesn't look like it. I can only close issues that I created, if that helps. I can't do assignments or move issues on the board (though I think I can move them with a merged PR?)

On Wed, Sep 9, 2020 at 8:51 PM Stella Biderman <notifications@github.com> wrote:

> > This is done, writing up now. Please add me as the assignee for it :)
>
> So you're able to close issues but not assign yourself to them? Can you change the label or where it is in the Kanban?


StellaAthena commented 3 years ago

That's good info to have. If you get annoyed with your current permission level, we can kick it up, but we've been putting off handing out higher permissions because we are still feeling out how organization permissions + teams work.

Assignments are designed to auto-move to "done" when you close the issue or merge the PR. That's probably what you've noticed happening.

thoppe commented 3 years ago

Bump it up only if you get bothered by my requests. I'm just trying to find the organizational structure so I don't mess up the flow that's already there.

StellaAthena commented 3 years ago

This looks like it's finished and merged, @thoppe. Should it be closed?

thoppe commented 3 years ago

It is! Closing.