EleutherAI / project-menu

See the issue board for the current status of active and prospective projects!

[RFP] Set the record straight on environmental impacts of NLP #34

Closed StellaAthena closed 2 years ago

StellaAthena commented 2 years ago

Background

There's a lot of disinfo on the internet about the energy and environmental costs of large language models. It would be nice to be able to set the record straight.

What to plot?

It's not really a plot per se; it's more of a table. I need a table of all papers published in *CL venues in 2019, 2020, and 2021. Each datum should read (paper name, venue name, year, type of paper), where type of paper is "long paper", "short paper", or similar. Sometimes it will be none.

The data can be found here; I just need it scraped and stored in a CSV. Once this list is created, I will sample papers from it and gather people to record relevant data about the training process.
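
For anyone picking this up, here is a minimal sketch of the table format and the sampling step described above. It does not do the scraping itself. The column names, the `PaperRecord` class, and the helper functions are all hypothetical choices made for illustration, and a missing paper type is assumed to be stored as an empty cell.

```python
import csv
import io
import random
from dataclasses import asdict, dataclass, fields
from typing import List, Optional, TextIO


@dataclass
class PaperRecord:
    """One row of the table: (paper name, venue name, year, type of paper)."""
    paper_name: str
    venue_name: str
    year: int
    paper_type: Optional[str]  # e.g. "long paper", "short paper"; None if unknown


def write_records(records: List[PaperRecord], handle: TextIO) -> None:
    """Write records as CSV with a header row; None becomes an empty cell."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(PaperRecord)])
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["paper_type"] = row["paper_type"] or ""
        writer.writerow(row)


def read_records(handle: TextIO) -> List[PaperRecord]:
    """Read the CSV back; an empty paper_type cell becomes None again."""
    return [
        PaperRecord(
            paper_name=row["paper_name"],
            venue_name=row["venue_name"],
            year=int(row["year"]),
            paper_type=row["paper_type"] or None,
        )
        for row in csv.DictReader(handle)
    ]


def sample_records(records: List[PaperRecord], k: int, seed: int = 0) -> List[PaperRecord]:
    """Draw up to k records without replacement, reproducibly via a fixed seed."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


# Placeholder rows only; these are not real papers.
example = [
    PaperRecord("Placeholder Paper A", "ACL", 2019, "long paper"),
    PaperRecord("Placeholder Paper B", "EMNLP", 2020, None),
]
buffer = io.StringIO()
write_records(example, buffer)
buffer.seek(0)
round_tripped = read_records(buffer)
subset = sample_records(round_tripped, k=1)
```

To write a real file instead of an in-memory buffer, open it with `open(path, "w", newline="")`, which is what the `csv` module documentation recommends. Fixing the seed means the same sample can be handed to everyone recording training details.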

Related Papers/Frameworks

None

mgobrain commented 2 years ago

@StellaAthena how does this look for a first pass? I wasn't sure how to classify long vs. short, so I just returned page length.

papers_2019_to_2021.csv

StellaAthena commented 2 years ago

@mgobrain Oh shit I forgot about this. Thanks for reminding me!

I actually found time to scrape it myself, but unfortunately it seems that very few (<10%) papers actually contain info about how long they ran GPUs for >.>