Create list of candidate L2s

rggelles commented 1 year ago

Using full L2 list, pull out all L2s that seem like they might be plausibly of interest to anyone at CSET.

We may still extend this candidate list after discussion with LOR leads if there are things missing from the full list, but it's a place to start.

rggelles commented 1 year ago

I think a big question here, as I go through this full L2 list, is what we want the L2s to be, functionally.

The L2s from MAG are very structurally different from the L0s and L1s. The L0s and L1s are very much "fields" -- they define a topic area, like artificial intelligence, algorithms, aeronautics, civil engineering, operating systems, agronomy, or fishery. These are things there might be full college-level courses in, or journals/conferences about.

The L2s are by definition going to be narrower; that's the whole point of them. But right now, they're not just narrower. They're arguably a categorical shift, and also at least 2-3 steps narrower, not 1 step. For example, just considering AI (since we've spent a lot of time thinking about AI taxonomies), some outside taxonomies that exist for AI include https://aaai.org/conference/aaai/aaai-21/aaai21keywords/ and https://www.stateoftheart.ai/models.

In each of these, the first set of subfields for AI is some subset of fields like "computer vision," "natural language processing," "robotics," "reinforcement learning" or similar. These are all also fields that could, theoretically, be more focused college classes or conferences. Now, in this case, "computer vision" and "natural language processing" are L1s, so they've been made part of the top-level. So it probably makes sense to have our L2s represent more than just the first level down no matter what we decide.

But even then, when we consider the second level down, we're still looking at something concretely different (to me) than what the L2s represent. Here's a sample from the AAAI list: "Affective Computing," "Bayesian Learning," "Brain-Sensing and Analysis," "Knowledge Engineering," "Planning under Uncertainty," "Multi-Robot Systems," "Question Answering," "Stylistic Analysis & Text Mining."

These are still fields, in a way that most of the L2s are not. For example, take "Attribute weight": is it relevant to AI? Yes. Does it define a field? No.

Now, do some of these fields in these taxonomies appear in the L2s? Yes, absolutely! But not all of them. So I guess my question is: should the official list of L2s be our starting point? Do they need to be for this all to work? What's the right way to start here?

@jamesdunham very interested in your thoughts here; sorry for the long-windedness.

jamesdunham commented 1 year ago

Yes, I'm 100% with you. L2+ have always struck me as more similar to keywords than fields. Often a peculiar set of keywords, moreover. They don't collectively partition their parent field, or necessarily relate to their parent field more clearly than to other higher-level fields. This is part of why we aren't using them right now: not just the questionable accuracy of paper-field scores, but also doubts about the validity of the L2+ fields for analytic purposes.

So I'm in favor of not limiting ourselves to existing L2s. We could also pull in concepts from SOTA.ai, PWC, or AAAI. And Wikipedia, since this is our starting point for concept text. From the CV wiki page, for example:

Sub-domains of computer vision include scene reconstruction, object detection, event detection, activity recognition, video tracking, object recognition, 3D pose estimation, learning, indexing, motion estimation, visual servoing, 3D scene modeling, and image restoration.

Seems reasonable to make sure these are L2s under CV?

rggelles commented 1 year ago

Okay, that makes sense to me. I'm going to experiment some with crosswalking between these established hierarchies and the L2s to play with some candidate ideas for AI in particular.
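As a rough sketch of what that crosswalk could look like (not an existing pipeline script): normalize the names from an outside hierarchy and from the L2 list, then record whether each outside name has an exact match, a near match, or nothing. The function names, the `cutoff` value, and the toy L2 list below are all assumptions for illustration; only the AAAI names come from the sample quoted earlier.

```python
import re
from difflib import get_close_matches


def normalize(name: str) -> str:
    """Lowercase, spell out '&', and collapse punctuation and whitespace."""
    name = name.lower().replace("&", " and ")
    return re.sub(r"[^a-z0-9]+", " ", name).strip()


def crosswalk(external: list[str], l2s: list[str], cutoff: float = 0.85) -> dict[str, dict]:
    """Map each external field name to an exact or near L2 match, if any.

    `cutoff` is a hypothetical similarity threshold for near matches.
    """
    by_norm = {normalize(l2): l2 for l2 in l2s}
    result = {}
    for name in external:
        key = normalize(name)
        if key in by_norm:
            result[name] = {"match": by_norm[key], "kind": "exact"}
            continue
        near = get_close_matches(key, list(by_norm), n=1, cutoff=cutoff)
        if near:
            result[name] = {"match": by_norm[near[0]], "kind": "near"}
        else:
            result[name] = {"match": None, "kind": "missing"}
    return result


# Illustrative inputs only: the L2 list here is made up, not the real MAG list.
aaai = ["Affective Computing", "Question Answering", "Multi-Robot Systems"]
l2s = ["Affective computing", "Question answering", "Attribute weight"]
table = crosswalk(aaai, l2s)
for name, row in table.items():
    print(name, "->", row["kind"], row["match"])
```

Names that come back as "missing" would be the ones to consider adding as new L2s; "near" matches would still need a manual look.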

May also try the "run everything through the scraper" idea to try to filter down some of these lists as discussed in #17.

Hopefully this path seems reasonable!

rggelles commented 1 year ago

Okay, update here:

There is now a list of keywords pulled from AAAI titles in the "AAAI to new categories" tab of this spreadsheet.

In this spreadsheet, the second column has my attempt to distill these keywords.

Keywords that are in Wikipedia are bold. Keywords that are near keywords in Wikipedia, or are linked/redirectable headings of subsections, are italic, sometimes with alternate keywords in the columns next to them. Some alternate/additional keywords are also added, not because they're the same, but because they're potentially also relevant or interesting.

There are also color labels.

Green is used if a keyword is potentially or definitely not purely AI-relevant. Some of these have Wikipedia pages and some don't. Whether we include these at all, and which we decide to include, is an open question that I would be interested in feedback on. However, my inclination is that if we include them, we may not want them to be children of our various AI L1s but of other L1s instead, so that's a factor here.

Blue is used if a keyword is too broad, vague, or cross-topic ("x and y") to really work as a single category. In most cases, I think dropping these likely makes sense, although it's worth a second look-through to make sure none are important enough that we should be thinking of alternate names.

Orange is used where topics have names that seem potentially relevant but have no Wikipedia pages. These are ones we might consider exploring with the "use AAAI papers as a topic descriptor" approach if we decide to try that idea out, although I'm not sure all of them are worth doing that with.

Purple is used where topics might be interesting but also have alternate pages/names that are close enough that we could potentially drop them. My suggestion in many of these cases would be to drop them in favor of the alternate names found, although double-checks would be valued here.
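If it helps with filtering later, the color labels above could be encoded roughly like this. This is only a sketch: the `Label` enum, the action strings, and the example keywords are hypothetical, and treating an unlabeled keyword as "keep as a candidate" is an assumption, not something the spreadsheet states.

```python
from enum import Enum
from typing import Optional


class Label(Enum):
    GREEN = "not purely AI-relevant"
    BLUE = "too broad, vague, or cross-topic"
    ORANGE = "relevant name but no Wikipedia page"
    PURPLE = "close alternate page or name exists"


# Default suggestions paraphrased from the label descriptions above.
SUGGESTED_ACTION = {
    Label.GREEN: "decide whether to include; if so, consider a non-AI L1 parent",
    Label.BLUE: "drop, after a second look for important topics needing a new name",
    Label.ORANGE: "consider describing the topic via AAAI papers",
    Label.PURPLE: "drop in favor of the alternate name, after a double-check",
}

# Assumption: keywords with no color label stay on the candidate list.
UNLABELED_ACTION = "keep as a candidate"


def triage(keywords: dict[str, Optional[Label]]) -> dict[str, str]:
    """Return the suggested default action for each keyword."""
    return {kw: SUGGESTED_ACTION.get(label, UNLABELED_ACTION) for kw, label in keywords.items()}


# Placeholder keywords with made-up labels, for illustration only.
example = {"keyword one": Label.BLUE, "keyword two": None, "keyword three": Label.ORANGE}
actions = triage(example)
for kw, action in actions.items():
    print(f"{kw}: {action}")
```

These are defaults only; the open questions about green and orange keywords would still need a human decision.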

I'm open to either having someone check through these and narrow down now, before crosswalking to the other hierarchies, or waiting and doing the crosswalk first and the narrowing down second.

rggelles commented 1 year ago

Candidate lists now exist for both artificial intelligence and semiconductors. Given this, I'm going to close this issue and make a new issue for reviewing these candidate lists and getting outside reviews.

georgetown-cset / fields-of-study-pipeline

Create list of candidate L2s #16