emeryberger / CSrankings

A web app for ranking computer science departments according to their research output in selective venues, and for finding active faculty across a wide range of areas.
http://csrankings.org

Update inclusion criteria #480

Closed emeryberger closed 5 years ago

emeryberger commented 7 years ago

Pulling this out as an issue from a PR by @deniszorin (https://github.com/emeryberger/CSrankings/pull/470). For reference, the current inclusion criteria are partially spelled out here: https://github.com/emeryberger/CSrankings/blob/gh-pages/docs/CONTRIBUTING.md.


from @deniszorin:

CSrankings aims to "count the right beans" and have an entirely transparent system. Two main data sources are used. One is dblp; the other (faculty lists) originally came from Brown http://cs.brown.edu/people/alexpap/faculty_dataset.html, which was created based on information on the Web, collected through Mechanical Turk, and then further corrected by whoever volunteered to fix it. The Brown criterion for inclusion was different from the current one. The dblp data is entirely based on open information. The faculty list was initially the same, but the current definition requires access to non-open data for verification. In a non-scientific random sample of people with complicated arrangements, who may not meet the criterion, I see most included.

Let me make a general proposal: use the Wikipedia verifiability approach for accepting/rejecting any changes (https://en.wikipedia.org/wiki/Wikipedia:Verifiability). I.e., any change request comes with a reference link meeting a standard (I propose one below). It is imperfect, but it works reasonably well on a much larger scale and keeps the whole system transparent (i.e., anyone with internet access can check everything). This implies that the criterion itself should be formulated so that it is verifiable from open information on the Web.

Consider the current criterion: full time, tenure-track, can advise CS PhD students. With my administrative hat on, "full-time" is an ambiguous term. The elaboration in this thread, "working 75% of the year", remains unclear, and under any interpretation it is not verifiable from open sources. For example:
-- literal interpretation: 75% of time spent doing actual work. Impossible to measure (and certainly not verifiable).
-- salary-based: the fraction of time for which a faculty member is paid. Precise, but not verifiable from open sources and unstable: sabbaticals, leave, and consulting may result in a salary reduction for a fixed period.
-- contract-based: what the appointment letter says. This is also precise, but not verifiable. However, it has a high correlation with the title in the listings on department web pages.

Proposed criterion: A faculty member is included if he or she holds a regular faculty position at the university (for the US: Prof./Associate Prof./Assistant Prof.; if other countries are included, this needs expansion), or an equivalent title (e.g., Principal Research Scientist at MIT), and can advise CS PhD students. (Whether to restrict this to tenured faculty or include research faculty may be another debate.)

Acceptable verification: a link to a faculty list with titles, maintained by a CS department (not a personal page), showing the person's title. In the case of unconventional titles, a link to a university policy indicating that the title is substantially similar to a faculty appointment (can advise PhD students/apply for grants). For non-CS faculty advising CS PhD students, a link to at least one CS thesis showing that this person was an advisor. This criterion will minimize the corrections that need to be made to the Brown database, as it seems to be more consistent with the way that database was constructed. Note that it does not preclude multiple affiliations if institutions allow someone to be regular faculty in two places.

Multiple affiliations. Some people have genuine multiple affiliations. This is less usual in the US (but does happen: e.g., https://people.eecs.berkeley.edu/~yelick/). Such arrangements are more common, and often complicated, in other countries. Fractions of an appointment cannot generally be obtained from open sources. The only realistic way I see is to assign, by default, all papers to the first listed appointment. There should be a correction mechanism. I propose to allow a faculty member with multiple affiliations to define his/her primary affiliation for research purposes by creating an open reference for this, e.g., by publishing, in any more or less permanent way, a statement describing which institution most of his/her research is connected to. A better approach, requiring changes to the software, is to allow a faculty member to specify percentages of commitment (using a published statement).
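To make the multiple-affiliation proposal concrete, here is a minimal sketch of the two variants described above: the default "first listed appointment" rule with a published override, and the percentage-of-commitment split. The function names, the institution labels, and the data shapes are assumptions made for illustration; none of this is taken from the CSrankings code.

```python
from typing import Dict, List, Optional


def credited_affiliation(
    listed: List[str],
    declared_primary: Optional[str] = None,
) -> str:
    """Pick the single institution that receives all of a person's papers.

    `listed` is the order of appointments as shown on an open source
    (hypothetical input). `declared_primary` is the institution named in
    the person's published statement, if one exists.
    """
    if not listed:
        raise ValueError("at least one affiliation is required")
    # Correction mechanism: a published statement overrides the default,
    # but only if it names one of the listed appointments.
    if declared_primary is not None and declared_primary in listed:
        return declared_primary
    # Default rule: everything goes to the first listed appointment.
    return listed[0]


def split_credit(commitment: Dict[str, float], papers: float) -> Dict[str, float]:
    """Divide a paper count across institutions by published commitment."""
    total = sum(commitment.values())
    if total <= 0:
        raise ValueError("commitment percentages must sum to a positive value")
    return {inst: papers * share / total for inst, share in commitment.items()}


default_choice = credited_affiliation(["Univ A", "Institute B"])
override_choice = credited_affiliation(["Univ A", "Institute B"], "Institute B")
shares = split_credit({"Univ A": 75, "Institute B": 25}, papers=8)
```

With these hypothetical inputs, the default goes to "Univ A", the published statement moves the credit to "Institute B", and the split variant divides 8 papers in proportion to the stated 75/25 commitment.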

fycus-tree commented 7 years ago

I am in support of this, as challenges in the term "full-time" may remove a host of current, high-profile professors as explained in my comment https://github.com/emeryberger/CSrankings/commit/5c60309807d9d8b4939cadce1a3c3a2b88f17bd5

The simple criterion of having a CS department list someone as a faculty member who can advise students seems cleaner and better. There's a question as to "adjunct" faculty and what is to be done about "emeritus" faculty; both can sometimes be part of a thesis committee. I'd suggest those qualifiers would preclude inclusion.

As for multiple affiliations, https://am.is.tuebingen.mpg.de/person/sschaal seems like a challenging case. He's a USC professor and an MPI Director. But his USC PhD students are listed as employees of MPI. He has 21 PhD students in total, of which only 4 have usc.edu emails. Thus it seems most of his funding/support/mentorship is at MPI, though the title of "Professor" is only from USC.

brendano commented 7 years ago

I do think there is some attraction to the idea of simply going by the title on the department's homepage, just so you don't have to make these decisions or have to extract the information about them. Like, what affiliations are real and what's just on paper? Obviously if you had the information that would be great but it seems hard and not always desirable to police the internal operations of every lab/department/employment situation.

brendano commented 7 years ago

BTW, an argument against what I just advocated: there are certainly cases where universities game their pubs/citations through spurious affiliations. There were a number of articles about King Abdulaziz University about this a few years ago, in the area of mathematics:

https://liorpachter.wordpress.com/2014/10/31/to-some-a-citation-is-worth-3-per-year/ http://www.dailycal.org/2014/12/05/citations-sale/

(In the latest USNWR rankings they're ranked 13th in the world in computer science.)

fycus-tree commented 7 years ago

@brendano @emeryberger

I think the criterion for verification, as originally proposed by @emeryberger above, is fine. It would ignore the KAU-style gaming of the system. The primary test for this criterion seems to be something along the lines of "Can this professor be the primary advisor for a CS PhD student attending this institution?" A simple check is whether, for the publications they're getting credit for, any of the authors are solely affiliated with said institution (as would be the case for a student). Being the chair of a thesis committee would be sufficient evidence. For new faculty, having students and professors list each other as advisers and advisees on websites is likely sufficient. In cases of multiple affiliation, the institution where they're advising more students would get credit.

I think, if the faculty member is listed as adjunct, you should require additional information. In this case, the article states that KAU employs these researchers as adjunct faculty. Looking at KAU's Math Faculty, these affiliations aren't listed. Also, all of these affiliations are secondary affiliations. There's clearly not enough evidence of PhD mentoring to support listing them.

srmadden commented 7 years ago

I'd like to argue for a criterion based on primary organization and the ability to supervise students. At MIT, there are very few Adjuncts, and for us, Adjunct is reserved for distinguished faculty who are not a part of the regular tenure track system. As a result, there are only 3 Adjuncts in the EECS department that I know of, and all are very active and very prominent. For example, Mike Stonebraker is an Adjunct, and he is definitely super-engaged at MIT, having graduated 3-4 Ph.D. students in the past 5 years and actively supervising a number of current students.

Specifically, at MIT what adjunct means is

1) Not tenure track or tenured
2) Is able to supervise PhD students

There are other titles, "Professor of the Practice" at MIT, or research professor at other organizations, that have similar properties (i.e., not tenure track, can supervise Ph.Ds).

As a non-MIT example, Trevor Darrell is currently included in the accounting for UC Berkeley; his official EECS web page lists his status as "Professor in Residence". You'll have to ask UC Berkeley what that means, but I suspect it is not a traditional academic rank.

https://www2.eecs.berkeley.edu/Faculty/Homepages/darrell.html

I would propose a simple guideline, which is to include any researcher at a university who is able to supervise Ph.D. students and who has a primary affiliation with that university. I think this is fairer and better accounts for the vagaries of academic appointments, especially when it comes to very distinguished researchers (like Mike and Trevor) who are in fact actively contributing to academic programs but seem to be counted (somewhat arbitrarily) in different ways.

fycus-tree commented 7 years ago

@srmadden I think the proposal is to make something that's easy to check and validate.

faculty member is included if he or she holds a regular faculty position at the university (for US: Prof./Associate Prof./Assistant Prof., if other countries are included this needs expansion), or an equivalent title (e.g., Principal Research Scientist at MIT). and can advise CS PhD students. (Whether to restrict this to tenured faculty or include research faculty may be another debate). Acceptable verification: a link to a faculty list with titles, maintained by a CS department (not a personal page), showing the person's title.

I think, because there's so much variation in how adjunct is used (many universities use it to mean "part-time teaching staff"), I'd propose the following extension:

If a faculty member is listed as adjunct faculty, additional verification should be provided (such as examples of being the primary advisor on a PhD student thesis, or papers published with students at said institution)

For both Trevor and Mike, this evidence was trivial for me to find (I even got 2017 theses)
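The verification rule plus the adjunct extension could be checked mechanically along these lines. This is only a sketch under stated assumptions: the `InclusionRequest` record, its field names, and the example.edu URLs are hypothetical, and the check only tests that the required links are present, not that they say what the submitter claims.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InclusionRequest:
    """Hypothetical shape of a change request; not the CSrankings format."""
    title: str
    # Link to a department-maintained faculty list showing the title.
    department_listing_url: str = ""
    # Links to theses or student co-authored papers (advising evidence).
    advising_evidence_urls: List[str] = field(default_factory=list)


def is_verifiable(request: InclusionRequest) -> bool:
    """Apply the proposed rule: a listing link, plus extra evidence for adjuncts."""
    if not request.department_listing_url:
        return False
    if "adjunct" in request.title.lower():
        # Extension: adjunct titles need at least one piece of advising evidence.
        return len(request.advising_evidence_urls) > 0
    return True


regular = InclusionRequest("Associate Professor", "https://cs.example.edu/faculty")
adjunct_bare = InclusionRequest("Adjunct Professor", "https://cs.example.edu/faculty")
adjunct_backed = InclusionRequest(
    "Adjunct Professor",
    "https://cs.example.edu/faculty",
    ["https://theses.example.edu/2017/some-thesis"],
)
results = [is_verifiable(r) for r in (regular, adjunct_bare, adjunct_backed)]
```

A human reviewer would still need to open each link; the sketch only captures which links a request must carry before it is worth reviewing.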

srmadden commented 7 years ago

This sounds like a very reasonable approach to me.

lorenmt commented 7 years ago

If CSRankings' objective is to provide a good reference for prospective graduate students, my simple suggestion is to look only at whether the individual research group (or, more broadly, the research department) is suitable for CSRankings.

As one of the prospective graduate students, I strongly believe getting a so-called computer science degree is not that important. The key is to find a good research lab and work with some specific PIs who hold similar research interests.

As such, following other PRs, I suggest adding any research professors who can advise students and any research groups whose main research themes are among the areas listed in CSR.

Doing so would be extremely helpful for students, and the ranking would be unbiased, fair, and inclusive.

fycus-tree commented 7 years ago

@lorenmt I think the goal of this issue is to make the process ... more simple and verifiable. However, it does feel at odds in some way to list "university-level" rankings but use a "department-level" selection criterion. Additionally, #604 has highlighted that sometimes CSRankings research fields (AI, Robotics, Vision) are not organized formally into the Computer Science Department at universities.

So here are some ideas:

  1. Accept faculty from any department to the list, as long as they are primary advisors for PhD students. With the current metric of aggregation, there is no downside to having "too many faculty" (since total contributions are counted). As the measurement is publications in CS conferences, it will only reward a university's research contributions in CS.
  2. Accept CS faculty, and allow others on the basis of advising CS students. This is the current proposal for inclusion criteria. This way, a CS department's website can be used to verify most affiliations. And if a PR can point to advised CS PhD students (ideally in the form of a formal thesis), others can also be added sensibly.
  3. Accept CS-like faculty, and allow others on the basis of advising CS-like students. If a department's primary focus is one or more of the CS Rankings "research categories", accept its faculty as CS-like. This simply highlights that sometimes "Computer Science" is not the term of choice for these fields of study, especially in Europe. Oxford, Cambridge and even TUM seem to use "Information Engineering" or "Informatics" to do AI, ML, Vision, and Robotics research categorically (i.e., these are not done by "CS" faculty). Also see my comment documenting the history of the term "computer and information sciences".

As much as I wanted to pitch something like "Accept professors on the basis of their publishing area and focus" (which also seems sensible), I think that's functionally identical to (1), as manually verifying individual faculty's publishing areas is probably out of scope for verification.

andrewcmyers commented 6 years ago

I don't see that there is much of an incentive for universities to game the system by including faculty inappropriately, so I would aim for simple rules that err on the side of inclusivity. In particular, I would list anyone who is allowed to be the graduate advisor of a CS PhD student. (This is more inclusive than proposal 1 above.)

The possible issue I see with that approach is that at some universities (e.g., Cornell, CMU, Gatech), what other places consider "CS" is divided up into smaller units. For example, Cornell has an Information Science department whose faculty publish heavily in CS venues. In Cornell's case, the problem is largely though not perfectly solved by Cornell's field system — those faculty are members of the CS field and can advise both IS and CS students. I don't know the situation at other universities well enough to comment on how the proposed rule would break things.

fycus-tree commented 6 years ago

@andrewcmyers Do you see any issue with changing "advise CS PhD students" to "advise PhD students"? That would alleviate all the department-level discussion, and since the rankings are still based on publications in CS conferences, it would still capture CS rankings?

seongminjeon commented 6 years ago

Any update?

andrewcmyers commented 6 years ago

It seems to me that the question is whether multiple filters for "CS"-ness are needed. One possibly problematic situation would be if there are conferences in the list that are arguably CS + X where X is really outside of CS. Then you could imagine an "X" department member advising "X" students and getting listed even though their work is not really CS at all. For example, I suspect some people might consider certain kinds of robotics or scientific computing work to not really be "CS". I don't know whether this is a real problem.

fycus-tree commented 5 years ago

@andrewcmyers There are some areas that would experience expansion as they're interdisciplinary. Robotics is already a subcategory in CSRankings (although a lot of Mechanical Engineers are often involved). Comp. Bio is a subcategory (biologists are often involved). Economics is already a subcategory. Machine Learning exists (but not statistics). There are also a lot of ECE contributors in communication/wireless/networking/distributed areas.

The current criteria basically encourage collaborative departments that give CS faculty affiliate status to as many people as possible. With CS interest blossoming, it seems CS-like research is going to grow to encompass more fields, to "eat the world" as some say. For example, some writers suggest Physics research is becoming more computational than theoretical (another example).

@emeryberger Maybe CSRankings shouldn't really take a hard look along departmental lines, as giving CS PhD advising rights (but not CS PhD admittance rights, per some CMU examples #1424) is increasingly common. It's not just CMU. MIT EECS says: "The EECS Department permits any faculty member at MIT to supervise research that will be used for a Master's thesis or for the Ph.D. thesis." Stanford EE PhD allows your advisor to be faculty from ANY department. Stanford CS PhD links to the PhD qualifier form and requirements, both of which suggest your adviser can be from any department. That implies that ALL MIT and Stanford faculty are eligible for CSRankings inclusion under these criteria.

Since 2012, Stanford has had CS as its most popular major, and it sees >90% of its undergrads take at least one CS class. MIT is seeing that "about 40 percent of MIT undergraduates now major either in CS or in joint programs combining CS". CS is proliferating and touching almost all STEM research areas. Maybe restricting by publishing areas (and not formal departmental affiliation) is the best path forward, as following specific university-by-university administrative minutiae seems like a bad idea.
