emeryberger / CSrankings

A web app for ranking computer science departments according to their research output in selective venues, and for finding active faculty across a wide range of areas.
http://csrankings.org

revisit inclusion criteria for non-CS faculty #39

Closed: emeryberger closed this issue 7 years ago

emeryberger commented 7 years ago

Should we include faculty at institutions who are not in a CS, joint EECS, or similar academic unit, and if so, how? (Currently, they are not included.)

moyix commented 7 years ago

If the primary goal of the ranking is to help potential PhD students figure out where they should apply, then perhaps the right thing to do is to list ECE departments separately. So there would be separate listings for (e.g.) "Boston University (CS)" and "Boston University (ECE)".

While this would "split the vote" in cases where the researchers in an area are split between ECE and CS, prospective students will have to apply to one or the other, so ranking them separately might be a good compromise.

blucia0a commented 7 years ago

We should definitely be including faculty that do work in computing, even if they are not explicitly in a computer science or EECS department.

My institution is a case in point: I am part of CMU's College of Engineering, in its ECE department. I do computer architecture and systems work, and I publish at venues such as PLDI, ISCA, and ASPLOS. It's hard for me to see the value in excluding me (and my many ECE colleagues who work on computer systems) from this ranking only because we are under "ECE" and not "CS".

As for splitting institutions into separate (CS) and (ECE) listings, I think that could cause trouble. In practice, students who work at the HW/SW boundary are often co-advised, and many faculty hold courtesy appointments across departments. I think a split would send the wrong message to incoming PhD students, potentially creating sides to take where there aren't really opposing tribes.

jeisner commented 7 years ago

I strongly agree with @blucia0a, for similar reasons. At some universities, there is a lot of advising across department boundaries, in one or many areas of CS.

Also, some schools have EECS depts. Presumably you are counting their EE faculty. If only for equity, it would be appropriate to count faculty in the EE and ECE departments elsewhere. (Similarly, I think some schools have CS+math departments.)

Now, one could argue that including non-CS faculty makes sense only at places where they're actually available as advisors to CS students. You could conceivably check that there's a CS student author on the paper. But I think it's better to err on the side of inclusiveness rather than paying too much attention to department boundaries. The fact that the faculty member is publishing in CS conferences suggests that they're relevant and are available at least for collaboration and as members of dissertation committees.

jeisner commented 7 years ago

As I understand the current design, every paper is credited to one or more faculty members.

Concern: If a prof collaborates with a computationally naive social scientist, then the CS prof will only get half credit. Is this why you currently exclude all social scientists?

Suggested solution: You could use the social scientist's other publications to determine whether he/she counts as honorary CS. In particular, it should suffice if they are the sole faculty author on at least one publication that you count -- you'd then credit them just like CS faculty for all publications on which they're an author. Reasonable hack?
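A minimal sketch of this heuristic, assuming each counted paper is given as a list of author names and that two hypothetical sets are available: every known faculty member (in any department) and the faculty already in the CS database. CSrankings' real data format isn't described in this thread, so all names here are illustrative.

```python
def honorary_cs_faculty(
    papers: list[list[str]],
    faculty: set[str],
    cs_faculty: set[str],
) -> set[str]:
    """Non-CS faculty who are the sole faculty author on at least one counted paper.

    papers:     author lists of counted papers (hypothetical representation)
    faculty:    all known faculty, in any department (hypothetical)
    cs_faculty: faculty already in the CS database (hypothetical)
    """
    honorary: set[str] = set()
    for authors in papers:
        # Faculty of any department who appear on this paper.
        faculty_authors = {a for a in authors if a in faculty}
        # Only a paper with exactly one faculty author can qualify someone.
        if len(faculty_authors) == 1:
            (sole,) = faculty_authors
            if sole not in cs_faculty:
                honorary.add(sole)
    return honorary


# A sociologist who is sole faculty author on one counted paper qualifies.
papers = [["Sociologist", "Student X"], ["CS Prof", "Sociologist"]]
print(honorary_cs_faculty(papers, {"Sociologist", "CS Prof"}, {"CS Prof"}))  # {'Sociologist'}
```

Once someone qualifies, they would be credited like CS faculty on all of their counted papers; that second step is not shown.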

Concern: Some papers may have zero faculty authors that count. Sometimes my students have published papers without me. And all of my own grad school papers were single-author. (Another possible case: a paper with multiple faculty authors, none of whom are counted as honorary CS by the heuristic above.)

Suggested solution: Do you currently drop such papers? I would credit them to the faculty members "Other @ X University" and "Other @ Y University", where X and Y are the affiliations of the authors at time of publication.
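A sketch of this fallback, assuming each paper is a list of (author, affiliation-at-publication) pairs and that credit is split evenly, 1/N per author. The comment doesn't say how much each "Other @ X" placeholder should receive, so the 1/N split, the function name, and the data layout are all assumptions for illustration.

```python
from collections import defaultdict


def credit_with_other_fallback(
    papers: list[list[tuple[str, str]]],
    counted_faculty: set[str],
) -> dict[str, float]:
    """Split each paper 1/N per author, routing orphaned papers to "Other @ X".

    papers: each paper is a list of (author, affiliation at publication) pairs
            (hypothetical representation)
    counted_faculty: authors the ranking currently counts (hypothetical)
    """
    credit: dict[str, float] = defaultdict(float)
    for paper in papers:
        share = 1.0 / len(paper)
        has_counted = any(author in counted_faculty for author, _ in paper)
        for author, affiliation in paper:
            if author in counted_faculty:
                credit[author] += share
            elif not has_counted:
                # No counted faculty on this paper: credit a placeholder
                # "faculty member" at the author's affiliation instead of dropping it.
                credit[f"Other @ {affiliation}"] += share
            # Otherwise the share is dropped, as for any uncounted author.
    return dict(credit)


papers = [
    [("Student A", "X University"), ("Student B", "Y University")],
    [("Prof P", "X University"), ("Student A", "X University")],
]
print(credit_with_other_fallback(papers, {"Prof P"}))
# {'Other @ X University': 0.5, 'Other @ Y University': 0.5, 'Prof P': 0.5}
```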

emeryberger commented 7 years ago

The credit for every paper included herein is evenly divided across all authors, regardless of status or affiliation. Credit moves with the authors. I don't see this changing.

emeryberger commented 7 years ago

From the perspective of a grad student, mixing all faculty together is at best confusing and at worst seriously misleading. After extensive consultation, I have concluded that for now, the database will be open to any full-time, tenure-track faculty member who can advise a CS PhD.

emeryberger commented 7 years ago

Note that including ECE (clearly demarcated as such) remains a possibility as well.

jeisner commented 7 years ago

> After extensive consultation, I have concluded that for now, the database will be open to any full-time, tenure-track faculty member who can advise a CS PhD.

But not other faculty members who can advise a CS PhD? (Research track, emeritus, etc.?) If such researchers are indeed working with students to publish papers in top conferences, then they seem relevant to a student's grad school decision, since they are prospective advisors.

jeisner commented 7 years ago

> The credit for every paper included herein is evenly divided across all authors, regardless of status or affiliation. Credit moves with the authors. I don't see this changing.

Maybe I misunderstand the design. I thought credit was only assigned to the faculty authors in the database. Are you saying that when a student graduates, the university loses (fractional) credit for the papers that student has (co-)written?

emeryberger commented 7 years ago

Emeritus advising students seems fine; non-active emeritus faculty, no. This is of course not necessarily obvious from web pages, and would need to be handled on an ad hoc basis. Non-tenure track remains off the table for now.

With respect to credit: a single faculty member gets 1/N credit for a paper, where N is the number of authors, regardless of their affiliation or status (faculty, student, or otherwise). The number never changes. A paper can count for at most 1.0, in the case that all authors are / end up becoming faculty in the database.
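A minimal sketch of the rule as stated here, assuming papers are lists of author names and the database maps each faculty member to their current institution (so credit moves with the author). The function name and data layout are hypothetical, not CSrankings' actual code.

```python
from collections import defaultdict


def institution_credit(
    papers: list[list[str]],
    database: dict[str, str],
) -> dict[str, float]:
    """Per-institution totals under the 1/N rule.

    papers:   author lists (hypothetical representation)
    database: faculty in the database -> their *current* institution
              (so credit moves with the author)
    """
    totals: dict[str, float] = defaultdict(float)
    for authors in papers:
        share = 1.0 / len(authors)  # N counts every author, in the database or not
        for author in authors:
            if author in database:
                totals[database[author]] += share
            # Shares of authors outside the database are not assigned to anyone.
    return dict(totals)


# One faculty author and three students: the institution gets 0.25.
print(institution_credit([["Prof P", "S1", "S2", "S3"]], {"Prof P": "MIT"}))  # {'MIT': 0.25}
# Every author in the database: the paper counts for the full 1.0.
print(institution_credit([["Prof P", "Prof Q"]], {"Prof P": "MIT", "Prof Q": "MIT"}))  # {'MIT': 1.0}
```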

jeisner commented 7 years ago

Thanks for explaining. Regarding credit, aren't you worried about perverse incentives? In particular, it seems that your method gives faculty an incentive to reduce the number of collaborators who are outside the database. Thus:

This is why I assumed that the credit would be divided among only the authors who are in the database. (If some of the authors enter the database only later, then it could be re-divided.) This avoids reducing the count of high-quality CS papers produced by an institution just because non-database authors participated in the work.
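For comparison, a minimal sketch of this alternative, in which the denominator counts only authors in the database, so any paper with at least one such author contributes a full 1.0. It assumes papers are lists of author names and the database maps faculty to institutions; the names are hypothetical, and skipping papers with no database authors is an assumption of the sketch.

```python
from collections import defaultdict


def database_only_credit(
    papers: list[list[str]],
    database: dict[str, str],
) -> dict[str, float]:
    """Split each paper only among authors currently in the database.

    papers:   author lists (hypothetical representation)
    database: faculty in the database -> institution (hypothetical)
    """
    totals: dict[str, float] = defaultdict(float)
    for authors in papers:
        in_db = [a for a in authors if a in database]
        if not in_db:
            continue  # assumption: papers with no database authors are skipped here
        share = 1.0 / len(in_db)  # denominator counts database authors only
        for author in in_db:
            totals[database[author]] += share
    return dict(totals)


papers = [["Prof P", "S1", "S2", "S3"]]
print(database_only_credit(papers, {"Prof P": "MIT"}))  # {'MIT': 1.0}
# If S1 later joins the database at CMU, rerunning re-divides the same paper.
print(database_only_credit(papers, {"Prof P": "MIT", "S1": "CMU"}))  # {'MIT': 0.5, 'CMU': 0.5}
```

Because credit is recomputed from the current database, the re-division mentioned above happens automatically whenever an author is added.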

emeryberger commented 7 years ago

With respect to "generous" granting of credit: here are the guidelines on what constitutes authorship from the ACM.

Anyone listed as Author on an ACM paper must meet all the following criteria:

  • they have made substantial intellectual contributions to some components of the original work described in the paper; and
  • they have participated in drafting and/or revision of the paper; and
  • they are aware that the paper has been submitted for publication; and
  • they agree to be held accountable for any issues relating to correctness or integrity of the work.

In short, adding an author cannot be an act of generosity; it is a question of acknowledging a contribution. Splitting credit among authors gives them an incentive to take authorship seriously.

I have given this considerable thought. There's no ideal decision, but I think the current approach is the best given the alternatives. Here are some of the numerous downsides of only including authors present in the database:

emeryberger commented 7 years ago

In any event, it always helps one's rankings in this system to produce more papers that appear at top conferences (and therefore, hopefully, constitute better work). Having more authors, or authors with particular skills, generally makes it easier to produce papers (it's not simply that many hands make light work, but this is clearly true to some extent). In some cases, getting some work done requires working with people who will not be in the database (e.g., folks at Facebook or Google). Publishing such papers is always going to be more desirable than not doing so. I believe the current approach is mostly incentive-aligned. I observe in passing that not splitting credit incentivizes the spurious presence of "authors" and would be a cheap way of artificially inflating the currency of papers.

jeisner commented 7 years ago

@emeryberger I do appreciate how much thought you've given this - thanks for your willingness to explain and discuss the system here.

> adding an author cannot be an act of generosity; it is a question of acknowledging a contribution.

Certainly. (In my example, the generosity consisted of involving the student in the research in the first place; I presumed that they'd then do enough to earn authorship.)

Let me try making the point a different way. Under your current system, if I write a paper with student A and a different paper with student B, then my institution gets credit of 0.5 + 0.5 = 1.0. But if both students are involved in both papers, my institution gets only 1/3 + 1/3 ≈ 0.67. Yet the same amount of work was done, and arguably the students learned even more by being involved in both projects.

> not splitting credit incentivizes the spurious presence of "authors" and would be a cheap way of artificially inflating the currency of papers

I agree (and always have) that credit should be split among authors for this reason. What bothers me is when some of that split credit actually disappears! My proposal tries to fix that by allocating the credit to the institution where the work was done, or where similar work can be expected to be done in the future.

I'll respond now to your concerns with my proposal, by clarifying and developing it:

> • Authorship counts would be dynamic (that is, they would change over time). When an author dies and is no longer in the database, everyone else would have to have their credit increased (talk about perverse incentives).
> • It would create a perverse incentive for senior faculty to have their junior collaborators not get tenure (since they would then likely leave the database).

Oh, my picture was that no one would ever leave the database. They continue to get credit for the work that they did as faculty. Then these concerns vanish.

The special case is that grad students are not yet in the database because they are assumed to be in training. Their share of the work accrues to the mentors and graduate program that fostered that work. A program that fosters a lot of high-quality work should get a high ranking -- even if a lot of that work is done by people who aren't in the database. (An MIT paper is still 1.0 papers even if 3 out of 4 authors are students.)

If a grad student becomes faculty, there's a question about whether their student work should stay at the mentoring institution or follow them to the new institution. There are arguments on both sides (but I'm pretty sure you don't want to count it in both places). I agree with you that probably the better answer is for it to follow the student. Why? First, it's a leading indicator of the productivity of the new institution. Second, students who make it into the database tend to be unusually good, so perhaps their departure from their alma mater actually does diminish its ability to produce good papers (just as if a faculty member had left).

(I do see counterarguments. Part of the question here is whether a school deserves credit for the work that their best graduates did there. What value did the school add -- which is presumably what rankings should judge? Did the school enable the work, or just select and attract students who were destined to do great work no matter where they went?)

> It would create a disincentive for faculty to see their students get faculty appointments (since it would reduce credit).

True. In this case, however, I think there are such strong incentives in the opposite direction that I'm not too worried about this disincentive.

> It would favor collaboration with industry (not in the database) over collaboration with academics.

Okay, I therefore agree that credit to industry authors should not be counted toward any university. If feasible, industry authors should really be in the database and affiliated with a company. (This would also usefully make it possible to identify prolific industry researchers as well as industry labs with a strong publishing record, if you decided to publish those numbers.)

The final case is academics who are not in the database because they're too far outside CS. I'm inclined to stick to my position that these collaborators are best treated like grad students and thus not counted in the denominator. The reason is that the CS content of a CS publication probably came mostly from the CS authors. I suspect that when a computer scientist collaborates with a sociologist on a joint project, they spend time publishing papers in both fields, and each one does more of the work on the papers in their own field. This is because the collaboration requires work from both sides. The CS work (e.g., modeling/simulation/inference) gets mainly done by the computer scientist and the details appear in the CS paper -- which is the only one you count -- while the social science work (e.g., questions, study design, data collection, answers) gets mainly done by the sociologist and the details appear in the sociology paper. (The reason that they're nonetheless both authors is that the project as a whole really was a joint effort; it's legitimate to say that it required interdisciplinary discussion and that neither paper could have happened without both authors.)

> Authorship counts would be difficult to calculate (manually).

I don't follow this one, since I don't understand the workflow behind your system.

emeryberger commented 7 years ago

Thanks for your thoughtful response!

I do disagree with your math because contributing to "the same" paper when the work is divided among three people rather than two means strictly less work was done by each person in the former case.

As for your other suggestions: they might make sense but not in the current world. First, there is no way on earth that companies are going to give me (or anyone) their employee directories, so that is not going to happen as a practical reality. Second, including every faculty member at every university so that the database could distinguish between them and grad students is also unworkable as a practical matter -- that is, the practical matter of me as a human being managing such a gigantic database, which would need virtually full-time care.

jeisner commented 7 years ago

> I do disagree with your math because contributing to "the same" paper when the work is divided among three people rather than two means strictly less work was done by each person in the former case.

Yes, of course. We both think that a paper should not count for more if it has more authors. But I am arguing that it should also not count for less. (If each of n authors gets 1/n credit, then their common institution should get 1.0 credits total.)

Let's try my example again. If I write a paper with student A and another with student B, then the total credit should be computed as Jason = 1.0, A = 0.5, B = 0.5. If both students are involved in both papers, then the credit should be Jason = 2/3, A = 2/3, B = 2/3. So in the second case each of us gets less credit per paper, but this is partly balanced by the fact that each student worked on two papers. The total productivity of the institution is 2.0 both ways.

What do you think is wrong with this math? I thought we agreed on the above. The question is just what should be done with A and B's credit since they're grad students / out-of-database. Should it be zeroed out as in your current system? Or reallocated to their mentoring institution as I'd suggest?
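The two options can be made concrete with a small sketch that uses exact fractions, so that 2/3 is not rounded to 0.66 or 0.67. It assumes, for simplicity, that every author not in `faculty` is a grad student at the same institution; the function names are hypothetical and this is not CSrankings' implementation.

```python
from collections import defaultdict
from fractions import Fraction


def person_credit(papers: list[list[str]]) -> dict[str, Fraction]:
    """Each of a paper's n authors receives exactly 1/n of it."""
    credit: dict[str, Fraction] = defaultdict(Fraction)  # Fraction() == 0
    for authors in papers:
        for author in authors:
            credit[author] += Fraction(1, len(authors))
    return dict(credit)


def institution_total(
    papers: list[list[str]], faculty: set[str], reallocate: bool
) -> Fraction:
    """One institution's total credit.

    faculty:    this institution's authors who are in the database
    reallocate: False zeroes out everyone else's share (the current system);
                True keeps it for the institution, assuming every non-faculty
                author here is one of its own grad students (a simplification)
    """
    credit = person_credit(papers)
    return sum(
        (c for author, c in credit.items() if reallocate or author in faculty),
        Fraction(0),
    )


separate = [["Jason", "A"], ["Jason", "B"]]
shared = [["Jason", "A", "B"], ["Jason", "A", "B"]]
for name, papers in [("separate", separate), ("shared", shared)]:
    print(
        name,
        institution_total(papers, {"Jason"}, reallocate=False),
        institution_total(papers, {"Jason"}, reallocate=True),
    )
```

This prints `separate 1 2` and `shared 2/3 2`: zeroing out student shares makes the institution's total depend on how the same work is divided into papers, while reallocating them keeps it at 2 either way.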

> including every faculty member at every university

You are currently trying to include all faculty in reputable departments that grant CS Ph.D.s, correct? I didn't suggest adding non-CS faculty; rather, I gave a defense of leaving them out.

> First, there is no way on earth that companies are going to give me (or anyone) their employee directories,

I think you're referring to my comment that "If feasible, industry authors should really be in the database and affiliated with a company."

If not feasible, then continue leaving the industry authors out of the database, i.e., treating them like grad students. But I had in mind that perhaps eventually author affiliations could be scraped from PDF papers with reasonable accuracy. (A hard problem to be sure, but your colleague Andrew McCallum has worked on this kind of thing for 15 years -- e.g., "scoped learning" at UAI'02.)