emeryberger / CSrankings

A web app for ranking computer science departments according to their research output in selective venues, and for finding active faculty across a wide range of areas.
http://csrankings.org

Fetch SIGGRAPH / SIGGRAPH Asia conference track #6858

Closed musialski-research closed 2 weeks ago

musialski-research commented 8 months ago

Dear Emery

Request: SIGGRAPH/SIGGRAPH Asia Conference papers are currently not counted. Please add.

Justification: SIGGRAPH introduced a new conference track, which is not published in the TOG journal. The conference papers are top-tier full papers limited to 7 pages. They undergo the same rigorous review as the journal papers and are comparable to conference papers in other disciplines (CVPR, ICLR, ICML, etc.).

See description:

Xovee commented 6 months ago

Aren't these confs already included under the Interdisciplinary Areas?

yssl commented 5 months ago

SIGGRAPH / SIGGRAPH Asia are already included in the list, but I guess the problem is that CSRankings counts SIGGRAPH / SIGGRAPH Asia papers only under the "ACM Trans. Graph." title. https://github.com/emeryberger/CSrankings/blob/774a52c6e868f0cffef39ca6b639ca67ba09a469/util/regenerate_data.py#L209C37-L209C56

As przem-research said, the conference track papers are not published to TOG, so they are just listed under the conference name in DBLP, like my recent SIGGRAPH Asia conference paper: https://dblp.org/rec/conf/siggrapha/KwonGA023.html

I believe the code for handling articles requires an update.

musialski-research commented 5 months ago

@yssl Yes, this is exactly what I am saying. The conference track papers are not fetched from DBLP and hence not counted.

@Xovee: I am not saying SIGGRAPH / SIGGRAPH Asia are not included. I am saying the new track is not fetched because it is not published in the TOG journal; it is published in separate proceedings instead.

This is more of a technical issue, and @yssl correctly says a code update could fix it.

Xovee commented 5 months ago

Thanks for the reply. CSRankings relies on DBLP's data and API structures, and there seems to be no way to address this automatically.

yssl commented 5 months ago

@przem-research, @Xovee: After a closer examination of the code, it appears that the following line is not just counting TOG papers for SIGGRAPH / SIGGRAPH Asia, but is actually intended to count papers marked as TOG in DBLP that are from SIGGRAPH / SIGGRAPH Asia: https://github.com/emeryberger/CSrankings/blob/774a52c6e868f0cffef39ca6b639ca67ba09a469/util/regenerate_data.py#L209

According to the next line, the strings 'ACM Trans. Graph.' and 'SIGGRAPH' are both considered part of the 'siggraph' area, while 'ACM Trans. Graph.' and 'SIGGRAPH Asia' are both considered part of the 'siggraph-asia' area: https://github.com/emeryberger/CSrankings/blob/1e6ba81cec5962e17f923e94182e81d74c91fcf9/util/csrankings.py#L263

The area items mentioned above are used in the next line to construct confdict, which maps the actual conference name strings to conference areas (i.e., confdict['ACM Trans. Graph.'] = 'siggraph', confdict['SIGGRAPH'] = 'siggraph', confdict['ACM Trans. Graph.'] = 'siggraph-asia', confdict['SIGGRAPH Asia'] = 'siggraph-asia'; the first value of confdict['ACM Trans. Graph.'] may be overwritten by the later one): https://github.com/emeryberger/CSrankings/blob/774a52c6e868f0cffef39ca6b639ca67ba09a469/util/regenerate_data.py#L112
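The possible overwrite of confdict['ACM Trans. Graph.'] can be reproduced with a minimal sketch. The venue strings are the ones quoted in this comment. The name `areadict` and the loop are assumptions about how the mapping is built, not a copy of regenerate_data.py.

```python
# Hypothetical, simplified area table using the venue strings quoted above.
areadict = {
    "siggraph": ["ACM Trans. Graph.", "SIGGRAPH"],
    "siggraph-asia": ["ACM Trans. Graph.", "SIGGRAPH Asia"],
}

# Assumed construction: map each venue string to its area.
confdict = {}
for area, venues in areadict.items():
    for venue in venues:
        # Assigning to an existing key silently replaces the earlier value.
        confdict[venue] = area

for venue, area in confdict.items():
    print(f"{venue!r} -> {area!r}")
```

If the real dictionary is built this way, only one area survives for 'ACM Trans. Graph.', so a separate rule would be needed to tell SIGGRAPH issues of TOG from SIGGRAPH Asia issues.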

And the next line retrieves the booktitle or journal attribute from each DBLP item and saves it in the confname variable: https://github.com/emeryberger/CSrankings/blob/774a52c6e868f0cffef39ca6b639ca67ba09a469/util/regenerate_data.py#L182

And the next line stops processing if the confname retrieved from DBLP does not exist in the confdict, which includes only the conferences counted by csrankings. This part is likely responsible for filtering out the papers to be counted: https://github.com/emeryberger/CSrankings/blob/774a52c6e868f0cffef39ca6b639ca67ba09a469/util/regenerate_data.py#L192

In the data source used by csrankings, https://dblp.org/xml/dblp.xml.gz, TOG papers (whether they are SIGGRAPH, SIGGRAPH Asia papers, or pure TOG papers) have the following entry: <journal>ACM Trans. Graph.</journal>

SIGGRAPH Asia Conference Track papers have the following entry: <booktitle>SIGGRAPH Asia</booktitle>

And SIGGRAPH Conference Track papers have the following: <booktitle>SIGGRAPH (Conference Paper Track)</booktitle>

Therefore, because confdict stores 'SIGGRAPH' rather than 'SIGGRAPH (Conference Paper Track)', it appears that SIGGRAPH conference track papers are currently not being counted.
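The lookup described above can be sketched as follows. This is a simplified illustration, assuming an exact string match against the counted venue names. The helper `venue_name` and the set `counted_venues` are hypothetical, and the three records are reduced to the single element quoted in this comment.

```python
import xml.etree.ElementTree as ET

# Venue strings assumed to be present as keys in confdict.
counted_venues = {"ACM Trans. Graph.", "SIGGRAPH", "SIGGRAPH Asia"}

# Reduced records carrying only the element quoted in the comment above.
records = [
    "<article><journal>ACM Trans. Graph.</journal></article>",
    "<inproceedings><booktitle>SIGGRAPH Asia</booktitle></inproceedings>",
    "<inproceedings><booktitle>SIGGRAPH (Conference Paper Track)</booktitle></inproceedings>",
]


def venue_name(xml_text):
    """Return the booktitle or journal text of one record, or None if absent."""
    node = ET.fromstring(xml_text)
    for tag in ("booktitle", "journal"):
        value = node.findtext(tag)
        if value is not None:
            return value
    return None


results = {}
for rec in records:
    name = venue_name(rec)
    # An exact match is required, so a longer booktitle does not match 'SIGGRAPH'.
    results[name] = name in counted_venues
    print(f"{name!r}: {'kept' if results[name] else 'skipped'}")
```

Under these assumptions only the 'SIGGRAPH (Conference Paper Track)' record is skipped at this stage, while the 'SIGGRAPH Asia' record passes the check.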

However, what I don't understand is why SIGGRAPH Asia conference track papers, which have 'SIGGRAPH Asia' as the booktitle and are also listed as 'SIGGRAPH Asia' in the confdict, are not counted by csrankings.

I would be grateful if someone who is knowledgeable about the relevant sections of the code could review and modify it to ensure that papers from the SIGGRAPH and SIGGRAPH Asia conference tracks are also counted.

musialski-research commented 2 weeks ago

Dear All, and @jrk,

This issue remains unresolved. The problem is that SIGGRAPH and SIGGRAPH Asia CONFERENCE-Track papers are not being pulled from the DBLP database, which causes them to be excluded from CSRankings.

Here’s the breakdown (with A, B, C terminology added for clarity):

Proposed Solution: Update CSRankings to properly fetch C-type papers from DBLP.

musialski-research commented 2 weeks ago

@yssl and @Xovee ,

Okay, I went through the entire code and eventually found the bug: //inproceedings[booktitle="SIGGRAPH Asia"] was missing in the filter.xq file. That caused the script, which filters the dblp.xml.gz database, to remove those proceedings from the database. The remainder of the code was fine, but it couldn't pull these proceedings from the DB because they weren't there.

The fix is easy: just add the line //inproceedings[booktitle="SIGGRAPH Asia"], to the filter.xq file (line 157). After that, you need to update the database by calling make update-dblp and regenerate all the CSV files by calling make (see: https://github.com/emeryberger/CSrankings/blob/gh-pages/README.md). I guess the latter is done anyway in @emeryberger's quarterly updates, so we are fine. The only thing left to do is to update filter.xq.
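To illustrate why a missing filter entry hides records from the later Python stages, here is a small Python stand-in for the filtering step. It is not the XQuery in filter.xq. The function `filter_dblp`, the sample records, and their keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny hypothetical excerpt in the shape of dblp.xml.
SAMPLE = """<dblp>
  <inproceedings key="a"><booktitle>SIGGRAPH Asia</booktitle></inproceedings>
  <inproceedings key="b"><booktitle>CVPR</booktitle></inproceedings>
  <article key="c"><journal>ACM Trans. Graph.</journal></article>
</dblp>"""


def filter_dblp(xml_text, booktitles, journals):
    """Return keys of records whose venue is explicitly listed in the filter."""
    root = ET.fromstring(xml_text)
    kept = []
    for rec in root:
        if rec.tag == "inproceedings" and rec.findtext("booktitle") in booktitles:
            kept.append(rec.get("key"))
        elif rec.tag == "article" and rec.findtext("journal") in journals:
            kept.append(rec.get("key"))
    return kept


# Without the 'SIGGRAPH Asia' entry, record "a" never reaches later stages.
before = filter_dblp(SAMPLE, {"CVPR"}, {"ACM Trans. Graph."})
after = filter_dblp(SAMPLE, {"CVPR", "SIGGRAPH Asia"}, {"ACM Trans. Graph."})
print("without the entry:", before)
print("with the entry:   ", after)
```

In this sketch the downstream code can be entirely correct and still never see record "a", which matches the behavior described above.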

I will create a pull request.

yssl commented 2 weeks ago

Great! Thanks, @musialski-research!