dsg-uwaterloo / publications

Publications of the Data Systems Group
https://uwaterloo.ca/data-systems-group/publications

Mangled accents, bibtex proceedings oddity #10

Open lintool opened 6 years ago

lintool commented 6 years ago

Crawler seems to be mangling accents:

@inproceedings{Begoli_Camacho-Rodriguez_Hyde_Mior_Lemire_2018a, title={Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources.}, DOI={http://doi.acm.org/10.1145/3183713.3190662}, booktitle={SIGMOD Conference}, author={Begoli, Edmon and Camacho-Rodríguez, Jesús and Hyde, Julian and Mior, Michael and Lemire, Daniel}, year={2018}, pages={221–230}}

Also, the DBLP BibTeX record at https://dblp.uni-trier.de/rec/bibtex/conf/sigmod/BegoliCHML18 gives a different booktitle:

  booktitle = {Proceedings of the 2018 International Conference on Management of
               Data, {SIGMOD} Conference 2018, Houston, TX, USA, June 10-15, 2018},

Why is it "SIGMOD Conference" above?

Also, why can't we just crawl the BibTeX from here? https://dblp.uni-trier.de/rec/bibtex/conf/sigmod/BegoliCHML18

michaelmior commented 6 years ago

I could separately crawl the BibTeX I suppose. It was just much easier to use the data I had already pulled.
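
A minimal sketch of what separately crawling the BibTeX could look like. It assumes the URL pattern from the link quoted in this issue and assumes the response carries the entry as text, possibly wrapped in HTML. The helper names `dblp_bibtex_url`, `extract_first_entry`, and `fetch_bibtex` are hypothetical and not part of the existing crawler.

```python
import html
import urllib.request
from typing import Optional

# URL pattern copied from the link in this issue; treat it as an assumption.
DBLP_BIBTEX_BASE = "https://dblp.uni-trier.de/rec/bibtex/"


def dblp_bibtex_url(key: str) -> str:
    """Build the BibTeX page URL for a record key like 'conf/sigmod/BegoliCHML18'."""
    return DBLP_BIBTEX_BASE + key.strip("/")


def extract_first_entry(text: str) -> Optional[str]:
    """Return the first '@type{...}' entry in text, matched by brace depth."""
    start = text.find("@")
    while start != -1:
        brace = text.find("{", start)
        if brace == -1:
            return None
        # Accept only '@' followed directly by an entry type, e.g. '@inproceedings{'.
        if text[start + 1:brace].isalpha():
            depth = 0
            for i in range(brace, len(text)):
                if text[i] == "{":
                    depth += 1
                elif text[i] == "}":
                    depth -= 1
                    if depth == 0:
                        return text[start:i + 1]
            return None  # braces never balanced
        start = text.find("@", start + 1)
    return None


def fetch_bibtex(key: str, timeout: float = 10.0) -> Optional[str]:
    """Download the page for key and pull out the first entry (needs network access)."""
    with urllib.request.urlopen(dblp_bibtex_url(key), timeout=timeout) as resp:
        # Decode explicitly as UTF-8 so accented author names survive.
        page = resp.read().decode("utf-8")
    return extract_first_entry(html.unescape(page))
```

Matching on brace depth rather than a regular expression keeps nested groups such as `{SIGMOD}` inside the `booktitle` field intact.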

michaelmior commented 6 years ago

Crawling the BibTeX and deduplicating is probably a better solution, but I pushed this to fix the accents. I'll try to take a look at the other oddity later.
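
The deduplication step could be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was pushed: entries are assumed to be plain dicts with lowercase field names (`doi`, `title`, `year`), and the function names are hypothetical. Accents are folded only to build a comparison key; the stored entry keeps its original text.

```python
import re
import unicodedata


def normalize_text(value: str) -> str:
    """Fold a string into a comparison key: no accents, lowercase, alphanumerics only."""
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", stripped.lower()).strip()


def entry_key(entry: dict) -> tuple:
    """Prefer the DOI as identity; fall back to normalized title plus year."""
    doi = entry.get("doi", "").lower()
    # Strip resolver prefixes so 'http://doi.acm.org/10.1145/...' matches '10.1145/...'.
    doi = re.sub(r"^https?://[^/]+/", "", doi)
    if doi:
        return ("doi", doi)
    return ("title", normalize_text(entry.get("title", "")), str(entry.get("year", "")))


def deduplicate(entries: list) -> list:
    """Keep the first entry seen for each key, preserving input order."""
    seen = set()
    unique = []
    for entry in entries:
        key = entry_key(entry)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```

Keeping the first occurrence means the order in which sources are crawled decides which copy wins, so the preferred source should be listed first.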

michaelmior commented 6 years ago

For the proceedings oddity, it looks like "SIGMOD Conference" is the only useful thing DBLP returns from its XML API. Unfortunately, fixing this would require significant rewriting, which is out of scope for me right now.