WolfgangFahl opened this issue 3 years ago (status: Open)
grep 'key="conf/ir/1980"' -A12 dblp.xml
<proceedings mdate="2018-06-23" key="conf/ir/1980">
<editor>Peter R. Wossidlo</editor>
<title>Textverarbeitung und Informatik, Fachtagung der GI, Bayreuth, Deutschland, 28.-30. Mai 1980</title>
<booktitle>Textverarbeitung und Informatik</booktitle>
<series href="db/series/ifb/index.html">Informatik-Fachberichte</series>
<volume>30</volume>
<publisher>Springer</publisher>
<year>1980</year>
<isbn>3-540-10148-9</isbn>
<url>db/conf/ir/text1980.html</url>
<ee>https://doi.org/10.1007/978-3-642-67700-7</ee>
</proceedings>
empty title for 616{'mdate': '2019-05-14', 'key': 'conf/pfe/2001', 'editor': 'Frank van der Linden 0001', 'title': None, 'booktitle': 'PFE', 'series': 'Lecture Notes in Computer Science', 'volume': '2290', 'publisher': 'Springer', 'year': '2002', 'isbn': '3-540-43659-6', 'ee': 'https://doi.org/10.1007/3-540-47833-7', 'url': 'db/conf/pfe/pfe2001.html', 'conf': 'pfe'}
empty title for 789{'mdate': '2019-01-26', 'key': 'conf/hpcasia/2019', 'title': None, 'publisher': 'ACM', 'booktitle': 'HPC Asia', 'year': '2019', 'isbn': '978-1-4503-6632-8', 'ee': 'https://dl.acm.org/citation.cfm?id=3293320', 'url': 'db/conf/hpcasia/hpcasia2019.html', 'conf': 'hpcasia'}
<proceedings mdate="2019-05-14" key="conf/pfe/2001">
<editor>Frank van der Linden 0001</editor>
<title>Software Product-Family Engineering, 4th International Workshop, PFE 2001, Bilbao, Spain, October 3-5, 2001, Revised Papers</title>
<booktitle>PFE</booktitle>
<series href="db/series/lncs/index.html">Lecture Notes in Computer Science</series>
<volume>2290</volume>
<publisher>Springer</publisher>
<year>2002</year>
<isbn>3-540-43659-6</isbn>
<ee>https://doi.org/10.1007/3-540-47833-7</ee>
<url>db/conf/pfe/pfe2001.html</url>
</proceedings>
<proceedings mdate="2019-01-26" key="conf/hpcasia/2019">
<title>Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2019, Guangzhou, China, January 14-16, 2019</title>
<publisher>ACM</publisher>
<booktitle>HPC Asia</booktitle>
<year>2019</year>
<isbn>978-1-4503-6632-8</isbn>
<ee>https://dl.acm.org/citation.cfm?id=3293320</ee>
<url>db/conf/hpcasia/hpcasia2019.html</url>
</proceedings>
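Both records above do carry a `<title>` in dblp.xml, so the `None` in the log is lost somewhere on the reading side. As a cross-check, here is a minimal sketch using only the standard library; it is not the code that produced the log above. The function name `records_without_title` and the sample data are hypothetical, and the sketch assumes the input contains no unresolved named entities (the sample has none).

```python
import io
import xml.etree.ElementTree as ET


def records_without_title(source, tag="proceedings"):
    """Yield the key of every <tag> record whose title carries no text."""
    # Only "end" events are used, so each record is complete when inspected.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue
        title = elem.find("title")
        # itertext() also collects text nested in child elements of <title>.
        text = "".join(title.itertext()).strip() if title is not None else ""
        if not text:
            yield elem.get("key")
        elem.clear()  # release the finished record to keep memory flat


# Hypothetical sample: one good record, one empty title, one missing title.
sample = io.BytesIO(
    b"<dblp>"
    b'<proceedings key="conf/a/1"><title>Some <i>title</i></title></proceedings>'
    b'<proceedings key="conf/b/2"><title></title></proceedings>'
    b'<proceedings key="conf/c/3"><year>2019</year></proceedings>'
    b"</dblp>"
)
missing = list(records_without_title(sample))
print(missing)
```

If a check along these lines reports a title for `conf/pfe/2001` and `conf/hpcasia/2019`, the problem is more likely in how the original reader handles events or clears elements than in the data.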
See the fine print in https://dblp.org/faq/16154937.html
wc -l dblp.xml
78745438 dblp.xml
See also https://bugs.launchpad.net/lxml/+bug/1742121: lxml's sourceline tops out at 65535, so it cannot be relied on in a file of this size.
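With 78745438 lines in the file and a sourceline value that stops at 65535, line numbers have to be computed independently of the parser. A minimal sketch, roughly equivalent to `grep -n`; the function name `find_key_lines` and the sample text are hypothetical:

```python
import io


def find_key_lines(lines, key):
    """Return the 1-based numbers of all lines containing key="<key>"."""
    needle = 'key="%s"' % key
    # Iterating lazily keeps memory use constant even for very large files.
    return [n for n, line in enumerate(lines, start=1) if needle in line]


# Hypothetical stand-in for dblp.xml.
sample = io.StringIO(
    "<dblp>\n"
    '<proceedings mdate="2019-05-14" key="conf/pfe/2001">\n'
    "</proceedings>\n"
    '<proceedings mdate="2019-01-26" key="conf/hpcasia/2019">\n'
    "</proceedings>\n"
    "</dblp>\n"
)
hits = find_key_lines(sample, "conf/hpcasia/2019")
print(hits)
```

For the real file, an open file object can be passed instead of the `StringIO`, opened with the encoding named in the XML declaration. The resulting numbers can then be fed to `sed -n` to cut out a snippet.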
sed -n '19209395,19400000p;19400000q' dblp.xml > snippet.xml
...
lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 166, column 27
grep "é" dblp.xml | wc -l
931
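The error is expected for a snippet cut out of the middle of the file: dblp.xml declares its named entities through the DOCTYPE reference to dblp.dtd at the top, and snippet.xml no longer contains that declaration. One option is to keep the DOCTYPE line and dblp.dtd next to the snippet and let the parser load the DTD (lxml's `XMLParser` accepts `load_dtd=True`). Another is to resolve the entities before parsing, as in the sketch below. It assumes the entity names used in the snippet coincide with the HTML names in Python's `html.entities` table, which has not been verified against dblp.dtd; `resolve_named_entities` and the sample string are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# These five are predefined by XML itself and must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_named_entities(text):
    """Replace named entities such as &eacute; with the actual character."""
    def replace(match):
        name = match.group(1)
        # Assumption: names match the HTML entity table; unknown ones are kept.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])
    return ENTITY.sub(replace, text)


# Hypothetical one-line snippet without a DOCTYPE.
snippet = "<author>Ren&eacute; M&uuml;ller &amp; Co</author>"
elem = ET.fromstring(resolve_named_entities(snippet))
print(elem.text)
```

A snippet produced by a plain line range may also start or end in the middle of a record and has no root element, so it may need trimming and wrapping before it parses.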
Result: 188 entries with empty titles.