Open tholzheim opened 2 years ago
This is exactly what we'd like to achieve by reactivating the proceedings title parser. Unfortunately, for the last few weeks there has not been a single day on which the nightly build ran through, so we are now in a catch-22 (chicken-and-egg) situation that we need to get out of and avoid in the future.
```bash
sqlquery -qp ./cc.yaml -qn YearAndOrdinalColumns -f github -en cc
```

select year and ordinal columns

```sql
WITH tables AS (SELECT name tableName, sql
FROM sqlite_master WHERE type = 'table' AND tableName NOT LIKE 'sqlite_%')
SELECT fields.name, fields.type, tableName
FROM tables CROSS JOIN pragma_table_info(tables.tableName) fields
where name in ("year","ordinal")
order by name
```

name | type | tableName |
---|---|---|
ordinal | INTEGER | event_ceurws |
ordinal | INTEGER | event_tibkat |
ordinal | INTEGER | event_gnd |
ordinal | TEXT | event_wikidata |
ordinal | INTEGER | event_or |
ordinal | TEXT | event_orbackup |
ordinal | INTEGER | event_orclone |
ordinal | TEXT | event_orclonebackup |
year | INTEGER | event_confref |
year | INTEGER | event_ceurws |
year | INTEGER | event_dblp |
year | INTEGER | event_wikicfp |
year | INTEGER | event_crossref |
year | INTEGER | event_tibkat |
year | INTEGER | event_gnd |
year | INTEGER | event_wikidata |
year | INTEGER | event_or |
year | INTEGER | event_orbackup |
year | INTEGER | event_orclone |
year | INTEGER | event_orclonebackup |
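
The same check can be run from Python to flag columns whose declared type differs between tables. This is a minimal sketch, not part of ConferenceCorpus: the helper names and the two demo tables (`event_a`, `event_b`) are hypothetical, and it assumes an SQLite build that provides the `pragma_table_info` table-valued function.

```python
import sqlite3
from collections import defaultdict

# Same idea as the YearAndOrdinalColumns query, with single-quoted string literals
COLUMN_TYPE_QUERY = """
WITH tables AS (
    SELECT name AS tableName
    FROM sqlite_master
    WHERE type = 'table' AND name NOT LIKE 'sqlite_%'
)
SELECT fields.name, fields.type, tables.tableName
FROM tables CROSS JOIN pragma_table_info(tables.tableName) AS fields
WHERE fields.name IN ('year', 'ordinal')
ORDER BY fields.name, tables.tableName
"""


def column_types(conn):
    """Return (column name, declared type, table name) rows for year/ordinal."""
    return conn.execute(COLUMN_TYPE_QUERY).fetchall()


def inconsistent_columns(rows):
    """Map each column name to its declared types if there is more than one."""
    types_by_column = defaultdict(set)
    for name, declared_type, _table in rows:
        types_by_column[name].add(declared_type)
    return {
        name: sorted(types)
        for name, types in types_by_column.items()
        if len(types) > 1
    }


# Hypothetical two-table database, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_a (eventId TEXT, year INTEGER, ordinal INTEGER)")
conn.execute("CREATE TABLE event_b (eventId TEXT, year INTEGER, ordinal TEXT)")
rows = column_types(conn)
print(inconsistent_columns(rows))
```

An empty result would mean that every `year` and `ordinal` column has the same declared type across all tables.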
```bash
ccUpdate --updateSource wikidata
Starting update of conference corpus database from wikidata cache ...
Starting loading Wikidata ...
loading Wikidata took 37.6 s
update of conference corpus database from wikidata cacheWikidata: 8019 events 4261 eventseries took 37.6 s
```

```json
{
  "acronym": "ISWC 2008",
  "country": "Germany",
  "countryId": "Q183",
  "dblpId": "conf/semweb/2008",
  "describedAtUrl": null,
  "doi": "10.1007/978-3-540-88564-1",
  "endDate": "2008-10-30T00:00:00",
  "eventId": "Q48026643",
  "eventInSeries": "International Semantic Web Conference",
  "eventInSeriesId": "Q6053150",
  "eventTitle": null,
  "followedById": null,
  "gndId": "10360484-4",
  "homepage": "http://iswc2008.semanticweb.org",
  "language": null,
  "location": "Kongresszentrum Karlsruhe",
  "locationId": "Q1781594",
  "mainSubject": "Semantic Web",
  "ordinal": 7,
  "ppn": "579171965",
  "proceedings": "http://www.wikidata.org/entity/Q98093643",
  "proceedingsLabel": "The Semantic Web - ISWC 2008: 7th International Semantic Web Conference, ISWC 2008, Karlsruhe, Germany, October 26-30, 2008. Proceedings",
  "source": "wikidata",
  "startDate": "2008-10-26T00:00:00",
  "title": "The 7th International Semantic Web Conference",
  "url": "http://www.wikidata.org/entity/Q48026643",
  "wikiCfpId": "1974",
  "year": 2008
}
```

name | type | tableName |
---|---|---|
ordinal | INTEGER | event_ceurws |
ordinal | INTEGER | event_tibkat |
ordinal | INTEGER | event_gnd |
ordinal | INTEGER | event_or |
ordinal | TEXT | event_orbackup |
ordinal | INTEGER | event_orclone |
ordinal | TEXT | event_orclonebackup |
ordinal | INTEGER | event_wikidata |
year | INTEGER | event_confref |
year | INTEGER | event_ceurws |
year | INTEGER | event_dblp |
year | INTEGER | event_wikicfp |
year | INTEGER | event_crossref |
year | INTEGER | event_tibkat |
year | INTEGER | event_gnd |
year | INTEGER | event_or |
year | INTEGER | event_orbackup |
year | INTEGER | event_orclone |
year | INTEGER | event_orclonebackup |
year | INTEGER | event_wikidata |
```python
from ptp.ordinal import Ordinal
...
class CrossrefEvent(Event):
    ...
    def postProcess(self, eventInfo:dict) -> dict:
        ...
        Ordinal.addParsedOrdinal(rawEvent)
```

```bash
ccUpdate --updateSource crossref --sample "ICSE '18"
Starting update of conference corpus database from crossref cache ...
Starting loading CrossRef ...
read 55441 events in 2.0 s
loading CrossRef took 8.0 s
update of conference corpus database from crossref cachecrossref.org: 55441 events 1 eventseries took 8.0 s
```

```json
{
  "acronym": "ICSE '18",
  "doi": "10.1145/3196478",
  "endDate": null,
  "eventId": "10.1145/3196478",
  "location": "Gothenburg Sweden",
  "lookupAcronym": null,
  "month": null,
  "name": "ICSE '18: 40th International Conference on Software Engineering",
  "number": null,
  "ordinal": 4,
  "source": "crossref",
  "sponsor": "SIGSOFT ACM Special Interest Group on Software Engineering\u21f9IEEE-CS Computer Society",
  "startDate": null,
  "theme": null,
  "title": "Proceedings of the 4th International Workshop on Software Engineering for Smart Cyber-Physical Systems",
  "url": "https://api.crossref.org/v1/works/10.1145/3196478",
  "year": null
}
```

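
To illustrate the idea behind parsing an ordinal out of a title, here is a minimal regex-based sketch. It is not the implementation of `Ordinal.addParsedOrdinal` from `ptp`; the function names `parse_ordinal` and `add_parsed_ordinal` and the choice of the `title` field are assumptions made for illustration only.

```python
import re
from typing import Optional

# Matches digits directly followed by an English ordinal suffix, e.g. "7th", "40th"
ORDINAL_PATTERN = re.compile(r"\b(\d+)(?:st|nd|rd|th)\b", re.IGNORECASE)


def parse_ordinal(text: Optional[str]) -> Optional[int]:
    """Return the first ordinal found in text as an int, or None."""
    if not text:
        return None
    match = ORDINAL_PATTERN.search(text)
    return int(match.group(1)) if match else None


def add_parsed_ordinal(record: dict, field: str = "title") -> dict:
    """Set record['ordinal'] from the given field if it is not set yet (hypothetical helper)."""
    if record.get("ordinal") is None:
        record["ordinal"] = parse_ordinal(record.get(field))
    return record


# Values taken from the crossref sample above
sample = {
    "name": "ICSE '18: 40th International Conference on Software Engineering",
    "title": "Proceedings of the 4th International Workshop on Software Engineering for Smart Cyber-Physical Systems",
    "ordinal": None,
}
print(add_parsed_ordinal(dict(sample), field="title")["ordinal"])
print(add_parsed_ordinal(dict(sample), field="name")["ordinal"])
```

With this sketch the result depends on which field is parsed: the `title` of the sample carries the workshop's ordinal, while the `name` carries the conference's ordinal.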
name | type | tableName |
---|---|---|
ordinal | INTEGER | event_ceurws |
ordinal | INTEGER | event_tibkat |
ordinal | INTEGER | event_gnd |
ordinal | INTEGER | event_or |
ordinal | TEXT | event_orbackup |
ordinal | INTEGER | event_orclone |
ordinal | TEXT | event_orclonebackup |
ordinal | INTEGER | event_wikidata |
ordinal | INTEGER | event_crossref |
year | INTEGER | event_confref |
year | INTEGER | event_ceurws |
year | INTEGER | event_dblp |
year | INTEGER | event_wikicfp |
year | INTEGER | event_tibkat |
year | INTEGER | event_gnd |
year | INTEGER | event_or |
year | INTEGER | event_orbackup |
year | INTEGER | event_orclone |
year | INTEGER | event_orclonebackup |
year | INTEGER | event_wikidata |
year | INTEGER | event_crossref |
Updated post processing of extracted LoDs to convert ordinals to int: https://github.com/WolfgangFahl/ConferenceCorpus/blob/dc0f19b004f6b435d2458a7792aa2fafeaf9987c/corpus/datasources/openresearch.py#L357
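
A conversion of this kind could look roughly as follows. This is a sketch under assumptions, not the code behind the link: the function name `ordinals_to_int` is hypothetical, and values that cannot be read as an integer are set to `None` here, which may differ from what the linked post-processing does.

```python
def ordinals_to_int(lod: list) -> list:
    """Convert the 'ordinal' value of each record in a list of dicts (LoD) to int.

    Values that cannot be converted are replaced by None. Records without an
    'ordinal' key get the key with value None.
    """
    for record in lod:
        value = record.get("ordinal")
        try:
            record["ordinal"] = int(value) if value is not None else None
        except (TypeError, ValueError):
            record["ordinal"] = None
    return lod


# Hypothetical records for illustration
events = [
    {"acronym": "A 2008", "ordinal": "7"},
    {"acronym": "B 2018", "ordinal": 40},
    {"acronym": "C 2020", "ordinal": "seventh"},
    {"acronym": "D 2021"},
]
print([event["ordinal"] for event in ordinals_to_int(events)])
```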
name | type | tableName |
---|---|---|
ordinal | INTEGER | event_ceurws |
ordinal | INTEGER | event_tibkat |
ordinal | INTEGER | event_gnd |
ordinal | INTEGER | event_or |
ordinal | INTEGER | event_orbackup |
ordinal | INTEGER | event_orclone |
ordinal | INTEGER | event_orclonebackup |
ordinal | INTEGER | event_wikidata |
ordinal | INTEGER | event_crossref |
year | INTEGER | event_confref |
year | INTEGER | event_ceurws |
year | INTEGER | event_dblp |
year | INTEGER | event_wikicfp |
year | INTEGER | event_tibkat |
year | INTEGER | event_gnd |
year | INTEGER | event_or |
year | INTEGER | event_orbackup |
year | INTEGER | event_orclone |
year | INTEGER | event_orclonebackup |
year | INTEGER | event_wikidata |
year | INTEGER | event_crossref |
```python
from ptp.ordinal import Ordinal
...
class DblpEvent(Event):
    ...
    @staticmethod
    def postProcessLodRecord(rawEvent:dict):
        ...
        Ordinal.addParsedOrdinal(rawEvent)
```

```bash
ccUpdate --updateSource dblp
Starting update of conference corpus database from dblp cache ...
configureCorpusLookup callback called
Starting loading dblp computer science bibliography ...
Warning - using full /home/wf/.dblp/dblp.xml dataset ~9.1m records!
Warning - using full /home/wf/.dblp/dblp.xml dataset ~9.1m records!
loading dblp computer science bibliography took 7.5 s
update of conference corpus database from dblp cachedblp: 50248 events 5454 eventseries took 7.6 s
```

```json
{
  "acronym": "ISWC 2008",
  "booktitle": "ISWC",
  "doi": null,
  "ee": "https://ieeexplore.ieee.org/xpl/conhome/4840596/proceeding,http://www.computer.org/csdl/proceedings/iswc/2008/2637/00/index.html",
  "endDate": null,
  "eventId": "conf/iswc/2008",
  "isbn": "978-1-4244-2637-9",
  "mdate": "2019-10-16",
  "ordinal": 12,
  "publicationSeries": null,
  "series": "iswc",
  "source": "dblp",
  "startDate": null,
  "title": "12th IEEE International Symposium on Wearable Computers (ISWC 2008), September 28 - October 1, 2008, Pittsburgh, PA, USA",
  "url": "https://dblp.org/db/conf/iswc/iswc2008.html",
  "year": 2008
}
```

```bash
ccUpdate --updateSource wikicfp
Starting update of conference corpus database from wikicfp cache ...
configureCorpusLookup callback called
Starting loading WikiCFP ...
loading WikiCFP took 13.3 s
update of conference corpus database from wikicfp cacheWikiCFP: 90339 events 6019 eventseries took 13.3 s
```

```json
{
  "Final_Version_Due": null,
  "Notification_Due": null,
  "Submission_Deadline": "2008-05-16T00:00:00",
  "acronym": "ISWC 2008",
  "deleted": false,
  "endDate": "2008-10-30T00:00:00",
  "eventId": "1974",
  "eventType": "Conference",
  "locality": "Karlsruhe, Germany",
  "ordinal": null,
  "series": "International Semantic Web Conference",
  "seriesId": "1769",
  "source": "wikicfp",
  "startDate": "2008-10-26T00:00:00",
  "title": "ISWC 2008 : International Semantic Web Conference",
  "url": "http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=1974",
  "wikiCfpId": 1974,
  "year": 2008
}
```

The event and eventseries views are generated over the set of properties common to all datasources. Unfortunately, for events the only property present in all datasource tables is `source`, and the event series tables have no common property at all. Thus, the resulting views break some of the existing tests, e.g. those that rely on `eventId` being a column of the event view.

Tested with:

Returns

For example, the problem occurs in the following function (if tested on the generated view): https://github.com/WolfgangFahl/ConferenceCorpus/blob/531122fd4ae15f84772d44c2116794b7ea01740d/tests/testCorpusLookup.py#L88 The MultiQuery on the event view uses `source` and `eventId`, but the generated view does not contain `eventId`.
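
The effect can be reproduced in isolation with a small sketch that builds a view over the columns shared by all tables. This is not the ConferenceCorpus view generator: the function names, the `required` guard and the demo tables `event_x` and `event_y` are hypothetical and only illustrate how a needed column such as `eventId` drops out of the intersection.

```python
import sqlite3


def table_columns(conn, table):
    """Return the column names of a table in declaration order."""
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


def common_columns(conn, tables):
    """Return the columns present in every table, in the order of the first table."""
    common = None
    for table in tables:
        columns = table_columns(conn, table)
        common = columns if common is None else [c for c in common if c in columns]
    return common or []


def create_union_view(conn, view, tables, required=()):
    """Create a UNION ALL view over the common columns.

    Raises ValueError if a required column is not common to all tables,
    instead of silently creating a view that callers cannot use.
    """
    columns = common_columns(conn, tables)
    missing = [c for c in required if c not in columns]
    if missing:
        raise ValueError(f"view {view} would lack required columns: {missing}")
    column_list = ", ".join(f'"{c}"' for c in columns)
    selects = [f'SELECT {column_list} FROM "{table}"' for table in tables]
    conn.execute(f'CREATE VIEW "{view}" AS ' + " UNION ALL ".join(selects))
    return columns


# Hypothetical tables: only event_x has an eventId column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_x (eventId TEXT, source TEXT, year INTEGER)")
conn.execute("CREATE TABLE event_y (doi TEXT, source TEXT, year INTEGER)")
conn.execute("INSERT INTO event_x VALUES ('x1', 'x', 2008)")
conn.execute("INSERT INTO event_y VALUES ('10.1/y', 'y', 2018)")
tables = ["event_x", "event_y"]
print(common_columns(conn, tables))
try:
    create_union_view(conn, "event", tables, required=("eventId",))
except ValueError as error:
    print(error)
```

Failing early on a missing required column is one possible design choice; another would be to map source-specific identifiers to a shared `eventId` column before the view is generated.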