WolfgangFahl / ConferenceCorpus

ScientificEventCorpus
Apache License 2.0

Refactoring of openresearch datasource #48

Open tholzheim opened 2 years ago

tholzheim commented 2 years ago

The current state of the work can be found on the branch orRefactor.

Changes

openresearch.py:

Tests:

Open Questions

ccUpdate

$ ccUpdate --updateSource "orclone-backup"
Starting update of conference corpus database from orclone-backup cache ...
configureCorpusLookup callback called
Starting loading OPENRESEARCH (orclone-wikiMarkup) ...
(1/9984) Extracting Event record from AAAI HBM 2009 ... ✅
(2/9984) Extracting Event record from AASN 2008 ... ✅
(3/9984) Extracting Event record from AASNET 2008 ... ✅
(4/9984) Extracting Event record from ACE 2008 ... ✅
(5/9984) Extracting Event record from ACM SAC MCA 2009 ... ✅
(6/9984) Extracting Event record from ACM SAC OOPS 2010 ... ✅
(7/9984) Extracting Event record from ACM STC 2008 ... ✅
(8/9984) Extracting Event record from ACML 2009 ... ✅

...

(1125/1128) Extracting Event series record from UIST ... ✅
(1126/1128) Extracting Event series record from UbiComp ... ✅
(1127/1128) Extracting Event series record from VISAPP ... ✅
(1128/1128) Extracting Event series record from WPNC ... ✅
loading OPENRESEARCH (orclone-wikiMarkup) took 1371.8 s

:information_source: The lookupIds orclone-backup and or-backup fetch entities with the WikiMarkup method so that the latest data is used. The WikiFile (fromWikiFileManager) method reads from its default backup path ~/or/wikibackup/<wikiId>, which often resulted in loading an outdated backup. To avoid this, the or-backup datasource should be loaded with the WikiMarkup method: the procedure is essentially the same, but the records are guaranteed to be current.
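
To make the stale-backup problem visible before loading, a check along the following lines could be run against the backup directory. This is only a rough sketch and not part of ConferenceCorpus: the helper names `newest_backup_time` and `is_backup_stale` are hypothetical, and it assumes that the backup stores one `*.wiki` file per page directly in the backup directory.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


def newest_backup_time(backup_dir: Path) -> Optional[datetime]:
    """Return the latest modification time of any *.wiki file in backup_dir.

    Assumption: pages are stored as *.wiki files directly in backup_dir.
    Returns None if the directory is missing or contains no such files.
    """
    if not backup_dir.is_dir():
        return None
    mtimes = [p.stat().st_mtime for p in backup_dir.glob("*.wiki")]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes))


def is_backup_stale(
    backup_dir: Path, max_age: timedelta, now: Optional[datetime] = None
) -> bool:
    """Treat a missing or empty backup, or one older than max_age, as stale."""
    newest = newest_backup_time(backup_dir)
    if newest is None:
        return True
    if now is None:
        now = datetime.now()
    return now - newest > max_age
```

With a given `wiki_id`, calling `is_backup_stale(Path.home() / "or" / "wikibackup" / wiki_id, timedelta(days=7))` would flag backups that have not been refreshed within a week. The threshold is arbitrary and only serves as an illustration.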

Downstream Problems

:information_source: The WikiMarkup method is intended to replace the WikiFile data-fetching method so that the latest data is always provided. The WikiFile method should nevertheless be kept for cases where the data is to be loaded from a backup.

ToDos

tholzheim commented 2 years ago

Added option to include wikiMarkup in getWikiSonFromPage