WolfgangFahl / ConferenceCorpus

ScientificEventCorpus
Apache License 2.0

Event Series Order Synchronization #52

Open tholzheim opened 2 years ago

tholzheim commented 2 years ago

Event series order synchronization is an algorithm/filter that simplifies matching events from different sources by ordinal and year.

Steps:

  1. Make sure that there is only one event per ordinal, year, and source, e.g. by selecting the first of multiple volumes.
    1. Heuristics are required and need to be derived from examples.
  2. Find slots for event series completion (sorted list of year and ordinal pairs).
  3. Analyze the result for consistency (see #50).

SELECT year, ordinal, event, title, location, city, cityWikidataid, countryWikiDataid, country, dates, doi, isbn13, ppn
FROM "event_tibkat"
WHERE title LIKE '%COLING%'
AND title LIKE '%proceedings%'
AND NOT title LIKE '%orkshop%'
AND NOT title LIKE '%demonstration%'
AND NOT title LIKE '%Tutorial abstracts%'
AND NOT title LIKE '%Industry track%'
AND NOT title LIKE '%sessions'
AND NOT title LIKE '%Vol. 2'
AND NOT title LIKE '%Vol. 3'
AND NOT title LIKE '%Vol. 4'
AND NOT title LIKE '%Vol. 5'
AND NOT title LIKE '%Vol. 6'
ORDER BY year

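
Steps 1 and 2 could be sketched roughly as follows in Python. This is only an illustration under assumptions, not the ConferenceCorpus implementation: the record layout (dicts with `source`, `year`, `ordinal`, `title`), the function names, and the "lowest trailing `Vol. N` wins" heuristic are all hypothetical, and the sample records are made up for the sake of the example.

```python
import re

# Assumption: a volume is marked by a trailing "Vol. N" in the title,
# mirroring the '%Vol. 2' ... '%Vol. 6' filters in the query above.
VOLUME_PATTERN = re.compile(r"Vol\.\s*(\d+)\s*$")


def volume_number(title):
    """Return the trailing 'Vol. N' number of a title, or 1 if there is none."""
    match = VOLUME_PATTERN.search(title)
    return int(match.group(1)) if match else 1


def one_event_per_slot(events):
    """Step 1: keep one event per (source, year, ordinal), preferring the lowest volume."""
    best = {}
    for event in events:
        key = (event["source"], event["year"], event["ordinal"])
        current = best.get(key)
        if current is None or volume_number(event["title"]) < volume_number(current["title"]):
            best[key] = event
    return list(best.values())


def find_slots(events):
    """Step 2: sorted list of distinct (year, ordinal) pairs with both values present."""
    pairs = {
        (e["year"], e["ordinal"])
        for e in events
        if e["year"] is not None and e["ordinal"] is not None
    }
    return sorted(pairs)


def missing_ordinals(slots):
    """Ordinals that are absent between the smallest and largest known ordinal."""
    known = {ordinal for _, ordinal in slots}
    if not known:
        return []
    return [o for o in range(min(known), max(known) + 1) if o not in known]


# Made-up sample records for a hypothetical series "XYZ".
sample_events = [
    {"source": "tibkat", "year": 2000, "ordinal": 1, "title": "XYZ 2000 proceedings Vol. 2"},
    {"source": "tibkat", "year": 2000, "ordinal": 1, "title": "XYZ 2000 proceedings"},
    {"source": "tibkat", "year": 2002, "ordinal": 2, "title": "XYZ 2002 proceedings"},
    {"source": "other", "year": 2006, "ordinal": 4, "title": "XYZ 2006 proceedings"},
]

unique_events = one_event_per_slot(sample_events)
slots = find_slots(unique_events)
gaps = missing_ordinals(slots)
```

The gaps found this way are candidates for series completion; the consistency check of step 3 (e.g. whether year and ordinal increase together) would run on `slots` afterwards.
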
tholzheim commented 2 years ago

See https://rq.bitplan.com/index.php/Workdocumentation_2022-05-23_TH#Extracting_volume_information for volume number distribution

WolfgangFahl commented 2 years ago

A very interesting side-effect result.