SmartDataAnalytics / OpenResearch

Public issue system for OPENRESEARCH/ConfIDent

Text Values in Ordinal Field #119

Open MusaabKh opened 3 years ago

MusaabKh commented 3 years ago

The Ordinal field should be strictly numeric, but it contains text values, for example 1st.

WolfgangFahl commented 3 years ago

See https://www.openresearch.org/wiki/Property:Ordinal (reported by Heike Rohde).

WolfgangFahl commented 3 years ago

http://ptp.bitplan.com has the functionality to resolve this.

WolfgangFahl commented 3 years ago

https://github.com/WolfgangFahl/ProceedingsTitleParser/blob/7e52b4e3eae09269464669fe387425b9f6392952/ptp/titleparser.py#L488 has an ordinal generator which is used for ordinal lookup in https://github.com/WolfgangFahl/ProceedingsTitleParser/blob/master/dictionary.yaml

It was a primitive approach I used as a first step, but it works nicely ...

'51.':
  type: enum
  value: 51
51st:
  type: enum
  value: 51
LI.:
  type: enum
  value: 51

...

The Roman numerals and other values are also there ...

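from num2words import num2words

def addEnums(self):
    ''' add enumerations '''
    # https://stackoverflow.com/a/20007730/1497139
    # Integer division (//) is needed in Python 3; with "/" the teens
    # would come out as 11st, 12nd and 13rd.
    ordinal = lambda n: "%d%s" % (n,"tsnrhtdd"[(n//10%10!=1)*(n%10<4)*n%10::4])
    for i in range(1,100):
        roman=self.toRoman(i)+"."
        # numeric forms: "51." and "51st"
        self.add("%d." % i,'enum',i)
        self.add(ordinal(i),'enum',i)
        # Roman numeral form: "LI."
        self.add(roman,'enum',i)
        # spelled-out forms: "fifty-first" and "Fifty-First"
        ordinal4i=num2words(i, to='ordinal')
        self.add(ordinal4i,'enum',i)
        title=ordinal4i.title()
        self.add(title,'enum',i)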
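The lookup idea behind the generator above can be tried without the parser class. The following is only a minimal sketch: `ordinal_suffix`, `to_roman` and `build_ordinal_lookup` are hypothetical names chosen for illustration (`to_roman` stands in for `self.toRoman`), a plain dict replaces `self.add`, and the spelled-out forms from num2words are left out to keep it dependency-free.

```python
def ordinal_suffix(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if n // 10 % 10 == 1:
        # 11, 12 and 13 always take "th"
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def to_roman(n: int) -> str:
    """Convert a positive integer to a Roman numeral (illustrative helper)."""
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    result = ""
    for value, symbol in numerals:
        while n >= value:
            result += symbol
            n -= value
    return result


def build_ordinal_lookup(limit: int = 100) -> dict:
    """Map the textual variants of each ordinal below limit to its integer value."""
    lookup = {}
    for i in range(1, limit):
        lookup[f"{i}."] = i               # "51."
        lookup[ordinal_suffix(i)] = i     # "51st"
        lookup[to_roman(i) + "."] = i     # "LI."
    return lookup


lookup = build_ordinal_lookup()
print(lookup["51st"], lookup["51."], lookup["LI."])  # prints: 51 51 51
```

A text value found in the Ordinal field could then be normalized with `lookup.get(value)`, which returns `None` for anything the table does not cover.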

tholzheim commented 3 years ago

This can be solved by a wikiedit command:

wikiedit -t wikiId -q "[[isA::Event]]" --search "(\|Ordinal=[0-9]+)(?:st|nd|rd|th)\b" --replace "\1"

To run this command effectively for all events, add -qd 1000 -f

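Explanation:

The capture group (\|Ordinal=[0-9]+) matches |Ordinal= followed by the digits, and the non-capturing group (?:st|nd|rd|th) matches the suffix. The replace statement substitutes the whole match with \1, so the number is kept and the suffix is dropped.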

Example:

{{Event
|Acronym=ADAPTIVE 2020
|Title=Twelfth International Conference on Adaptive and Self-Adaptive Systems and Applications
|Ordinal=12th
|Series=ADAPTIVE
|Event type=Conference
|Start date=2020/10/25
|End date=2020/10/29
|Homepage=http://www.iaria.org/conferences2020/CfPADAPTIVE20.html
|City=Nice
|Country=France
}}
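
The effect of the search and replace expressions can be checked locally with Python's `re` module before touching the wiki. This is only a sketch of the regex itself, not of what wikiedit does internally; `strip_ordinal_suffix` is a hypothetical helper name and the markup is a shortened copy of the example above.

```python
import re

# Same pattern as in the wikiedit call: group 1 keeps "|Ordinal=" plus the
# digits, the non-capturing group matches the suffix that should be dropped.
ORDINAL_SUFFIX = re.compile(r"(\|Ordinal=[0-9]+)(?:st|nd|rd|th)\b")


def strip_ordinal_suffix(wiki_markup: str) -> str:
    """Replace every match with group 1, i.e. remove st/nd/rd/th after the number."""
    return ORDINAL_SUFFIX.sub(r"\1", wiki_markup)


before = """{{Event
|Acronym=ADAPTIVE 2020
|Ordinal=12th
|Series=ADAPTIVE
}}"""

after = strip_ordinal_suffix(before)
print(after)
```

Entries that are already numeric, such as `|Ordinal=12`, do not match the pattern and are left unchanged.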

WolfgangFahl commented 3 years ago

There is a CI regression in the nightly Travis build:

ERROR: testIssue119 (tests.testDataFixes.TestDataFixes)
test for fixing Ordinals not a number
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/hd/luxio/var/lib/jenkins/jobs/OpenResearch migration/workspace/code/migration/tests/testDataFixes.py", line 101, in testIssue119
    lookup_dict = Dictionary('../dataset/dictionary.yaml' if self.inPublicCI() else '../../dataset/dictionary.yaml')
  File "/hd/luxio/var/lib/jenkins/jobs/OpenResearch migration/workspace/code/migration/migrate/Dictionary.py", line 9, in __init__
    self.read(yamlPath)
  File "/hd/luxio/var/lib/jenkins/jobs/OpenResearch migration/workspace/code/migration/migrate/Dictionary.py", line 39, in read
    with open(yamlPath, 'r') as stream:
FileNotFoundError: [Errno 2] No such file or directory: '../../dataset/dictionary.yaml'

Do not rely on the current directory - calculate the path from the script path ...
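
One way to do that is to resolve the data file relative to the test module instead of the working directory. The sketch below uses `pathlib`; `dictionary_path` is a hypothetical helper, and the number of `..` steps shown in the usage comment is an assumption about the repository layout, not something confirmed by the traceback.

```python
from pathlib import Path


def dictionary_path(script_file: str, *relative_parts: str) -> Path:
    """Resolve a data file relative to the directory that contains script_file."""
    script_dir = Path(script_file).resolve().parent
    return script_dir.joinpath(*relative_parts).resolve()


# In the test module this would be called with __file__, for example
# (directory depth is an assumption):
# yaml_path = dictionary_path(__file__, "..", "..", "dataset", "dictionary.yaml")
# lookup_dict = Dictionary(str(yaml_path))
```

The resulting path is the same no matter which directory the test runner was started from.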

WolfgangFahl commented 3 years ago

See https://www.openresearch.org/wiki/Special:Contributions/Caitlin for an example of such invalid entries

tholzheim commented 2 years ago

Updated 216 ordinal entries. Property:Ordinal now shows no improper assignments.

Update Report: issue_119_fix_ordinal_update_output.txt