cldf / pycldf

python package to read and write CLDF datasets
https://cldf.clld.org
Apache License 2.0

cldf createdb fails for source columns with different capitalisations #100

Closed chrzyki closed 4 years ago

chrzyki commented 4 years ago

E.g. https://github.com/lexibank/sagartst/blob/master/cldf/sources.bib has both address and Address. cldf/csvw and/or cldf/pycldf treat these as different columns, but then try to create an address column twice, making cldf createdb fail:

Traceback (most recent call last):
  File "/bin/cldf", line 8, in <module>
    sys.exit(main())
  File "/lib/python3.7/site-packages/pycldf/__main__.py", line 26, in main
    return args.main(args) or 0
  File "/lib/python3.7/site-packages/pycldf/commands/createdb.py", line 17, in run
    db.write_from_tg()
  File "/lib/python3.7/site-packages/pycldf/db.py", line 182, in write_from_tg
    return self.write(_force=_force, _exists_ok=_exists_ok, **items)
  File "/lib/python3.7/site-packages/pycldf/db.py", line 170, in write
    return csvw.db.Database.write(self, _force=False, _exists_ok=False, **items)
  File "/lib/python3.7/site-packages/csvw/db.py", line 409, in write
    db.execute(table.sql(translate=self.translate))
sqlite3.OperationalError: duplicate column name: Address

Likewise for all other bib information, e.g. Journal and journal etc.
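
For anyone who wants to reproduce the underlying behaviour without a CLDF dataset: the sketch below uses only the standard-library sqlite3 module and shows that SQLite compares column names case-insensitively, so two columns differing only in capitalisation cannot live in one table. The helper name `try_create` and the table name `source` are made up for illustration; this is not how pycldf or csvw build their schema.

```python
import sqlite3


def try_create(columns):
    """Try to create a table with the given column names.

    Returns None on success, or the SQLite error message on failure.
    (Hypothetical helper, only for demonstrating the failure mode.)
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Quoting the identifiers does not make them case-sensitive in SQLite.
        cols = ", ".join('"{}" TEXT'.format(c) for c in columns)
        conn.execute("CREATE TABLE source ({})".format(cols))
        return None
    except sqlite3.OperationalError as e:
        return str(e)
    finally:
        conn.close()


# Mirrors the situation in sources.bib: "address" and "Address" both occur.
error = try_create(["id", "address", "Address"])
print(error)
```

With only `["id", "address"]` the table is created without an error.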

xrotwang commented 4 years ago

Hm. I guess that's a limitation of SQL we have to work around. In the case of sources, I'd say we simply force lowercase, which is also what some BibTeX processors do.

For other attributes I'm not sure. Warn and force lowercase? Or raise an exception?
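
To make the two options concrete, here is a rough sketch of what "warn and force lowercase" versus "raise an exception" could look like for arbitrary column names. The function names `case_collisions` and `check_columns` and the `strict` flag are assumptions for the sake of discussion, not existing pycldf or csvw API.

```python
import warnings
from collections import defaultdict


def case_collisions(names):
    """Group names that are equal when compared case-insensitively.

    Returns a dict mapping the lowercased name to the list of original
    spellings, restricted to names that occur in more than one spelling.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return {k: v for k, v in groups.items() if len(v) > 1}


def check_columns(names, strict=False):
    """Return lowercased, de-duplicated column names in first-seen order.

    If names collide case-insensitively, either raise (strict=True)
    or emit a warning and carry on with the lowercased names.
    """
    collisions = case_collisions(names)
    if collisions:
        msg = "column names differ only by case: {}".format(
            sorted(collisions.values()))
        if strict:
            raise ValueError(msg)
        warnings.warn(msg)
    # dict.fromkeys keeps the first occurrence of each lowercased name.
    return list(dict.fromkeys(n.lower() for n in names))
```

The `strict` switch is only there to show both behaviours side by side; which one should be the default is exactly the open question.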

Oracle had a setting to make column names case sensitive, but that went against all sorts of other assumptions in SQL.

SimonGreenhill commented 4 years ago

The bibtex spec says that tags are not case sensitive: http://www.bibtex.org/Format/, so address and Address and AdDrESs are equivalent. In this case, forcing lowercase is the best option.
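
A minimal sketch of forcing lowercase on the fields of a single entry, assuming the entry is available as a plain dict of field name to value. The function name `lowercase_fields` is hypothetical, and keeping the first value when two spellings collide is an assumption; this is not necessarily what the actual fix does.

```python
def lowercase_fields(fields):
    """Return a copy of a BibTeX field dict with all field names lowercased.

    If two spellings of the same field occur (e.g. "Address" and "address"),
    the value seen first is kept. That tie-break is an assumption.
    """
    normalized = {}
    for key, value in fields.items():
        # setdefault only stores the value if the lowercased key is new.
        normalized.setdefault(key.lower(), value)
    return normalized


entry = {"Address": "Berlin", "address": "Leipzig", "Journal": "Diachronica"}
print(lowercase_fields(entry))
```

After this step every entry contributes only lowercase field names, so the set of columns derived from the bibliography can no longer contain case variants of the same name.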

chrzyki commented 4 years ago

Sounds good. I'd just vote for lowercasing all of the bib tags themselves.

chrzyki commented 4 years ago

Done in #101