CSVW.to_json combines paths for filenames when opening paths inside a folder

LinguList commented 2 years ago

I was just working in lexibank/crossandean/, where we have multiple orthography profiles, and tried to load one of them with CSVW. Loading works, but since I did not load the file from inside its own folder, the to_json command seems to combine the relative path with another path (?):

>>> t = CSVW("etc/orthography/Atalla.tsv")
>>> t.to_json()
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Input In [13], in <cell line: 1>()
----> 1 t.to_json()

File ~/projects/scripts/csvw/src/csvw/metadata.py:1714, in CSVW.to_json(self, minimal)
   1712 if self.t.common_props and not isinstance(self.t, Table):
   1713     res.update(jsonld.to_json(self.t.common_props, flatten_list=True))
-> 1714 res['tables'] = [
   1715     self._table_to_json(table) for table in self.tables if not table.suppressOutput]
   1716 if minimal:
   1717     return list(
   1718         itertools.chain(*[[r['describes'][0] for r in t['row']] for t in res['tables']]))

File ~/projects/scripts/csvw/src/csvw/metadata.py:1715, in <listcomp>(.0)
   1712 if self.t.common_props and not isinstance(self.t, Table):
   1713     res.update(jsonld.to_json(self.t.common_props, flatten_list=True))
   1714 res['tables'] = [
-> 1715     self._table_to_json(table) for table in self.tables if not table.suppressOutput]
   1716 if minimal:
   1717     return list(
   1718         itertools.chain(*[[r['describes'][0] for r in t['row']] for t in res['tables']]))

File ~/projects/scripts/csvw/src/csvw/metadata.py:1739, in CSVW._table_to_json(self, table)
   1736     col.propertyUrl = col.inherit('propertyUrl')
   1737     col.valueUrl = col.inherit('valueUrl')
-> 1739 row = [
   1740     self._row_to_json(table, cols, row, rownum, rowsourcenum)
   1741     for rownum, (_, rowsourcenum, row) in enumerate(
   1742         table.iterdicts(with_metadata=True, strict=False), start=1)
   1743 ]
   1744 if table._comments:
   1745     res['rdfs:comment'] = [c[1] for c in table._comments]

File ~/projects/scripts/csvw/src/csvw/metadata.py:1739, in <listcomp>(.0)
   1736     col.propertyUrl = col.inherit('propertyUrl')
   1737     col.valueUrl = col.inherit('valueUrl')
-> 1739 row = [
   1740     self._row_to_json(table, cols, row, rownum, rowsourcenum)
   1741     for rownum, (_, rowsourcenum, row) in enumerate(
   1742         table.iterdicts(with_metadata=True, strict=False), start=1)
   1743 ]
   1744 if table._comments:
   1745     res['rdfs:comment'] = [c[1] for c in table._comments]

File ~/projects/scripts/csvw/src/csvw/metadata.py:1291, in Table.iterdicts(self, log, with_metadata, fname, _Row, strict)
   1286             zipf = stack.enter_context(zipfile.ZipFile(str(zipfname)))
   1287             handle = io.TextIOWrapper(
   1288                 zipf.open([n for n in zipf.namelist() if n.endswith(fpath.name)][0]),
   1289                 encoding=dialect.encoding)
-> 1291 reader = stack.enter_context(UnicodeReaderWithLineNumber(handle, dialect=dialect))
   1292 reader = iter(reader)
   1294 # If the data file has a header row, this row overrides the header as
   1295 # specified in the metadata.

File /usr/lib/python3.9/contextlib.py:448, in _BaseExitStack.enter_context(self, cm)
    446 _cm_type = type(cm)
    447 _exit = _cm_type.__exit__
--> 448 result = _cm_type.__enter__(cm)
    449 self._push_cm_exit(cm, _exit)
    450 return result

File ~/projects/scripts/csvw/src/csvw/dsv.py:182, in UnicodeReader.__enter__(self)
    179     if isinstance(self.f, pathlib.Path):
    180         self.f = str(self.f)
--> 182     self.f = io.open(self.f, mode='rt', encoding=self.encoding, newline=self.newline or '')
    183     self._close = True
    184 elif not hasattr(self.f, 'read'):

FileNotFoundError: [Errno 2] No such file or directory: 'etc/orthography/etc/orthography/Atalla.tsv'

So my interpretation of this error is that the path etc/orthography/ is duplicated for some reason?

I do not know whether this is intended behavior, so I thought it was worth posting.
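To illustrate the kind of duplication I suspect, here is a minimal sketch using only pathlib. It is not taken from the csvw source: the helper resolve_against_base is hypothetical, and the assumption is that a base directory derived from the loaded file gets joined with a table URL that still carries the same relative directory.

```python
from pathlib import PurePosixPath


def resolve_against_base(base_dir, table_url):
    """Join a table URL onto a base directory (hypothetical helper, not csvw API)."""
    return PurePosixPath(base_dir) / table_url


# The path as passed in the session above, relative to the working directory.
loaded_from = "etc/orthography/Atalla.tsv"

# Assumption: the base directory is taken from the location of the loaded file.
base_dir = PurePosixPath(loaded_from).parent

# If the table URL still contains the directory part, that part appears twice.
duplicated = resolve_against_base(base_dir, loaded_from)

# If the table URL is only the file name, the join gives the intended path.
expected = resolve_against_base(base_dir, PurePosixPath(loaded_from).name)

print(duplicated)
print(expected)
```

The first result matches the path in the FileNotFoundError message, which is why I suspect a join of this kind, but I have not confirmed where it happens.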

xrotwang commented 2 years ago

Yeah, this dual nature of URLs - "real" HTTP URLs vs. paths in the file system - is difficult, in particular when it comes to relative paths. Will have a look.
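A rough sketch of what I mean, using only urllib.parse.urljoin from the standard library and not the actual csvw code path. The base locations and the metadata file name below are made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base locations of a metadata document; both names are made up.
http_base = "http://example.org/etc/orthography/md.json"
path_base = "etc/orthography/md.json"

# A relative table URL is resolved against the location of its base document.
http_resolved = urljoin(http_base, "Atalla.tsv")
path_resolved = urljoin(path_base, "Atalla.tsv")

print(http_resolved)  # absolute URL, independent of the working directory
print(path_resolved)  # still relative: open() interprets it against the cwd
```

With an HTTP base the result is absolute and unambiguous. With a file system base the result is still a relative path, so whether it points at the right file depends on the directory the process was started from.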

xrotwang commented 2 years ago

Another thing: csvw2json - when run on a CSV file rather than on a metadata file - will only work for the default CSV dialect. But csvwdescribe will infer the right dialect in your case (from the *.tsv suffix). So you can do:

$ csvwdescribe etc/orthography/Atalla.tsv > md.json
$ csvw2json md.json

LinguList commented 2 years ago

Ah, nice!

cldf / csvw

CSVW.to_json combines paths for filenames when opening paths inside a folder #61