Open tlukkezen opened 1 year ago
import pygef
pygef.read_cpt("./KNM_GEF_stuk/S0270_35.gef")
> Traceback (most recent call last):
> File "/home/robin/Documents/Repositories/pygef/venv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3460, in run_code
> exec(code_obj, self.user_global_ns, self.user_ns)
> File "<ipython-input-3-4d7bdd2c1549>", line 3, in <module>
> pygef.read_cpt("./KNM_GEF_stuk/S0270_35.gef")
> File "/home/robin/Documents/Repositories/pygef/venv/lib/python3.9/site-packages/pygef/shim.py", line 82, in read_cpt
> return gef_cpt_to_cpt_data(_GefCpt(path=file))
> File "/home/robin/Documents/Repositories/pygef/venv/lib/python3.9/site-packages/pygef/gef/parse_cpt.py", line 134, in __init__
> self.parse_data(
> File "/home/robin/Documents/Repositories/pygef/venv/lib/python3.9/site-packages/pygef/gef/gef.py", line 151, in parse_data
> return pl.read_csv(
> File "/home/robin/Documents/Repositories/pygef/venv/lib/python3.9/site-packages/polars/io/csv/functions.py", line 354, in read_csv
> df = pl.DataFrame._read_csv(
> File "/home/robin/Documents/Repositories/pygef/venv/lib/python3.9/site-packages/polars/dataframe/frame.py", line 784, in _read_csv
> self._df = PyDataFrame.read_csv(
> exceptions.ComputeError: projection index 1 is out of bounds for CSV schema with 1 columns
The #COLUMNSEPARATOR argument is not set in the GEF file, so pyGEF assumes a space is used as the separator. Based on the GEF file, a tab is actually used.
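To confirm that a file is affected, one can check whether the header declares a separator at all. The snippet below is only a minimal sketch: `declared_column_separator` is a hypothetical helper, not part of pygef, and it assumes header lines of the form `#KEYWORD= value`.

```python
import re
from typing import Optional


def declared_column_separator(gef_text: str) -> Optional[str]:
    """Return the character declared via #COLUMNSEPARATOR, or None if absent.

    Hypothetical helper for illustration; assumes `#KEYWORD= value` header lines.
    """
    match = re.search(r"^#COLUMNSEPARATOR=[ ]*(.)", gef_text, flags=re.MULTILINE)
    return match.group(1) if match else None


# Made-up header fragments, not taken from the attached files.
with_separator = "#COLUMN= 3\n#COLUMNSEPARATOR= ;\n#EOH=\n1;2;3\n"
without_separator = "#COLUMN= 3\n#EOH=\n1\t2\t3\n"

print(declared_column_separator(with_separator))
print(declared_column_separator(without_separator))
```

If this returns `None`, the separator has to be inferred from the data rows, which is where the space-versus-tab mismatch comes in.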
import pygef

# Workaround: replace the tabs with spaces before parsing.
# Note that this overwrites the original file in place.
path = "./KNM_GEF_stuk/S0270_35.gef"

# Read in the file
with open(path, "r") as file:
    filedata = file.read()

# Replace the target string
filedata = filedata.replace("\t", " ")

# Write the file out again
with open(path, "w") as file:
    file.write(filedata)

pygef.read_cpt(path)
It would be nicer to raise a parsing error here rather than a polars error.
Yes, throwing a custom error would definitely be preferred, e.g. pygef.exceptions.ParseCptGefError
We could throw it if the inferred column separator does not occur the expected number of times (= #columns - 1) on every row of the CSV data.
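A rough sketch of that check, under some assumptions: `ParseCptGefError` is defined locally as a stand-in for the proposed exception (it is not confirmed to exist in pygef), `validate_separator` is a hypothetical helper, and rows are assumed to have no trailing separator or repeated spaces, which real files may violate.

```python
from typing import Iterable


class ParseCptGefError(Exception):
    """Local stand-in for the proposed custom exception (hypothetical)."""


def validate_separator(data_rows: Iterable[str], separator: str, n_columns: int) -> None:
    """Raise ParseCptGefError unless every row holds n_columns - 1 separators."""
    expected = n_columns - 1
    for line_no, row in enumerate(data_rows, start=1):
        row = row.strip()
        if not row:
            continue  # skip blank lines
        found = row.count(separator)
        if found != expected:
            raise ParseCptGefError(
                f"data row {line_no}: expected {expected} occurrences of "
                f"separator {separator!r}, found {found}"
            )


# Made-up, tab-separated rows with three columns.
rows = ["1.0\t2.0\t3.0", "1.1\t2.1\t3.1"]
validate_separator(rows, "\t", 3)  # passes silently

try:
    validate_separator(rows, " ", 3)  # a space is assumed, but tabs are used
except ParseCptGefError as exc:
    print(exc)
```

Running such a check before handing the data to `pl.read_csv` would surface the wrong separator as a pygef error instead of a polars `ComputeError`.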
Linked to #367
These GEF files could not be parsed using the read_cpt() function. It's unclear why or what went wrong, so this ticket requires some investigation.
KNM_GEF_stuk.zip