Closed woody544 closed 2 years ago
Would you mind posting exactly what commands you ran at the prompt to produce the errors? Ideally both the init and serve commands. I’m assuming these are just the .tsv files from the samples directory.
FWIW, I don’t have ready access to a Windows machine but hopefully we can work this out together.
There might be a clue here. Looking forward to hearing back on what you typed.
> Would you mind posting exactly what commands you ran at the prompt to produce the errors? Ideally both the init and serve commands. I’m assuming these are just the .tsv files from the samples directory.
Yes, I am using the .tsv files from the samples directory.
In following the steps outlined, I have the error after the first init step:
(venv) C:\Users\jennifer.woodward\OneDrive - USDA\myGitOneDrive\NALT4MA\csv-reconcile>csv-reconcile init sample/reps.tsv item itemLabel
Traceback (most recent call last):
File "C:\Program Files\Python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\Python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\jennifer.woodward\OneDrive - USDA\myGitOneDrive\NALT4MA\csv-reconcile\venv\Scripts\csv-reconcile.exe\__main__.py", line 7, in <module>
File "c:\users\jennifer.woodward\onedrive - usda\mygitonedrive\nalt4ma\csv-reconcile\venv\lib\site-packages\csv_reconcile\__init__.py", line 321, in main
return cli()
File "c:\users\jennifer.woodward\onedrive - usda\mygitonedrive\nalt4ma\csv-reconcile\venv\lib\site-packages\click\core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "c:\users\jennifer.woodward\onedrive - usda\mygitonedrive\nalt4ma\csv-reconcile\venv\lib\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
File "c:\users\jennifer.woodward\onedrive - usda\mygitonedrive\nalt4ma\csv-reconcile\venv\lib\site-packages\click\core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "c:\users\jennifer.woodward\onedrive - usda\mygitonedrive\nalt4ma\csv-reconcile\venv\lib\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\users\jennifer.woodward\onedrive - usda\mygitonedrive\nalt4ma\csv-reconcile\venv\lib\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "c:\users\jennifer.woodward\onedrive - usda\mygitonedrive\nalt4ma\csv-reconcile\venv\lib\site-packages\csv_reconcile\__init__.py", line 271, in init
return doinit(config, scorerOption, csvfile, idcol, namecol)
File "c:\users\jennifer.woodward\onedrive - usda\mygitonedrive\nalt4ma\csv-reconcile\venv\lib\site-packages\csv_reconcile\__init__.py", line 259, in doinit
initdb.init_db_with_context(csvfile, idcol, namecol)
File "c:\users\jennifer.woodward\onedrive - usda\mygitonedrive\nalt4ma\csv-reconcile\venv\lib\site-packages\csv_reconcile\initdb.py", line 95, in init_db_with_context
return init_db(db,
File "c:\users\jennifer.woodward\onedrive - usda\mygitonedrive\nalt4ma\csv-reconcile\venv\lib\site-packages\csv_reconcile\initdb.py", line 68, in init_db
ididx = header.index(idcol)
ValueError: 'item' is not in list
(venv) C:\Users\jennifer.woodward\OneDrive - USDA\myGitOneDrive\NALT4MA\csv-reconcile>csv-reconcile serve
* Serving Flask app 'csv-reconcile' (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:5000 (Press CTRL+C to quit)
127.0.0.1 - - [22/Apr/2022 23:54:56] "GET /reconcile HTTP/1.1" 200 -
The browser appears as:
Okay, the serve step can’t work until you get the init step to pass, so let’s focus on that.
I saw your comment on the other issue. Did you actually try using the cp1250 encoding described there? It might be worth doing that after deleting everything and starting from scratch.
The only thing that’s clear is that the csv file is actually being read, but the parser is not splitting the header line into separate columns. It’s not a bad guess that the issue is in how the encoding is handled.
If this doesn’t fix it I may need to ask you to try from a custom branch where I add more debugging info.
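As a rough illustration of how an encoding mismatch could produce this symptom, here is a minimal sketch using only Python's standard csv module. It is not csv-reconcile's own code: the helper name header_for and the sample bytes are hypothetical, and the byte-order mark is just one possible cause, not a confirmed diagnosis for reps.tsv.

```python
import csv
import io


def header_for(raw, encoding, delimiter="\t"):
    """Decode raw bytes with `encoding` and return the first parsed row."""
    text = raw.decode(encoding)
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical file contents: "utf-8-sig" prepends a byte-order mark when encoding.
raw = "item\titemLabel\nQ1\tFirst example\n".encode("utf-8-sig")

for encoding in ("utf-8", "cp1250", "utf-8-sig"):
    header = header_for(raw, encoding)
    # repr() makes otherwise invisible characters such as '\ufeff' visible.
    print(encoding, repr(header), "item" in header)
```

Only the decoding that strips the byte-order mark yields a first column named exactly item; with the others, header.index("item") would fail even though the file looks correct in an editor.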
> Did you actually try using the cp1250 encoding described there? It might be worth doing that after deleting everything and starting from scratch.
Yes, I did not make any change to the progressives.tsv file, and it returns essentially the same error. I am no longer getting an encoding error.
I am having trouble tracing where idcol is first identified.
Just to be super clear, in the other issue you mentioned changing the encoding of the file and not using the configuration suggested there. Are you saying you deleted everything and then followed the instructions from that issue?
Also, you don’t need progressives.tsv to run the init command, and we’re currently focused on the init (i.e. first) command. I.e. the following from your previous post:
csv-reconcile init sample/reps.tsv item itemLabel
The args here are the csv file being reconciled against (i.e. sample/reps.tsv), the idcol item and the “name” column used to do the actual reconciliation. These two columns are expected to be found on the first line of the csv file.
If the encoding is wrong or you’re using the wrong separator then the parser might not recognize that these are separate columns. The default separator should work.
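To make the separator failure mode concrete, here is a minimal sketch with Python's standard csv module. The SAMPLE data and the read_header helper are hypothetical stand-ins, not the contents of sample/reps.tsv or csv-reconcile's actual parsing code.

```python
import csv
import io

# Hypothetical two-column, tab-separated data standing in for a .tsv file.
SAMPLE = "item\titemLabel\nQ1\tFirst example\nQ2\tSecond example\n"


def read_header(text, delimiter):
    """Return the first row of `text` parsed with the given delimiter."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_header = read_header(SAMPLE, ",")
tab_header = read_header(SAMPLE, "\t")

print(comma_header)  # one long column name containing a literal tab
print(tab_header)    # two separate column names

try:
    comma_header.index("item")
except ValueError as err:
    # Same kind of failure as the last frame of the traceback above.
    print("lookup failed:", err)
```

When the parser splits on the wrong character, the whole header line becomes a single column name, so looking up item raises ValueError even though the text item is plainly present in the file.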
If you’re asking how these args get passed through to the code that’s failing I can walk you through it but the stack trace should tell you the files and lines you should look at.
FWIW, I plan to issue a release soon with a change mentioned in the other issue which should more seamlessly handle various csv file encodings.
I’ll leave this issue open another week but if I don’t hear back, I’ll assume your issue has been resolved.
[EDIT] The release is now complete. You might want to start from scratch to see if this simply takes care of your issue.
@woody544 Just checking in if everything’s working for you. If I don’t hear back, I’ll close this out next week.
I had the same issue today, and eventually managed to figure out that for me the problem was that csv-reconcile thought all my column names were in fact one big column, e.g. ["Column1\tColumn2\tColumn3\tColumn4"]. Once I knew the issue it was easy enough for me to set up the config to tell it to split on \t.
I figured this out by adding a print statement in initdb.py just before line 87, listing the columns I had available to choose from. This might help others in the future if they're struggling to figure it out. Would it be useful if I made a PR that did this? I don't know enough Python to know if it's an appropriately Pythonic way to behave ;)
Instead of a print statement, maybe it would be better to have clearer error messages. Would you be able to give me exact steps to reproduce, including the csv to use?
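One possible shape for such an error message, as a sketch only: the function name column_index and the message wording are hypothetical and are not taken from csv-reconcile.

```python
def column_index(header, wanted):
    """Return the position of `wanted` in `header`, or raise a descriptive error."""
    try:
        return header.index(wanted)
    except ValueError:
        # Show exactly what was parsed so a bad separator or encoding is obvious.
        raise ValueError(
            f"Column {wanted!r} not found. "
            f"Columns read from the first line: {header!r}. "
            "A single long name here usually points to the wrong separator or encoding."
        ) from None
```

Called with a header such as ["Column1\tColumn2"], the message would show the unsplit name directly, which is the information the print statement was added to reveal.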
I was able to set up and run csv-reconcile serve, but I cannot run the example on the reps.tsv file: I get ValueError: 'item' is not in list. Similarly, when I try the progressives.tsv file I get ValueError: 'itemLabel' is not in list. The errors are otherwise identical except for the last few lines. I have tried restarting everything and cannot get the init step to work before running the serve command. Any suggestions would be appreciated.
Last few lines of the error for reps.tsv:
Last few lines of the error for progressives.tsv:
The full error for reps.tsv: