Closed lcnittl closed 3 years ago
Hi @lcnittl, thanks for the suggestion, this sounds like an interesting application of CleverCSV!
I'd be happy to add a pre-commit hook to the repo. If I understand your suggestion correctly, it would be to use the clevercsv standardize
functionality to ensure CSV files are correctly formatted, right?
Cool, alright that sounds good, I'll start a PR soon with the necessary additions to the standardize command
Nice - Thanks for working on it!
Hi @lcnittl
I've just merged #26, which adds support for standardizing multiple files and adds the --in-place
flag. The return code of clevercsv standardize
is 2
when any of the input files is changed and 0
if none of them are. Hopefully this is sufficient for implementing the pre-commit hook!
Cool, thanks a lot! Shall I make a PR for a .pre-commit-hooks.yaml
file?
Yeah, that would be great, thanks!
I have created a fork and added the .pre-commit-hooks.yaml
file there. However, I am having problems installing CleverCSV with pip and the git+https URI scheme (To my understanding, this is the way pre-commit would install the package too):
(.venv) PS C:\Users\lcnittl\playground> pip install --upgrade git+https://github.com/lcnittl/CleverCSV@patch-1
Collecting git+https://github.com/lcnittl/CleverCSV@patch-1
Cloning https://github.com/lcnittl/CleverCSV (to revision patch-1) to c:\users\lcnittl\appdata\local\temp\pip-req-build-j5a594ud
Collecting chardet>=3.0
Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting regex>=2018.11
Using cached regex-2020.10.15-cp38-cp38-win_amd64.whl (268 kB)
Building wheels for collected packages: clevercsv
Building wheel for clevercsv (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: 'C:\Users\lcnittl\playground\.venv\Scripts\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\lcnittl\\AppData\\Local\\Temp\\pip-req-build-j5a594ud\\setup.py'"'"'; __file__='"'"'C:\\Users\\lcnittl\\AppData\\Local\\Temp\\pip-req-build-j5a594ud\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\lcnittl\AppData\Local\Temp\pip-wheel-p580azgi'
cwd: C:\Users\lcnittl\AppData\Local\Temp\pip-req-build-j5a594ud\
Complete output (70 lines):
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.8
creating build\lib.win-amd64-3.8\clevercsv
copying clevercsv\break_ties.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\consistency.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\cparser_util.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\detect.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\detect_pattern.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\detect_type.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\dialect.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\dict_read_write.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\escape.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\exceptions.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\normal_form.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\potential_dialects.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\read.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\utils.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\wrappers.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\write.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\_optional.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\__init__.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\__main__.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\__version__.py -> build\lib.win-amd64-3.8\clevercsv
creating build\lib.win-amd64-3.8\clevercsv\console
copying clevercsv\console\application.py -> build\lib.win-amd64-3.8\clevercsv\console
copying clevercsv\console\config.py -> build\lib.win-amd64-3.8\clevercsv\console
copying clevercsv\console\__init__.py -> build\lib.win-amd64-3.8\clevercsv\console
creating build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\code.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\detect.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\explore.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\standardize.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\view.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\_utils.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\__init__.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
running egg_info
creating clevercsv.egg-info
writing clevercsv.egg-info\PKG-INFO
writing dependency_links to clevercsv.egg-info\dependency_links.txt
writing entry points to clevercsv.egg-info\entry_points.txt
writing requirements to clevercsv.egg-info\requires.txt
writing top-level names to clevercsv.egg-info\top_level.txt
writing manifest file 'clevercsv.egg-info\SOURCES.txt'
reading manifest file 'clevercsv.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'requirements.txt'
warning: no files found matching '*' under directory 'bin'
no previously-included directories found matching 'tests\test_integration'
warning: no previously-included files found matching 'Makefile'
warning: no previously-included files found matching '.gitignore'
warning: no previously-included files found matching '.travis.yml'
warning: no previously-included files found matching '.readthedocs.yml'
warning: no previously-included files found matching 'make_release.py'
warning: no previously-included files found matching 'cgrep'
warning: no previously-included files found matching 'vgrep'
no previously-included directories found matching 'notes'
no previously-included directories found matching 'auxiliary'
writing manifest file 'clevercsv.egg-info\SOURCES.txt'
running build_ext
building 'clevercsv.cparser' extension
creating build\temp.win-amd64-3.8
creating build\temp.win-amd64-3.8\Release
creating build\temp.win-amd64-3.8\Release\src
C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\lcnittl\playground\.venv\include -Ic:\python\python38\include -Ic:\python\python38\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" /Tcsrc/cparser.c /Fobuild\temp.win-amd64-3.8\Release\src/cparser.obj
cparser.c
c:\python\python38\include\pyconfig.h(206): fatal error C1083: Cannot open include file: 'basetsd.h': No such file or directory
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\BuildTools\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
----------------------------------------
ERROR: Failed building wheel for clevercsv
Running setup.py clean for clevercsv
Failed to build clevercsv
Installing collected packages: chardet, regex, clevercsv
Running setup.py install for clevercsv ... error
ERROR: Command errored out with exit status 1:
command: 'C:\Users\lcnittl\playground\.venv\Scripts\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\lcnittl\\AppData\\Local\\Temp\\pip-req-build-j5a594ud\\setup.py'"'"'; __file__='"'"'C:\\Users\\lcnittl\\AppData\\Local\\Temp\\pip-req-build-j5a594ud\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\lcnittl\AppData\Local\Temp\pip-record-prlevfmk\install-record.txt' --single-version-externally-managed --compile --install-headers 'C:\Users\lcnittl\playground\.venv\include\site\python3.8\clevercsv'
cwd: C:\Users\lcnittl\AppData\Local\Temp\pip-req-build-j5a594ud\
Complete output (68 lines):
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.8
creating build\lib.win-amd64-3.8\clevercsv
copying clevercsv\break_ties.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\consistency.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\cparser_util.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\detect.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\detect_pattern.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\detect_type.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\dialect.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\dict_read_write.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\escape.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\exceptions.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\normal_form.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\potential_dialects.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\read.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\utils.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\wrappers.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\write.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\_optional.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\__init__.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\__main__.py -> build\lib.win-amd64-3.8\clevercsv
copying clevercsv\__version__.py -> build\lib.win-amd64-3.8\clevercsv
creating build\lib.win-amd64-3.8\clevercsv\console
copying clevercsv\console\application.py -> build\lib.win-amd64-3.8\clevercsv\console
copying clevercsv\console\config.py -> build\lib.win-amd64-3.8\clevercsv\console
copying clevercsv\console\__init__.py -> build\lib.win-amd64-3.8\clevercsv\console
creating build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\code.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\detect.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\explore.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\standardize.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\view.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\_utils.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
copying clevercsv\console\commands\__init__.py -> build\lib.win-amd64-3.8\clevercsv\console\commands
running egg_info
writing clevercsv.egg-info\PKG-INFO
writing dependency_links to clevercsv.egg-info\dependency_links.txt
writing entry points to clevercsv.egg-info\entry_points.txt
writing requirements to clevercsv.egg-info\requires.txt
writing top-level names to clevercsv.egg-info\top_level.txt
reading manifest file 'clevercsv.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'requirements.txt'
warning: no files found matching '*' under directory 'bin'
no previously-included directories found matching 'tests\test_integration'
warning: no previously-included files found matching 'Makefile'
warning: no previously-included files found matching '.gitignore'
warning: no previously-included files found matching '.travis.yml'
warning: no previously-included files found matching '.readthedocs.yml'
warning: no previously-included files found matching 'make_release.py'
warning: no previously-included files found matching 'cgrep'
warning: no previously-included files found matching 'vgrep'
no previously-included directories found matching 'notes'
no previously-included directories found matching 'auxiliary'
writing manifest file 'clevercsv.egg-info\SOURCES.txt'
running build_ext
building 'clevercsv.cparser' extension
creating build\temp.win-amd64-3.8
creating build\temp.win-amd64-3.8\Release
creating build\temp.win-amd64-3.8\Release\src
C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\lcnittl\playground\.venv\include -Ic:\python\python38\include -Ic:\python\python38\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" /Tcsrc/cparser.c /Fobuild\temp.win-amd64-3.8\Release\src/cparser.obj
cparser.c
c:\python\python38\include\pyconfig.h(206): fatal error C1083: Cannot open include file: 'basetsd.h': No such file or directory
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\BuildTools\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
----------------------------------------
ERROR: Command errored out with exit status 1: 'C:\Users\lcnittl\playground\.venv\Scripts\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\lcnittl\\AppData\\Local\\Temp\\pip-req-build-j5a594ud\\setup.py'"'"'; __file__='"'"'C:\\Users\\lcnittl\\AppData\\Local\\Temp\\pip-req-build-j5a594ud\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\lcnittl\AppData\Local\Temp\pip-record-prlevfmk\install-record.txt' --single-version-externally-managed --compile --install-headers 'C:\Users\lcnittl\playground\.venv\include\site\python3.8\clevercsv' Check the logs for full command output.
(.venv) PS C:\Users\lcnittl\playground>
Would you have a (quick) solution for this?
Thanks, Christian
Hi Christian,
From what I can see there's something going wrong in the compilation step, with a missing file called basetsd.h
. A quick search suggests that this may have to do with your visual studio installation.
In general, it wouldn't be great if the pre-commit hook would require compiling CleverCSV from scratch, as this would probably limit the number of users (we distribute wheels for CleverCSV to avoid the need for compilation). However to test your fork it does make sense to install from Github and thus you'd need to compile.
Let me know if you can't get the compilation step to work. An alternative could be that I try to write the pre-commit hook myself and perhaps you could give feedback.
@lcnittl I've just opened a PR with the instructions on adding a pre-commit hook for CleverCSV (#30). Would you mind checking whether this is what you had in mind?
@GjjvdBurg Thanks for the work on it and sorry for my nearly 3 weeks of absence!
Trying to find a workaround for the install issue was on my list but I never managed to get to it. And I did not want to create the PR with that still missing.
Would you mind checking whether this is what you had in mind?
Kind of, but I think there is something mixed up with the hooks and config file (both are needed). I will test if it works with the "local repo" solution you found and adjust my hooks file, and then open a PR.
The pre-commit-config .yaml file of #30 needs a slightly different layout I think, I will comment in your PR as soon as I have it up and working.
Thanks for the quick response. The reason we don't have a .pre-commit-hooks.yaml
file is because we use the repo: local
setting (https://pre-commit.com/#repository-local-hooks). With that approach the hooks file is no longer needed.
Here's how I tested it:
$ mkdir /tmp/testhook
$ cp example/* /tmp/testhook # copy examples from CleverCSV
$ cd /tmp/testhook
$ git init
# add .pre-commit-config.yaml as in the README of #30
$ pre-commit install
pre-commit installed at .git/hooks/pre-commit
$ git add *.csv
$ git commit -am "attempt to commit bad csvs"
CleverCSV Standardize....................................................Failed
- hook id: clevercsv-standardize
- exit code: 2
- files were modified by this hook
Yes, indeed, I realized this now. The "problem" with this approach is, that the basic configuration of the hook is now decentralized - so every user would have to update their config
file if there are changes to the API (any new keyword needed or the like).
PS: Having a potential alternative. Need to test it though
That's a good point. I think we may have the same alternative, how about using the following hook file:
- id: clevercsv-standardize
name: CleverCSV Standardize
entry: clevercsv standardize --in-place
language: python
language_version: python3
types: ['csv']
additional_dependencies: ['clevercsv[full]']
Seems to work with pre-commit try-repo
I got some input from the maintainer of pre-commit (very, very nice guy!!). The problem with having the hook file is that in any case pip install .
would be run. I just tested this on my machine, and it again failed on compilation.
The alternative would be to have a "mirror-repo" that solely consists of a hooks
file and a setup.py
, which install clevercsv
from PyPI. There is a tool that auto-creates the 2 files ( https://github.com/pre-commit/pre-commit-mirror-maker ).
This is what I used:
pre-commit-mirror mirrors-CleverCSV --language python --package-name clevercsv --entry "clevercsv standardize --in-place" --types csv
I just have to upload that to a new repo and test if it really works :D
Some comments from the maintainer, stating why we should not use additional_dependencies
in this very case:
- language_version: python3 ~generally shouldn't be needed, pre-commit only supports python3 now and will default the current interpreter version
- the additional_dependencies should be pinned to a version for repeatability, this is what pre-commit-mirror-maker does when it iterates through versions
- it's ~slightly better to have the pypi dependency in a trivial setup.py (as pre-commit-mirror-maker does) such that the consumer can freely use additional_dependencies to add plugins and such) -- this isn't possible in all languages (for example js) so sometimes additional_dependencies is used by pre-commit-mirror-maker in this way
That's great, thanks for asking him and figuring this out! So if I understand correctly the user would point their .pre-commit-config.yaml
to the mirror repo, which contains a minimal setup.py that depends on CleverCSV?
I'll try the pre-commit-mirror-maker myself as well but let me know what you find.
One caveat is that the versions seem to go all the way back to v0.1.2, which didn't have a command line interface yet. So maybe we don't need to use mirror-maker but can just apply the same idea to a separate repo.
One caveat is that the versions seem to go all the way back to v0.1.2, which didn't have a command line interface yet. So maybe we don't need to use mirror-maker but can just apply the same idea to a separate repo.
I think the idea behind this is, that you can use it to then update the mirror in the future (It will add all new tags).
However, so far it seems to be not possible to use clevercsv[full]
as package name (just clevercsv
) - so manual adjustment would be needed. I will file an issue there.
You can test it with this:
repos:
- repo: https://github.com/lcnittl/mirros-CleverCSV
rev: v0.6.6
hooks:
- id: clevercsv
Cool, I can confirm that your repo works!
I'm trying to think about how to set this up so it isn't a lot of effort to maintain going forward. I think the setup.py of the mirror repo can be simplified to pull the version info from the .version
file (similar to what we do in this repo), and then I can add a bash script or so to update that .version
file and do the git tag whenever I do a new release of CleverCSV.
I'll pick this back up tomorrow, thanks for your help!
If the extras install works in the future, I think the easiest might be to have a batch/shell script that executes the mirroring command, together with git push:
pre-commit-mirror mirrors-CleverCSV --language python --package-name "clevercsv[full]" --entry "clevercsv standardize" --args="--in-place" --types csv
git push
git push --tags
Then updating would be a single "click", and could even run as a cron job/scheduled task or similar. (And would save the hassle to make a separate script - besides the above one 🤣).
PS: The pre-commit maintainer suggests to have --in-place
as argument rather than entry. This is certainly a possibility and would allow users to turn that off by resetting args
, so I changed it above now.
Hi @lcnittl, I've created a dummy package repo here and I've updated the instructions in the readme in #30. I've chosen to go for a bash script to bump the version as it is simpler and therefore hopefully easier to maintain in the long run.
The dummy package has a test suite to verify the hook works, but I'd appreciate it if you could double check :)
The dummy package has a test suite to verify the hook works, but I'd appreciate it if you could double check :)
Can confirm that it works for me without problems.
Thanks a lot for all your effort!
(Not closing as merging of #30 will do so)
PS: I asked to include the CleverCSV hooks in the official pre-commit list of hooks: https://github.com/pre-commit/pre-commit.com/pull/413 (got already merged) to enhance visibility. It is the first CSV hook there :smile:
Ah that's cool, thanks for that! :smile:
Would you consider making CleverCSV a pre-commit hook, by adding a
.pre-commit-hooks.yaml
file to your repository? This would allow people to have csv files in their repositories to automatically check them whenever they want to commit them.The only thing missing to make it work is, as I understand:
--in-place
)clevercsv
to accept a list of file arguments to process.I can file a PR for the
.pre-commit-hooks.yaml
, if you agree (after the above points are implemented).Thanks for your consideration!