Closed PaulM5406 closed 3 years ago
We are taking a look at this, thanks for the bug report.
Can we get a repro of this, like the contents of temp.csv
?
I am having the same issue on Windows. The issue keeps stating it is coming from commit_date
so I tried to specify it manually and it still errors out. Dolt version is 0.26.6, doltpy is 2.0.6, Python is 3.7.9.
main.py
from doltpy.cli import Dolt
from doltpy.cli.write import write_file
from datetime import datetime
dolt = Dolt.init('./tennis-players')
write_file(dolt,
'great_players',
open('./great_players.csv'),
import_mode='force_create',
primary_key=['id'],
commit=True,
commit_message='test commit',
commit_date=datetime.now())
great_players.csv:
id, name
1, Eric
2, Bob
3, Frank
Trying to run the program:
C:\Users\Eric\Documents\doltpy_write_test> python main.py b'An error occurred moving data:\n\ncause: error garbage collecting the following files:\t.\\.dolt\\noms/4bujnjuatdli0klda7ucot0tl836jujt: remove .\\.dolt\\noms/4bujnjuatdli0klda7ucot0tl836jujt: The process cannot access the file because it is being used by another process.\t.\\.dolt\\noms/b4jhr1nlmr9fin1a6fse4fqfilk2o1pd: remove .\\.dolt\\noms/b4jhr1nlmr9fin1a6fse4fqfilk2o1pd: The process cannot access the file because it is being used by another process.\t.\\.dolt\\noms/dr6ha44vervspsgb5psvtgprccinsca5: remove .\\.dolt\\noms/dr6ha44vervspsgb5psvtgprccinsca5: The process cannot access the file because it is being used by another process.\t.\\.dolt\\noms/t49qbt77gnvhn6mt2fvh1vgbmdv9h5t0: remove .\\.dolt\\noms/t49qbt77gnvhn6mt2fvh1vgbmdv9h5t0: The process cannot access the file because it is being used by another process.\n'
Traceback (most recent call last):
File "main.py", line 13, in <module>
commit_date=datetime.now())
File "C:\Users\Eric\AppData\Local\Programs\Python\Python37\lib\site-packages\doltcli\utils.py", line 81, in write_file
commit_date=commit_date,
File "C:\Users\Eric\AppData\Local\Programs\Python\Python37\lib\site-packages\doltcli\utils.py", line 199, in _import_helper
dolt.execute(args + [fname])
File "C:\Users\Eric\AppData\Local\Programs\Python\Python37\lib\site-packages\doltcli\dolt.py", line 309, in execute
output = _execute(args, self.repo_dir, outfile=outfile)
File "C:\Users\Eric\AppData\Local\Programs\Python\Python37\lib\site-packages\doltcli\dolt.py", line 89, in _execute
raise DoltException(str_args, out, err, exitcode)
doltcli.dolt.DoltException: ('table import great_players -c --pk=id C:\\Users\\Eric\\AppData\\Local\\Temp\\tmp37g_arpf.csv', b'Rows Processed: 3, Additions: 3, Modifications: 0, Had No Effect: 0', b'An error occurred moving data:\n\ncause: error garbage collecting the following files:\t.\\.dolt\\noms/4bujnjuatdli0klda7ucot0tl836jujt: remove .\\.dolt\\noms/4bujnjuatdli0klda7ucot0tl836jujt: The process cannot access the file because it is being used by another process.\t.\\.dolt\\noms/b4jhr1nlmr9fin1a6fse4fqfilk2o1pd: remove .\\.dolt\\noms/b4jhr1nlmr9fin1a6fse4fqfilk2o1pd: The process cannot access the file because it is being used by another process.\t.\\.dolt\\noms/dr6ha44vervspsgb5psvtgprccinsca5: remove .\\.dolt\\noms/dr6ha44vervspsgb5psvtgprccinsca5: The process cannot access the file because it is being used by another process.\t.\\.dolt\\noms/t49qbt77gnvhn6mt2fvh1vgbmdv9h5t0: remove .\\.dolt\\noms/t49qbt77gnvhn6mt2fvh1vgbmdv9h5t0: The process cannot access the file because it is being used by another process.\n', 1)
The reproduction code that @EricTurner3 just posted about is from https://docs.dolthub.com/interfaces/python
Same issue with read_pandas:
df = read_pandas(dolt, 'IPv4ToCountry')
---------------------------------------------------------------------------
PermissionError Traceback (most recent call last)
<ipython-input-5-4a020434a859> in <module>()
1 from doltpy.cli.read import read_pandas
----> 2 df = read_pandas(dolt, 'IPv4ToCountry')
c:\users\steve\appdata\local\programs\python\python36\lib\site-packages\doltpy\cli\read.py in read_pandas(dolt, table, as_of)
26
27 def read_pandas(dolt: Dolt, table: str, as_of: str = None) -> pd.DataFrame:
---> 28 return read_pandas_sql(dolt, get_read_table_asof_query(table, as_of))
c:\users\steve\appdata\local\programs\python\python36\lib\site-packages\doltpy\cli\read.py in read_pandas_sql(dolt, sql)
22
23 def read_pandas_sql(dolt: Dolt, sql: str) -> pd.DataFrame:
---> 24 return read_table_sql(dolt, sql, result_parser=parse_to_pandas)
25
26
c:\users\steve\appdata\local\programs\python\python36\lib\site-packages\doltcli\utils.py in read_table_sql(dolt, sql, result_parser)
51 dolt: DoltT, sql: str, result_parser: Optional[Callable[[str], Any]] = None
52 ) -> List[dict]:
---> 53 return dolt.sql(sql, result_format="csv", result_parser=result_parser)
54
55
c:\users\steve\appdata\local\programs\python\python36\lib\site-packages\doltcli\dolt.py in sql(self, query, result_format, execute, save, message, list_saved, batch, multi_db_dir, result_file, result_parser, **kwargs)
583 with tempfile.NamedTemporaryFile() as f:
584 args.extend(["--result-format", "csv"])
--> 585 output_file = self.execute(args, stdout_to_file=f.name, **kwargs)
586 if not hasattr(result_parser, "__call__"):
587 raise ValueError(
c:\users\steve\appdata\local\programs\python\python36\lib\site-packages\doltcli\dolt.py in execute(self, args, print_output, stdout_to_file, error)
330 output = repr(e)
331 else:
--> 332 output = _execute(args, self.repo_dir, outfile=stdout_to_file)
333
334 print_output = print_output or self._print_output
c:\users\steve\appdata\local\programs\python\python36\lib\site-packages\doltcli\dolt.py in _execute(args, cwd, outfile)
79 logger.info(str_args)
80 if outfile:
---> 81 with open(outfile, "w") as f:
82 # _outfile = open(outfile, "w") if outfile else PIPE
83 proc = Popen(args=_args, cwd=cwd, stdout=f, stderr=PIPE)
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Steve\\AppData\\Local\\Temp\\tmp7x1q3v_w'
This appears to be the case for Windows 10 Home Edition x64 English - May 2021 Update with v0.26.6 msi and Python 3.9.5 installer x64 bit.
I'm sure other versions have the same issues too, but just incase, I put the information for a specific version of software that is guaranteed to make this bug reproducible.
I should mention that the script does in fact insert the data and recreate the table (even in update mode) even despite the garbage collection error messages. Running dolt gc
from command prompt directly does not output any text at all (so I presume it works correctly).
Edit: For the sake of making debugging easier, and now that I know that the bug is reproducible with Windows 10 in a VM, I'm going to install Virtual Box Guest Additions and some tools to help me monitor what program is accessing what files (and potentially go through and set breakpoints on doltpy to see when the error message is activated).
Not entirely sure if the buffer overflow is related to this particular bug, but it is triggered by write_file. The Python process was captured under Python.PML and the write_file action (the CLI side) was captured under Dolt_writefile.PML in the zip Debug.zip.
I used Microsoft's official process monitor to capture the data. https://docs.microsoft.com/en-us/sysinternals/downloads/procmon
As I didn't think to actually save the error message for the last zip, this is a "patch" zip that contains a new capture and a text document of the error message output by the script from the original zip. I did not include the script, repo, csv, or anything like that in this second zip.
I also suspect that the buffer overflow may actually be related to the error message as one of the noms affected (and the only one I checked so far) is affected by the buffer overflow.
Edit 2: These are all of the operations done to nom file midf4u9uho210ll0rft1rgmfirnc4lkl
which is located at C:\Users\Alexis\Desktop\great-players-example\.dolt\noms\midf4u9uho210ll0rft1rgmfirnc4lkl
and has 382 bytes.
Edit 3: It may be the sharing access violation that triggers the bug itself as seen by the last screenshot (I believe it to be a symptom of the problem as why would specific noms have a sharing access violation?). I'm currently archiving my whole VM in a highly compressed 7zip archive for storage, so after that finishes, I'll be able to resume the VM and see what the full error message says.
Edit 4:
This may be the cause why this bug exists. https://groups.google.com/g/golang-nuts/c/8BiP_mPoCd4
The new screenshots are not as useful, but they should help show that GoLang is trying to request delete file and read attributes access when it gets a sharing violation exception. The link above shows a simple GoLang example that would cause the sharing violation to occur on Windows, but not cause problems on any other system.
I decided to compile dolt cli using the Mac OSX version of Go Lang from https://golang.org/dl/ (which was named go1.16.4.darwin-amd64.pkg
and had version go version go1.16.4 darwin/amd64
). I also used this commit of Dolt CLI (the one for the latest release of v0.26.6 https://github.com/dolthub/dolt/commit/1eb286f34abc0f861aa855270a2aac4645f2db23) and I then I ran the command GOOS="windows" go build ./cmd/dolt
in order to build the Windows executable on my Mac. It worked flawlessly on my Windows 10 installation whereas the prebuilt version produced the bug that is the topic of this issue.
I believe you may need to upgrade to the latest version of Go in order to compile the Windows version of Dolt CLI as the first bug report I saw related to the sharing violation exception was posted in December of 2020 (so relatively recently). So, the fix would be in a very, very new version of GoLang and therefore is not likely to exist in the CI build unless the CI build pulls the absolutely latest copy of GoLang before each build.
Edit: I did more testing to put my mind at rest to make sure this wasn't a fluke. It was. I captured it in process monitor again and the sharing violations are still there. So, to fix this issue, you'll most likely have to implement a Windows specific method of opening the temporary files so you can tell Windows that these files are deletable in order to fix the sharing violation exception. ☹️
We've just released a new version of Dolt that we think may fix this issue. It's a bit hard to test locally due to its non-deterministic nature. Give it a spin and let us know if you still run into the problem!
Going to close this out for the time being, because we have a solution for windows that we think is as good as we can do for now.
Basically: os.Rename() on Windows is error prone due to the OS preventing access to files sometimes, for reasons not even the golang authors understand. The best we can do for now is retry failed Rename operations, which we are now doing.
If anyone can demonstrate a reliable repro for this, or runs into this again in the wild, please do let us know.
Hello,
write_file and write_pandas raise DoltException on Windows. Here is the traceback. I do not know which is this other process that locks the file. If I run the command
table import data_pisigma -c --pk=id temp.csv
directly from the terminal, I get no error.b'An error occurred moving data:\n\ncause: error garbage collecting the following files:\t.\\.dolt\\noms/4d3m60adthij431437g9v7l8uf4kigpg: remove .\\.dolt\\noms/4d3m60adthij431437g9v7l8uf4kigpg: The process cannot access the file because it is being used by another process.\n' Traceback (most recent call last): File "C:\Users\paolo\Dev\data_pisigma\main.py", line 67, in <module> write_file(dolt, File "C:\ProgramData\Anaconda3\envs\data_pisigma\lib\site-packages\doltcli\utils.py", line 73, in write_file _import_helper( File "C:\ProgramData\Anaconda3\envs\data_pisigma\lib\site-packages\doltcli\utils.py", line 199, in _import_helper dolt.execute(args + [fname]) File "C:\ProgramData\Anaconda3\envs\data_pisigma\lib\site-packages\doltcli\dolt.py", line 309, in execute output = _execute(args, self.repo_dir, outfile=outfile) File "C:\ProgramData\Anaconda3\envs\data_pisigma\lib\site-packages\doltcli\dolt.py", line 89, in _execute raise DoltException(str_args, out, err, exitcode) doltcli.dolt.DoltException: ('table import data_pisigma -c --pk=id C:\\Users\\paolo\\AppData\\Local\\Temp\\tmpqcm7hpaz.csv', b'Rows Processed: 55, Additions: 55, Modifications: 0, Had No Effect: 0', b'An error occurred moving data:\n\ncause: error garbage collecting the following files:\t.\\.dolt\\noms/4d3m60adth ij431437g9v7l8uf4kigpg: remove .\\.dolt\\noms/4d3m60adthij431437g9v7l8uf4kigpg: The process cannot access the file because it is being used by another process.\n', 1)