datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

Exception occurred while running dataflows in Windows OS #57

Open svetozarstojkovic opened 5 years ago

svetozarstojkovic commented 5 years ago

As a developer using the Windows operating system, I want to use dataflows for data wrangling so that I can package data as datapackages more easily.

Example error when running https://gitlab.com/datopian/datasets-fedex/blob/master/flows/country_currencies.py:

[Screenshot of the error traceback; image not preserved]

svetozarstojkovic commented 5 years ago

Another problem occurs when using dump_to_path: the output CSV has a blank line between each pair of rows. Possible solution: https://stackoverflow.com/questions/3348460/csv-file-written-with-python-has-blank-lines-between-each-row
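For reference, the approach in that Stack Overflow answer comes down to opening the output file with `newline=""`, so that the `csv` module controls line endings itself and Windows text mode does not turn `\r\n` into `\r\r\n`. This is a minimal standalone sketch of that pattern, not the actual dataflows dumper code; `write_rows`, the file name, and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows to a CSV file without extra blank lines on Windows."""
    # newline="" disables newline translation, so the "\r\n" emitted by
    # csv.writer is written as-is instead of becoming "\r\r\n" on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Hypothetical sample data, written to a throwaway directory.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "example.csv")
write_rows(out_path, [["code", "currency"], ["US", "USD"], ["RS", "RSD"]])

# Read the raw bytes back to inspect the actual line endings.
with open(out_path, "rb") as f:
    raw = f.read()
print(raw)
```

If the dumper opens its output file without `newline=""`, that would explain the blank rows, but I have not confirmed that this is where the problem lies.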

svetozarstojkovic commented 5 years ago

The blank lines between rows have been fixed with the following steps:

This fix should also be checked on Linux and Mac to see how it behaves there.

akariv commented 5 years ago
  1. @svetozarstojkovic please try to use one issue per problem 😄
  2. Pasting tracebacks as images is not very helpful. Please restrict screenshots to visual glitches and paste textual data as text.

As for the bug - can you add a print statement before that line to see which filename it's attempting to open (i.e. print(self.tmpfile.name))?

nirabpudasaini commented 4 years ago

Running into the same issue on Windows. The same code runs fine in Ubuntu 18.04 on WSL. Here is the traceback for the failure:

Traceback (most recent call last):
  File ".\flows\subdivision_endonyms.py", line 85, in <module>
    subdivision_endonyms_cldr.process()
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\dataflows\base\flow.py", line 15, in process
    return self._chain().process()
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\dataflows\base\datastream_processor.py", line 86, in process
    collections.deque(res, maxlen=0)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\dataflows\processors\dumpers\dumper_base.py", line 69, in row_counter 
    for row in iterator:
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\dataflows\processors\dumpers\file_dumper.py", line 76, in rows_processor
    for row in resource:
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\dataflows\base\schema_validator.py", line 46, in schema_validator     
    for i, row in enumerate(iterator):
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\dataflows\helpers\rows_processor.py", line 11, in process_resource    
    yield from self.func(resource)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\dataflows\processors\printer.py", line 61, in func
    for i, row in enumerate(rows):
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\dataflows\processors\validate.py", line 55, in process_resource       
    yield from self.validator(res)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\dataflows\processors\validate.py", line 50, in func
    yield from schema_validator(res.res, res, on_error=self.on_error)
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\dataflows\base\schema_validator.py", line 46, in schema_validator     
    for i, row in enumerate(iterator):
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\dataflows\processors\sort_rows.py", line 32, in _sorter
    db = KVFile()
  File "C:\Users\lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\kvfile\kvfile.py", line 19, in __init__
    self.db = DB_ENGINE.connect(self.tmpfile.name)
sqlite3.OperationalError: unable to open database file

@akariv Printed out the tmpfile and got C:\Users\lenovo\AppData\Local\Temp\tmpnz1gwmsa

nirabpudasaini commented 4 years ago

A temporary fix by @paulmz1 in https://github.com/datasets/covid-19/issues/77 was to edit kvfile.py so that the database runs in memory instead of from the temporary file. Seems like an issue with kvfile.
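To illustrate the idea behind that workaround (not the exact edit, which I have not reproduced here): SQLite can keep the whole database in memory when given the special name `":memory:"`, so no temporary file is opened at all. The sketch below also shows one file-based alternative, creating the database at a fresh path inside a temporary directory. It assumes the failure comes from SQLite trying to open a temporary file that Python still holds open, which Windows does not allow for `tempfile.NamedTemporaryFile`. The function names and the table layout are hypothetical and are not taken from kvfile.

```python
import os
import sqlite3
import tempfile


def connect_in_memory():
    """Open a SQLite database that lives only in memory."""
    return sqlite3.connect(":memory:")


def connect_in_temp_dir():
    """Open a SQLite database at a fresh path inside a new temp directory.

    The path is not held open by Python, so SQLite is free to create it.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "kv.db")
    return sqlite3.connect(path), path


# Hypothetical key/value table, only to show that the connection works.
db = connect_in_memory()
db.execute("CREATE TABLE d (key TEXT PRIMARY KEY, value TEXT)")
db.execute("INSERT INTO d VALUES (?, ?)", ("a", "1"))
value = db.execute("SELECT value FROM d WHERE key = ?", ("a",)).fetchone()[0]
db.close()
print(value)
```

The in-memory variant trades disk usage for RAM, so it may not suit flows that sort very large resources.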

akariv commented 4 years ago

@nirabpudasaini - Thanks for the info! Just to verify - do you have write access to that directory? Does it exist?
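A quick way to check both questions from Python, as a rough sketch: it tests the default temp directory reported by `tempfile.gettempdir()` and tries to actually create a file there, since `os.access` is not always reliable on Windows. `check_temp_dir` is a made-up helper name, not part of dataflows or kvfile.

```python
import os
import tempfile


def check_temp_dir(path=None):
    """Report whether a directory exists and whether a file can be created in it."""
    path = path or tempfile.gettempdir()
    exists = os.path.isdir(path)
    writable = False
    if exists:
        try:
            # Creating a real file is a more reliable test than os.access.
            with tempfile.TemporaryFile(dir=path) as f:
                f.write(b"ok")
            writable = True
        except OSError:
            writable = False
    return {"path": path, "exists": exists, "writable": writable}


report = check_temp_dir()
print(report)
```

Passing the directory from the printed tmpfile path instead of the default should answer the question for that exact location.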

akariv commented 4 years ago

@nirabpudasaini - please run pip install kvfile==0.0.8 and let me know if that works.