amundsen-io / amundsen

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
https://www.amundsen.io/amundsen/
Apache License 2.0
4.44k stars 961 forks

Error when loading sample data as in installation guide #811

Closed: MattX closed 3 years ago

MattX commented 4 years ago

I'm following the sample installation guide. The last step under loading the data is failing for me.

Expected Behavior

The example/scripts/sample_data_loader.py should complete successfully.

Current Behavior

I get an error running the script:

/Users/username/src/amundsen/amundsendatabuilder/venv/lib/python3.9/site-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.1) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Traceback (most recent call last):
  File "/Users/username/src/amundsen/amundsendatabuilder/example/scripts/sample_data_loader.py", line 284, in <module>
    run_csv_job('example/sample_data/sample_table_column_stats.csv', 'test_table_column_stats',
  File "/Users/username/src/amundsen/amundsendatabuilder/example/scripts/sample_data_loader.py", line 113, in run_csv_job
    DefaultJob(conf=job_config,
  File "/Users/username/src/amundsen/amundsendatabuilder/venv/lib/python3.9/site-packages/amundsen_databuilder-4.0.3-py3.9.egg/databuilder/job/job.py", line 77, in launch
  File "/Users/username/src/amundsen/amundsendatabuilder/venv/lib/python3.9/site-packages/amundsen_databuilder-4.0.3-py3.9.egg/databuilder/job/job.py", line 67, in launch
  File "/Users/username/src/amundsen/amundsendatabuilder/venv/lib/python3.9/site-packages/amundsen_databuilder-4.0.3-py3.9.egg/databuilder/task/task.py", line 65, in run
  File "/Users/username/src/amundsen/amundsendatabuilder/venv/lib/python3.9/site-packages/amundsen_databuilder-4.0.3-py3.9.egg/databuilder/loader/file_system_neo4j_csv_loader.py", line 119, in load
  File "/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/csv.py", line 154, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/csv.py", line 149, in _dict_to_list
    raise ValueError("dict contains fields not in fieldnames: "
ValueError: dict contains fields not in fieldnames: 'stat_val'
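For context, the final frames of the traceback are in Python's standard csv module: csv.DictWriter raises this ValueError when a row dict has a key that is not in the writer's fieldnames. Below is a minimal sketch that reproduces the same message outside Amundsen. The helper write_rows and the field names stat_name and start_epoch are hypothetical; only 'stat_val' is taken from the traceback, and this is not the databuilder loader's actual code.

```python
import csv
import io


def write_rows(fieldnames, rows):
    """Write dict rows to CSV text using csv.DictWriter (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        # Raises ValueError if the row has keys missing from fieldnames.
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical header that lacks 'stat_val', and a row that contains it.
header = ["stat_name", "start_epoch"]
row = {"stat_name": "max", "start_epoch": "1432300762", "stat_val": "33"}

error_message = None
try:
    write_rows(header, [row])
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# With a header that includes every key in the row, the write succeeds.
fixed = write_rows(header + ["stat_val"], [row])
print(fixed)
```

This suggests the writer was created with a header that did not include 'stat_val' while the row being written did, which would be consistent with a mismatch between the installed databuilder package and the example script being run.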

Possible Solution

Steps to Reproduce

Follow the steps in the installation guide. The error occurs at the last command in section 4: python3 example/scripts/sample_data_loader.py

Screenshots (if appropriate)

N/A

Context

The docker container has started with no apparent errors, and the Python environment setup for the loading script went as expected as well.

Your Environment

I'm trying to run a simple Amundsen install with sample data for evaluation. I'm running Amundsen commit a09cb61 with Neo4j on Docker for Mac (Docker version 19.03.12), and running the script with Python 3.9.

DataBrenes commented 4 years ago

Same thing here. I thought this was resolved in https://github.com/amundsen-io/amundsen/issues/808

heathgtuck commented 4 years ago

Having the same "stat_val" issue currently

feng-tao commented 4 years ago

Please rebase onto databuilder master and retry. We fixed the issue this week.

DataBrenes commented 4 years ago

@feng-tao Is there a specific location I should be rebasing onto?

feng-tao commented 4 years ago

The master branch of https://github.com/amundsen-io/amundsendatabuilder

feng-tao commented 4 years ago

I updated the amundsen submodule version. Please pull again to get the latest version. Thanks.

DataBrenes commented 4 years ago

@feng-tao Sorry, I'm still not seeing a change. What am I missing?

MINGW64 ~/Desktop/Development/amundsen/amundsendatabuilder (master)
$ git rebase
Current branch master is up to date.

Traceback (most recent call last):
  File "example/scripts/sample_data_loader.py", line 272, in <module>
    'databuilder.models.table_stats.TableColumnStats')
  File "example/scripts/sample_data_loader.py", line 103, in run_csv_job
    publisher=Neo4jCsvPublisher()).launch()
  File "C:\Users\mabrenes\Miniconda3\envs\amundsenPy3_7\lib\site-packages\amundsen_databuilder-4.0.3-py3.7.egg\databuilder\job\job.py", line 77, in launch
  File "C:\Users\mabrenes\Miniconda3\envs\amundsenPy3_7\lib\site-packages\amundsen_databuilder-4.0.3-py3.7.egg\databuilder\job\job.py", line 67, in launch
  File "C:\Users\mabrenes\Miniconda3\envs\amundsenPy3_7\lib\site-packages\amundsen_databuilder-4.0.3-py3.7.egg\databuilder\task\task.py", line 65, in run
  File "C:\Users\mabrenes\Miniconda3\envs\amundsenPy3_7\lib\site-packages\amundsen_databuilder-4.0.3-py3.7.egg\databuilder\loader\file_system_neo4j_csv_loader.py", line 119, in load
  File "C:\Users\mabrenes\Miniconda3\envs\amundsenPy3_7\lib\csv.py", line 155, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "C:\Users\mabrenes\Miniconda3\envs\amundsenPy3_7\lib\csv.py", line 151, in _dict_to_list
    + ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: 'stat_val'

feng-tao commented 3 years ago

Could you use the master branch instead of the 4.0.3 version?
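Both tracebacks above load databuilder from an amundsen_databuilder-4.0.3 egg in site-packages, so the interpreter may still be using an installed 4.0.3 package even when the checkout is on master. A minimal sketch for checking which version the active environment reports is shown below. It assumes Python 3.8 or newer (for importlib.metadata) and assumes the distribution is registered under the name amundsen-databuilder; the helper installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if not found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumed distribution name, inferred from the egg file name in the tracebacks.
version = installed_version("amundsen-databuilder")
print("amundsen-databuilder:", version if version is not None else "not installed")
```

If this still reports 4.0.3 after updating the checkout, the package in the environment has not been replaced by the master branch code.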

heathgtuck commented 3 years ago

Awesome! Got it up and running with the sample data, thanks.