LBNL-ETA / Adapter

The Adapter software is developed at the Energy Efficiency Standards Department and it provides a convenient data table loader from various formats such as xlsx, csv, db (sqlite database), and sqlalchemy. Its main feature is the ability to convert data tables identified in one main and optionally one or more additional input files into database tables and Pandas DataFrames for downstream usage in any compatible software.
1 stars 1 forks source link

Issue 9 db read all tables #11

Closed 0xd5dc closed 2 years ago

0xd5dc commented 2 years ago

I fixed the Db for issue 9. Db's load() reads all tables accordingly from a sqlite file if pre_existing_keys or table_names given.

ps, Db can do Db_sqlalchemy's job as well

milicag commented 2 years ago

Thank you for starting to address this issue. This is what I get on this branch, followed by a pass on master:

ERROR: test_load_pre_keys (adapter.tests.test_to_python.TestDb)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/mgrahovac/repos/public/Adapter/adapter/tests/test_to_python.py", line 99, in test_load_pre_keys
    self.assertEqual(len(self.db.load()), 2)
  File "/Users/mgrahovac/repos/public/Adapter/adapter/to_python.py", line 247, in load
    raise ImportError(f'Cannot find {self.file_path}')
ImportError: Cannot find ./test.db

----------------------------------------------------------------------
Ran 7 tests in 0.602s

FAILED (errors=4)
(atcd) mgrahovac@milicag Adapter % git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
(atcd) mgrahovac@milicag Adapter % python -m unittest adapter.tests.test_to_python
..INFO:adapter.to_python:Connected to: adapter/tests/test.xlsx
Read in <Table 'xlsx_table1': test!$J$2:$L$5>
Read in <Table 'xlsx_table2': test!$J$9:$L$12>
Read in <Table 'run_parameters': test!$B$2:$C$3>
Read in <Table 'xlsx_single_col_table': test!$L$21:$L$25>
Read in <Name 'xlsx_named_range1': =test!$J$16:$K$18>
Read in <Name 'xlsx_single_col_range': =test!$J$21:$J$25>
Read in <Name 'xlsx_single_value_named_range': =test!$J$29>
INFO:adapter.to_python:Read in input tables and ranges from adapter/tests/test.xlsx.
Read in <Table 'xlsx_table1': test!$J$2:$L$5>
Read in <Table 'xlsx_table2': test!$J$9:$L$12>
INFO:adapter.to_python:Read in input tables and ranges from adapter/tests/test.xlsx.
Read in <Name 'xlsx_named_range1': =test!$J$16:$K$18>
Read in <Name 'xlsx_single_col_range': =test!$J$21:$J$25>
INFO:adapter.to_python:Read in input tables and ranges from adapter/tests/test.xlsx.
.
----------------------------------------------------------------------
Ran 3 tests in 0.656s

OK
milicag commented 2 years ago

@0xd5dc Also please make sure to test it with ATCD base tools on our end.

evaneill commented 2 years ago

I'm also getting errors about the missing test .db files when running the test_to_python. If they're really small I'd just commit them. Otherwise instantiating them could be added to the setup.

But this is working for me running NIA & shipments unit tests, so I think it's working! I also ran some timing tests on a small ~30MB local database, querying a ~1MB table, and this was 30x faster than master (makes sense given the size). Exciting improvement!

A few smaller things:

  1. you can increment the patch version in adapter/init before you eventually merge & tag
  2. The IO error about (# tables in db == 0) can include the self.file_path for easier debugging
  3. When a value in self.pre_existing_keys is in table_names, I think it should still throw an error (the implication being that adapter has already loaded a table with this name before, and there's no good way to know which it should prefer). Can use Debugger to check & throw the error

Thanks for doing this!

0xd5dc commented 2 years ago

@milicag I think you tested it using the master branch. I did test it with one of the ATCD base tools successfully.

pip uninstall adapterio 
pip install git+https://github.com/LBNL-ETA/Adapter.git@issue_9_db_read_all_tables

@evaneill You're welcome! I will update the version, will include the file path in the error message. What's the purpose of Debugger?

milicag commented 2 years ago

@0xd5dc no, I tested your branch and the master as provided in my comment that I 🤔 you are referring to.

evaneill commented 2 years ago

The tests still work for me with our tools with this adapter.

I also got these tests to pass, but ONLY if I run them from the test directory itself. Ideally want to be able to run them from root. Can probably just define a constant and use it in the code.

TEST_DIR = os.path.dirname(__file__) and then wherever the test files are read, do an os.path.join(TEST_DIR,...)

doing the os.path.join will also help since conventions like './test.db' won't work on windows

evaneill commented 2 years ago

The tests still work for me with our tools with this adapter.

I also got these tests to pass, but ONLY if I run them from the test directory itself. Ideally want to be able to run them from root. Can probably just define a constant at the top of test files (e.g. test_to_python.py) and use it in the code:

TEST_DIR = os.path.dirname(__file__) and then wherever the test files are read, do an os.path.join(TEST_DIR,...)

doing the os.path.join will also help since conventions like './test.db' won't work on windows