dolthub / doltpy

A Python API for Dolt
Apache License 2.0

`write_pandas` silently failing #127

Closed alecstein closed 3 years ago

alecstein commented 3 years ago

Problem

This is happening in the context of the hospitals bounty.

write_pandas does not write to the database, but doesn't return an error message either.

Steps to reproduce

I tried to add the following code data to the Dolt database. The data is in a DataFrame called

df_cpt_hcpcs_no_dupes

from which I removed every code that already exists in the database, so none of the codes in the DataFrame are present in the Dolt database.
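For readers following along, the filtering described above might look roughly like the sketch below. The original code was not posted, so this is an assumption about its shape: `drop_existing_codes` is a hypothetical helper, the column names are taken from the output shown later in this thread, and the sample rows are illustrative only.

```python
import pandas as pd


def drop_existing_codes(new_df: pd.DataFrame, existing_df: pd.DataFrame,
                        key: str = "code") -> pd.DataFrame:
    """Return rows of new_df whose key is absent from existing_df (hypothetical helper)."""
    # Compare keys as strings so an int column and a str column are matched consistently.
    existing_keys = set(existing_df[key].astype(str))
    mask = ~new_df[key].astype(str).isin(existing_keys)
    # Drop exact duplicate rows and renumber the index.
    return new_df.loc[mask].drop_duplicates().reset_index(drop=True)


# Illustrative stand-in for what is already in the database.
existing = pd.DataFrame({
    "code": ["C1769"],
    "short_description": ["Hc Guidewire J Stiff 3mm .0"],
})

# Illustrative stand-in for the data to be added.
new = pd.DataFrame({
    "code": ["C1769", "48000", "12001", "12001"],
    "short_description": [
        "Hc Guidewire J Stiff 3mm .0",
        "Hc Inpatient Pancreatitis Drain",
        "Hc Suture < 2.5 Cm",
        "Hc Suture < 2.5 Cm",
    ],
})

result = drop_existing_codes(new, existing)
print(result["code"].tolist())  # ['48000', '12001']
```

Comparing keys as strings is a design choice for this sketch, not something the thread confirms was done.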

The cell executes without an error, but when the table is read back the code still does not appear in the database.

[Screenshot: Screen Shot 2021-02-17 at 5 24 35 PM]
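One way to turn a silent failure like this into a loud one is to compare what was written against what is read back. The sketch below is a minimal illustration using plain pandas: `missing_after_write` and `assert_write_landed` are hypothetical helpers, and `read_back` is a stand-in DataFrame for whatever doltpy returns when the table is read after the write.

```python
import pandas as pd


def missing_after_write(written: pd.DataFrame, read_back: pd.DataFrame,
                        key: str = "code") -> pd.DataFrame:
    """Rows of `written` whose key does not appear in `read_back` (hypothetical helper)."""
    found = set(read_back[key].astype(str))
    return written.loc[~written[key].astype(str).isin(found)]


def assert_write_landed(written: pd.DataFrame, read_back: pd.DataFrame,
                        key: str = "code") -> None:
    """Raise if any written row is absent from the read-back table."""
    missing = missing_after_write(written, read_back, key)
    if not missing.empty:
        first = missing[key].iloc[0]
        raise RuntimeError(
            f"{len(missing)} row(s) missing after write, e.g. {key}={first!r}"
        )


# Illustrative data only: the rows we tried to write ...
written = pd.DataFrame({"code": ["0270", "12001"]})
# ... and a stand-in for the table contents read back afterwards.
read_back = pd.DataFrame({"code": ["12001"]})

missing = missing_after_write(written, read_back)
print(missing["code"].tolist())  # ['0270']
```

Calling `assert_write_landed(written, read_back)` right after the write would raise a `RuntimeError` here instead of letting the cell finish quietly.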

Things I have tried

  1. Checking the database in the terminal. I can confirm that adding to the hospital table works, because I can see my changes, but adding to the codes table does not. Or at least, it fails in a way that is hard to detect.
  2. Checking the datatypes of the elements being added to the database (all appear to match).

alecstein commented 3 years ago

Writing the data to .csv without an index, then importing it with the CLI, does get the data into the database. I can then read it from doltpy. However, I'm still encountering problems and will update this thread as I go.
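The export half of this workaround can be sketched as follows. The DataFrame contents and the file name `codes.csv` are illustrative assumptions. The import step is shown only as a comment, with the table name left as a placeholder, since the exact command used was not posted.

```python
import os
import tempfile

import pandas as pd

# Illustrative rows; the real DataFrame in this issue is df_cpt_hcpcs_no_dupes.
df = pd.DataFrame({
    "code": ["48000", "12001"],
    "short_description": ["Hc Inpatient Pancreatitis Drain", "Hc Suture < 2.5 Cm"],
})

# index=False keeps the pandas row index out of the file, so the CSV
# columns are exactly the DataFrame columns.
csv_path = os.path.join(tempfile.mkdtemp(), "codes.csv")
df.to_csv(csv_path, index=False)

# Read the header line back to confirm no index column was written.
with open(csv_path) as f:
    header = f.readline().strip()
print(header)  # code,short_description

# Assumed next step, run from inside the Dolt repository directory:
#   dolt table import -u <table> codes.csv
```

Without `index=False`, pandas writes the row index as an extra unnamed first column, which may not correspond to any column in the target table.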

max-hoffman commented 3 years ago

@alecstein how are you creating df_cpt_hcpcs_no_dupes?

I've replicated to here:

In [22]: df_cpt_hcpcs.drop_duplicates().dropna()
Out[22]:
        code                short_description
3      C1769      Hc Guidewire J Stiff 3mm .0
5      C1769      Hc Guidewire Steerable 16 1
6      C1769      Hc Guidewire Asahi Pro .014
7      C1769      Hc Guidewire Asahi Mb .14x3
8      C1769      Hc Guidewire Asahi Mir .014
...      ...                              ...
11082  C1769      Hc Guidewire Wholey 0.35 26
11085  C1769      Hc Guidewire Lunderquist 18
11089  C1769      Hc Guidewire Stabilizer .01
11091  48000  Hc Inpatient Pancreatitis Drain
11092  12001               Hc Suture < 2.5 Cm

[5650 rows x 2 columns]

But "0270" is not in this data. Read/write has worked on my end with what I have so far, starting from an empty Dolt repo.

alecstein commented 3 years ago

That's odd. If you can't reproduce it I'm not sure what to say except for thanks for trying. Maybe it was how I created the no_dupes DataFrame. I have since deleted that code and started over my db from scratch, so I'm not sure what exactly I did, but things seem to be working now.

max-hoffman commented 3 years ago

Glad it seems to be working now, let me know if you run into any other issues.

oscarbatori commented 3 years ago

@max-hoffman can we close this?

max-hoffman commented 3 years ago

Yeah, I think so. @alecstein ping us on Discord if you have other doltpy-related issues.