Ghini / ghini.desktop

plant collections manager (desktop version)
http://ghini.github.io/
GNU General Public License v2.0
24 stars 14 forks source link

migrate a sqlite3 database to postgresql #390

Closed mfrasca closed 6 years ago

mfrasca commented 6 years ago

Expected behaviour

I do an export (csv) from my old database, then open the new database and import the data. and it "just works"

Actual behaviour

I didn't try (actually it's a user (@exiledinamarginalisland, isn't it?) who tried, and I don't know what error they met).

What do I really want

I just want to migrate my database, I don't care how we do that, as long as it works.

reference

https://groups.google.com/forum/#!topic/bauble/60MHqR5dQk0

mfrasca commented 6 years ago

let's say that:

all you need is to execute this non-recursive, non looping recipe:

finally, you need to update all sequences:

select setval('accession_id_seq', (select max(id) from accession));
select setval('accession_note_id_seq', (select max(id) from accession_note));
select setval('bauble_id_seq', (select max(id) from bauble));
select setval('collection_id_seq', (select max(id) from collection));
select setval('color_id_seq', (select max(id) from color));
select setval('default_vernacular_name_id_seq', (select max(id) from default_vernacular_name));
select setval('family_id_seq', (select max(id) from family));
select setval('family_note_id_seq', (select max(id) from family_note));
select setval('family_synonym_id_seq', (select max(id) from family_synonym));
select setval('genus_id_seq', (select max(id) from genus));
select setval('genus_note_id_seq', (select max(id) from genus_note));
select setval('genus_synonym_id_seq', (select max(id) from genus_synonym));
select setval('geography_id_seq', (select max(id) from geography));
select setval('habit_id_seq', (select max(id) from habit));
select setval('history_id_seq', (select max(id) from history));
select setval('location_id_seq', (select max(id) from location));
select setval('plant_change_id_seq', (select max(id) from plant_change));
select setval('plant_id_seq', (select max(id) from plant));
select setval('plant_note_id_seq', (select max(id) from plant_note));
select setval('plant_prop_id_seq', (select max(id) from plant_prop));
select setval('plant_status_id_seq', (select max(id) from plant_status));
select setval('plugin_id_seq', (select max(id) from plugin));
select setval('prop_cutting_id_seq', (select max(id) from prop_cutting));
select setval('prop_cutting_rooted_id_seq', (select max(id) from prop_cutting_rooted));
select setval('prop_seed_id_seq', (select max(id) from prop_seed));
select setval('propagation_id_seq', (select max(id) from propagation));
select setval('source_detail_id_seq', (select max(id) from source_detail));
select setval('source_id_seq', (select max(id) from source));
select setval('species_distribution_id_seq', (select max(id) from species_distribution));
select setval('species_id_seq', (select max(id) from species));
select setval('species_note_id_seq', (select max(id) from species_note));
select setval('species_synonym_id_seq', (select max(id) from species_synonym));
select setval('tag_id_seq', (select max(id) from tag));
select setval('tagged_obj_id_seq', (select max(id) from tagged_obj));
select setval('verification_id_seq', (select max(id) from verification));
select setval('vernacular_name_id_seq', (select max(id) from vernacular_name));
select setval('voucher_id_seq', (select max(id) from voucher));
RoDuth commented 6 years ago

@mfrasca just noticed this issue and thought I'd share my experiences.

I've done this many times for various reasons using the csv export/import functionality and highly recommend using it for this. For various reasons I've done sqlite > postgresql, postgresql > sqlite, even sqlite > sqlite (once to clean up corrupted data), postgresql > postgresql and have even used it in other contexts (e.g. to simplify the importing of data from other sources.). I've only ever had 2 real issues:

  1. a csv export may export blank tables which you may need to remove if they have dependencies on each other as they can stuff up the import process when importing into postgresql. From the location exported to, I had been using this rather ugly one liner to delete all files that don't have 2 or more lines. find . -type f | xargs -I {} bash -c 'if [ -z "$(sed '\''2q;d'\'' {})" ]; then rm {}; fi'. Far from ideal so I thought I'd have a quick look at what was causing it and this RoDuth/ghini.desktop@3ce0138 seems to fix it.
  2. The strictness differences between sqlite and postgres e.g.:
    • we had numbers that had been entered as "50+" or similar in the accession table. sqlite allowed it and postgres didn't. I just used sed -i '' -e 's/\([0-9]\)+/\1/' accession.txt
    • we had used long location names, sqlite allowed them, postgres enforced the limit in the location table definition (for this I just ended up doubling the allowed length, see: RoDuth/ghini.desktop@21f500bd)

Note also:

mfrasca commented 6 years ago

good day @RoDuth, thank you for sharing experience and solutions,

I'll have a look at your code. As I stated in the issue description, I never tested this myself, so I do not know what is the behaviour of the program in front of this situation.

about the numeric fields:

it is an error in SQLite, letting you write something like 50+ in a field you declare as numeric. it is an error in the ghini.desktop interface, letting you type that in a text entry widget associated to a numeric field. can you be more specific, as to which fields let you do that?

coming late to help you with bash, but to get the names of one-line-files, you can also do this: find . -type f -exec wc -l {} \; | sed -ne 's/^[ ]*1 //p'

RoDuth commented 6 years ago

can you be more specific, as to which fields let you do that?

using accession where quantity_recvd like '%+' i get 30 accession that were recorded that way. these are accessions where there is a clump or long hedge or the like so the logic makes sense but isn't entirely correct I agree. I have said to the data entry person that a 20 is as good as 20+ (or the like)

coming late to help you with bash

I think so I've had that one in my notes for something like 6 years! Or I would just do it manually using excel on a windows work computer. (no proper text editor, no bash, no python, no sqlite...)

mfrasca commented 6 years ago

unfortunately, you can enter '1+' in a 'acc_quantity_recvd_entry', and SQLite will let you write it in the corresponding field.  this is two mistakes in a row, first by us, then by the database engine.

if you try doing that with a postgresql database, the fault is only ours, because PG will quite rightfully tell you this:

(psycopg2.DataError) invalid input syntax for integer: "1+" LINE 1: UPDATE accession SET quantity_recvd='1+', _last_updated=now(...                                             ^ [SQL: 'UPDATE accession SET quantity_recvd=%(quantity_recvd)s, _last_updated=now() WHERE accession.id = %(accession_id)s'] [parameters: {'quantity_recvd': '1+', 'accession_id': 24}] (Background on this error at: http://sqlalche.me/e/9h9h)

entering the same '1+' in the 'plant_quantity_entry' is not allowed by our GUI, which is the correct thing to do.

if you want to open an issue, be my guest, or I will do so next week Monday.  we need to review all numeric entry fields I'm afraid.

ciao, M

mfrasca commented 6 years ago

importing a CSV produced in a different database should just work. if it does not, do not reopen this issue but open a very specific new one. we have addressed two problems in two different issues. #392 and (unreported, solved in 3ce0138, about empty files).