atuinsh / atuin

✨ Magical shell history
https://atuin.sh
MIT License
18.54k stars, 519 forks

Feature atuin import from file #2170

Open mijoharas opened 1 week ago

mijoharas commented 1 week ago

Hi, I wanted a one-off offline sync for my atuin history.db. I've left it in a rough-and-ready state to see if there's any interest in it, and to get feedback on whether the feature is wanted.

(would address https://github.com/atuinsh/atuin/issues/816 ).

Reasons I'd personally like this feature:

Either way, figured it was easy enough to write some code to start a discussion. Let me know your thoughts.


ellie commented 1 week ago

Hey! Thanks for bringing this up

I think it's worth addressing what the full set of requirements is here. Are users OK with duplicate data if they run it twice? Do they expect sync to work after this has been done?

I'd rather not use the database as the transport mechanism here either - really something like atuin history dump should dump a format like jsonlines, and then the importer can read and import that. I'd be concerned that changes to the db schema could break imports in the future
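
To make the dump-then-import idea concrete, here is a minimal sketch in Python. It is only an illustration: the field names (id, timestamp, command) and the function names dump_history and load_history are assumptions made up for this example, not atuin's actual schema or CLI, and atuin itself is written in Rust.

```python
import io
import json


def dump_history(rows, out):
    """Write each record as one JSON object per line (JSON Lines)."""
    for row in rows:
        out.write(json.dumps(row, sort_keys=True) + "\n")


def load_history(src):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in src:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical records; a real dump would carry whatever fields atuin stores.
rows = [
    {"id": "a1", "timestamp": 1700000000, "command": "ls -la"},
    {"id": "b2", "timestamp": 1700000060, "command": "git status"},
]

buf = io.StringIO()          # stands in for a file on disk
dump_history(rows, buf)
buf.seek(0)
restored = list(load_history(buf))
```

Because each line is an independent JSON object, the importer never needs to know how the exporting machine laid out its database tables, only which keys each record carries.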

mijoharas commented 1 week ago

Hey, schema changes and versioning are definitely among the reasons I wanted to raise this PR early for feedback! And thanks for all the good points:

Are users OK with duplicate data if they run it twice?

I should have mentioned this (I only tested it after raising the PR): while we don't do anything in the code to remove duplicates, running it twice doesn't actually cause duplicates. I assume that's because we include the id of the record, so the duplicate insert fails (I didn't dig into this).
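
As a rough illustration of why a second run might be harmless, here is a small Python sketch using SQLite. It assumes the history table has a unique id column and that conflicting inserts are skipped rather than raised; whether atuin's importer actually behaves this way is unverified here, and the table layout and the import_rows helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: only the unique id matters for this demo.
conn.execute("CREATE TABLE history (id TEXT PRIMARY KEY, command TEXT NOT NULL)")


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]


def import_rows(conn, rows):
    """Insert rows, silently skipping ids that already exist.

    Returns the number of rows that were actually added.
    """
    before = count_rows(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO history (id, command) VALUES (?, ?)", rows
    )
    conn.commit()
    return count_rows(conn) - before


rows = [("a1", "ls -la"), ("b2", "git status")]
first_run = import_rows(conn, rows)    # both rows are new
second_run = import_rows(conn, rows)   # same ids again, nothing is added
total = count_rows(conn)
```

With a plain INSERT instead of INSERT OR IGNORE, the second run would raise sqlite3.IntegrityError on the first repeated id, which would also prevent duplicates but would abort the import.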

Do they expect sync to work after this has been done?

It's a good question, and probably something I'd approach via documentation. Sync is obviously the go-to feature for ongoing syncing; this new feature could be presented as an option for when you have no internet, and we could note that it won't set up sync, and that setting up sync is probably what most users would want. But I've obviously got a lot less insight into your users than you do, so let me know your thoughts!

I'd rather not use the database as the transport mechanism here either - really something like atuin history dump should dump a format like jsonlines, and then the importer can read and import that.

This seems like a reasonable approach to me, and should also be fairly straightforward (I'm happy to update the PR to do that if that's what we want to do). The final question is:

I'd be concerned that changes to the db schema could break imports in the future

This feels like it would be an issue whether we go for jsonlines or the history dump anyway, so it's something we should consider. The easiest way to solve it is to ensure that both atuin installations have the exact same schema version, or something along those lines. I'm not super familiar with the data format; is there a schema number or something similar? If so (and if we wanted to proceed with a json-lines version of this), how should it be enforced? Would the json-lines file have a first line like {"atuin_history_dump_schema_version": "1"}, with the import process bailing out if dump-schema-version != current_schema_version? And if so, where do we keep the history schema version, if there is such a thing?
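
To show what I mean by the header check, here is a minimal Python sketch. Everything in it is hypothetical: the CURRENT_SCHEMA_VERSION constant, the SchemaMismatch exception, and the read_dump function are invented for illustration, and I don't know whether atuin has a schema version that could fill this role.

```python
import io
import json

CURRENT_SCHEMA_VERSION = "1"  # hypothetical; stands in for the importer's version
HEADER_KEY = "atuin_history_dump_schema_version"


class SchemaMismatch(Exception):
    """Raised when a dump's header is missing or has a different version."""


def read_dump(src):
    """Check the first line as a version header, then parse the remaining lines."""
    header_line = src.readline()
    if not header_line.strip():
        raise SchemaMismatch("dump is empty or has no header line")
    header = json.loads(header_line)
    version = header.get(HEADER_KEY) if isinstance(header, dict) else None
    if version != CURRENT_SCHEMA_VERSION:
        raise SchemaMismatch(
            f"dump has version {version!r}, expected {CURRENT_SCHEMA_VERSION!r}"
        )
    # Only reached when the versions match: parse one record per line.
    return [json.loads(line) for line in src if line.strip()]


good = io.StringIO(
    '{"atuin_history_dump_schema_version": "1"}\n'
    '{"id": "a1", "command": "ls -la"}\n'
)
records = read_dump(good)

bad = io.StringIO('{"atuin_history_dump_schema_version": "2"}\n')
try:
    read_dump(bad)
    rejected = False
except SchemaMismatch:
    rejected = True
```

Bailing out before reading any records means a mismatched dump never gets partially imported; a more lenient importer could instead migrate older versions forward, but that is a bigger design decision.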

Let me know your thoughts, and thanks for the response!