byrokrat / giroapp

Command line app for managing autogiro donations.
GNU General Public License v3.0

Collect checksums of imported files #36

Closed: hanneskod closed this issue 6 years ago

hanneskod commented 7 years ago

To prevent accidentally importing the same file multiple times, which could lead to data corruption.

Maybe a db collection of filenames and file hashes?

Can be implemented completely in an IMPORT_EVENT listener.

nonbinary commented 7 years ago

It would be good to have a db of past imports, to be able to backtrack errors or mistakes. And a checksum is a very good idea.

nonbinary commented 7 years ago

I'll start work on this soon. Not sure if I'd put the db in data/ or var/

hanneskod commented 7 years ago

Cool! And a valid question!

As I originally imagined this, I would put the data in data as a json collection with a schema like:

{
    "hash-of-file-contents": {
        "filename": "name-of-file",
        "date": "date-when-file-was-imported"
    }
}

(This means that the FileEvent needs to be augmented to contain the filename.)
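A minimal sketch of how an entry in that collection could be built, written in Python purely for illustration rather than in giroapp's own code. The hash algorithm (SHA-256), the function names `file_checksum` and `record_import`, the ISO date format and the example filename and date are all assumptions; the thread only fixes the shape of the JSON.

```python
import hashlib
import json
from datetime import date


def file_checksum(content: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file contents (algorithm is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def record_import(collection: dict, filename: str, content: bytes, imported_on: date) -> str:
    """Add an entry keyed by the content hash, following the schema above."""
    checksum = file_checksum(content)
    collection[checksum] = {
        "filename": filename,
        "date": imported_on.isoformat(),
    }
    return checksum


# Hypothetical filename, contents and date, for illustration only.
collection = {}
key = record_import(collection, "example.txt", b"example file contents", date(2018, 1, 31))
print(json.dumps(collection, indent=4))
```

Keying on the hash makes the "has this content been imported before?" lookup a single dictionary access, and the filename is only available if the event carries it, as noted above.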

But if I understand your first comment correctly, you also want to save the actual content of imported files in the giroapp database. I think that should be a separate thing: Listener\FileImportDumpingListener or something like that. And it should write to var, as this is not data from the application.

nonbinary commented 7 years ago

I'm kind of undecided on saving the file contents. To be 100% sure, we should use the hash to detect double imports, but then double-check the contents of the file. Since hashes are reasonably sure, but not perfect, that is the most secure way of doing it.
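Roughly, the two-stage check could look like the sketch below (Python, for illustration only). It assumes the earlier file contents are available in a mapping from hash to bytes; `is_double_import` and `stored_files` are hypothetical names, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def is_double_import(new_content: bytes, stored_files: dict) -> bool:
    """stored_files maps a content hash to the bytes saved at import time (assumed layout)."""
    checksum = hashlib.sha256(new_content).hexdigest()
    previous = stored_files.get(checksum)
    if previous is None:
        # Hash never seen before: not a double import.
        return False
    # Hash seen before: compare byte for byte to rule out a collision.
    return previous == new_content
```

The hash lookup is the cheap first filter; the byte comparison only runs when the hash has been seen before.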

The file is also not that big, so it might be doable to save the files. Then again, it would still be reasonable to:

hanneskod commented 7 years ago

Hm. I'm not sure. When you start to get some donors the files that report performed transactions will be quite lengthy. Saving the file contents in a json structure is a bad idea in my opinion.

This is my proposal:

  1. Listener\FileChecksumListener that saves checksum and date in a json structure as suggested above. Fail if the checksum is in the database (possibly with the added condition that the filenames should match as you suggest).
  2. Add a --force (and -f) option that suppresses this error. (Are you sure you want to import this file, yes I am!)
  3. Add file dumping with a different listener as suggested above. Make using this listener optional in the settings.
  4. Listener\FileContentListener that validates the actual file content by reading files from var as saved by FileDumpingListener. Using this listener should also be optional using a setting. To be used by the paranoid who does not trust hashes...
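Points 1 and 2 of the proposal can be sketched as follows. This is an illustration in Python under stated assumptions, not the listener itself: `check_import`, `DuplicateImportError` and the `match_filename` switch are hypothetical names, SHA-256 is an assumed hash, and `force` stands in for the proposed --force / -f option.

```python
import hashlib


class DuplicateImportError(Exception):
    """Raised when a file with a known checksum is imported again."""


def check_import(known: dict, filename: str, content: bytes,
                 force: bool = False, match_filename: bool = False) -> str:
    """Return the checksum, or fail if it is already in the collection."""
    checksum = hashlib.sha256(content).hexdigest()
    entry = known.get(checksum)
    # Optionally require the filename to match as well before calling it a duplicate.
    duplicate = entry is not None and (
        not match_filename or entry["filename"] == filename
    )
    if duplicate and not force:
        raise DuplicateImportError(
            f"{filename} was already imported on {entry['date']}"
        )
    return checksum


# Hypothetical collection entry, for illustration only.
known = {
    hashlib.sha256(b"payload").hexdigest(): {"filename": "a.txt", "date": "2018-01-31"}
}
check_import(known, "a.txt", b"payload", force=True)  # force suppresses the error
```

Without `force`, the same call raises `DuplicateImportError`; a file with unseen contents passes and returns its checksum, ready to be stored.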

nonbinary commented 7 years ago

Agree, this looks good.