Center-for-Research-Libraries / crl-serials-validator

Validate bibliographic and holdings data for shared print.
GNU General Public License v3.0

Finalize the location of the Validator's data files. #30

Closed: nflorin closed this issue 2 years ago

nflorin commented 3 years ago

This isn't a technical question, but more a general one about how we want to treat the user's local machine.

TL;DR Where should we put the potentially large MARC database file?

  1. In a "hidden" folder that the user can access but probably will never see?
  2. In an obvious folder in the user's home directory?
  3. In the validator directory, in a place where we can't reuse the MARC database in other projects?

I don't want our files to be invasive and obnoxious, but I also don't want them to be insidious and secretive. And I'd like to reuse them for future projects, but maybe that's just a pipe dream.

An expanded version of this is in the second comment.

nflorin commented 3 years ago

The Validator saves the API key config file and the MARC database to a specific folder on the local machine. If this is a new install, at the moment the folders are the local equivalent of:

Not sure about macOS.

These folders are chosen by the appdirs library; they are what it defines as the user data directory for an application called "CRL". They are accessible to the user but are at least nominally "hidden". This might matter because the MARC database file can get very large, into the gigabytes if someone runs the API a lot. (Mine is over 8 GB, but that's a real outlier.) Most users don't spend time rooting around in AppData, and we don't want to pollute their hard drives with big files they might never use again and won't know to delete once they no longer need them.

The two other options would be to put everything in the data folder in the main program folder (this currently houses a list of JSTOR ISSNs and also the configuration file for the Validator proper) or in a CRL folder that would go in the user's home directory, or possibly in their documents directory. So on Windows, something like C:\Users\nflorin\CRL or C:\Users\nflorin\Documents\CRL.
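To make the alternatives concrete, here is a minimal sketch of how the three candidate locations could be computed with the standard library. The function name, the dictionary keys, and the `APP_DIR_NAME` constant are made up for illustration and are not taken from the Validator's code.

```python
from pathlib import Path

# Hypothetical constant: the folder name discussed in this issue.
APP_DIR_NAME = "CRL"


def candidate_data_dirs(home=None, program_dir=None):
    """Return the candidate data locations discussed above.

    Both arguments are optional overrides (useful for testing); by default
    the current user's home directory and the current working directory
    are used. The working directory stands in for the program folder here,
    which is an assumption of this sketch.
    """
    home = Path(home) if home is not None else Path.home()
    program_dir = Path(program_dir) if program_dir is not None else Path.cwd()
    return {
        # Option: keep everything in the program's own data folder.
        "program_data": program_dir / "data",
        # Option: an obvious folder in the user's home directory.
        "home": home / APP_DIR_NAME,
        # Option: a folder inside the user's documents directory.
        "documents": home / "Documents" / APP_DIR_NAME,
    }


if __name__ == "__main__":
    for label, path in candidate_data_dirs().items():
        print(f"{label}: {path}")
```

Note that the documents directory is not always literally named `Documents` (it can be relocated or localized), so that entry is only an approximation.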

We'd talked about keeping the data out of the validator folder with the idea that we might be making use of the API keys config and the MARC database for other tools we'd develop and release. (The internal "MARC Machine" tool uses the MARC database.) Keeping the data outside the validator folder would simplify that. On the other hand, putting everything in the data folder would make installs more "portable", assuming that the user has all of the requisite dependencies installed -- just copy the whole crl-serials-validator folder to a new machine or even a flash drive and it's ready to use.

Another consideration is that if we ever convert this to a GUI and release a binary (exe) version then it's possible that the validator folder won't really exist. In this case all of the data, including what's currently in the data folder, would likely have to go to a standard machine folder. But this could be a special case, or we could release the binary in a zip file with the data, input, and output folders included.

nflorin commented 2 years ago

I made the executive decision to keep the data files in a directory called CRL in the user's home directory. I included a migration function to move installations away from the old appdirs locations (C:\Users\nflorin\AppData\Local\CRL\CRL and the like). At some point in the future we can remove this function, on the assumption that all extant installations will have migrated already.
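For readers curious what such a migration might look like, here is a rough sketch. It is not the function that was actually added to the Validator; the name `migrate_data_files` and the choice to skip files that already exist at the destination are assumptions made for this example.

```python
import shutil
from pathlib import Path


def migrate_data_files(old_dir, new_dir):
    """Move everything from old_dir into new_dir and return the new paths.

    Hypothetical helper: does nothing if old_dir does not exist, and never
    overwrites a file or folder that is already present in new_dir.
    """
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    moved = []
    if not old_dir.is_dir():
        return moved  # nothing to migrate (new install or already migrated)
    new_dir.mkdir(parents=True, exist_ok=True)
    for item in sorted(old_dir.iterdir()):
        target = new_dir / item.name
        if target.exists():
            continue  # keep whatever is already in the new location
        shutil.move(str(item), str(target))
        moved.append(target)
    return moved
```

A call might look like `migrate_data_files(old_appdirs_location, Path.home() / "CRL")`, where `old_appdirs_location` is whatever folder the previous version used. Because a second run finds nothing left to move, it would be safe to call on every startup until the function is retired.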