cproctor / qualitative-coding

Qualitative coding for computer scientists

Add `corpus update` command #49

Closed: cproctor closed this 2 weeks ago

cproctor commented 1 month ago

Currently, corpus documents cannot be edited after they are imported. This prevents codes from getting out of sync with document line numbers, and is enforced by storing a document hash.

It would be nice to allow editing of documents, both to correct errors and to support additional features such as anonymization through named entity recognition. There are well-known algorithms for line-based diffs, which identify insertions, deletions, and changes. We could use these to update the line numbers for existing codes, and to delete codes for removed areas of files.
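A minimal sketch of the line-remapping idea, using Python's standard `difflib.SequenceMatcher`. The function name `remap_line_numbers` and the 0-based line numbering are assumptions made for illustration, not the project's actual implementation. As a simplification, only unchanged lines keep their codes; lines that were deleted or changed map to `None`.

```python
from difflib import SequenceMatcher


def remap_line_numbers(old_lines, new_lines, coded_lines):
    """Map 0-based coded line numbers from old_lines onto new_lines.

    Hypothetical helper: returns {old_line: new_line or None}, where None
    means the line was deleted or changed, so its codes would be dropped.
    """
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    mapping = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only "equal" blocks carry line numbers over unchanged; "insert",
        # "delete", and "replace" blocks leave old lines unmapped.
        if tag == "equal":
            for offset in range(i2 - i1):
                mapping[i1 + offset] = j1 + offset
    return {line: mapping.get(line) for line in coded_lines}


old = ["a", "b", "c", "d"]
new = ["x", "a", "c", "d"]  # "x" inserted at the top, "b" removed
print(remap_line_numbers(old, new, [0, 1, 3]))
```

A real implementation might instead keep codes on changed lines when a replaced block has the same length as the original; that choice is left open here.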

The API for corpus update might look like:

qc corpus update [paths+] [--recursive] [--dryrun]

cproctor commented 1 month ago

One issue: the database doesn't store full file contents, just a hash. Therefore, an update needs access to both the old version and the new version of each document. The old version's hash is checked against the database. I will support two workflows:

Therefore, the API becomes:

qc corpus update [path] [--new path] [--git] [--recursive] [--dryrun]
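
A sketch of checking the old version against the stored hash before updating, using Python's standard `hashlib`. The choice of SHA-256 and the helper names `content_hash` and `matches_stored_hash` are assumptions for illustration; the thread does not say which hash function the database uses.

```python
import hashlib
from pathlib import Path


def content_hash(data: bytes) -> str:
    """Hex digest of a document's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def matches_stored_hash(old_path, stored_hash: str) -> bool:
    """True if the file at old_path is the version recorded in the database."""
    return content_hash(Path(old_path).read_bytes()) == stored_hash
```

If the check fails, the supplied old version is not the one the codes refer to, so remapping line numbers against it would be unsafe.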

cproctor commented 2 weeks ago

Done in 1.4.0.