Closed cproctor closed 3 months ago
One issue: the database stores only a hash of each file, not its full contents. The update therefore needs access to both the old and the new version of the document; the old version's hash is checked against the database. I will support two workflows:
Therefore, the API becomes:
qc corpus update [path] [--new path] [--git] [--recursive] [--dryrun]
Done in 1.4.0.
Currently, corpus documents cannot be edited after they are imported. This prevents codes from getting out of sync with document line numbers, and is enforced by storing a document hash.
It would be nice to allow editing of documents, both to correct errors and to support additional features such as anonymization through named entity recognition. There are well-known algorithms for line-based diffs, which identify insertions, deletions, and changes. We could use these to update the line numbers for existing codes, and to delete codes for removed areas of files.
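As a rough illustration of the line-remapping idea, here is a minimal sketch using Python's standard `difflib.SequenceMatcher`. The function name `remap_coded_lines` and the representation of codes as 0-based line numbers are assumptions made for this example, not part of qc's actual data model. The sketch keeps codes on unchanged lines and on lines changed in place (a replaced block of the same length), and drops codes on deleted lines; how to treat replaced blocks of differing length is left open here, and such codes are dropped.

```python
from difflib import SequenceMatcher


def remap_coded_lines(old_lines, new_lines, coded_lines):
    """Map coded line numbers (0-based) from the old document to the new one.

    Hypothetical helper: returns {old_line: new_line} for every coded line
    that survives the edit. Coded lines missing from the result were removed.
    """
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    mapping = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        # "equal" blocks always line up one-to-one. A "replace" block is
        # treated as an in-place change only when both sides have the same
        # number of lines (an assumption of this sketch).
        if tag == "equal" or (tag == "replace" and same_length):
            for offset in range(i2 - i1):
                mapping[i1 + offset] = j1 + offset
        # "delete" blocks have no counterpart, so their codes are dropped.
        # "insert" blocks only shift later lines, which the j-indices of
        # subsequent blocks already account for.
    return {line: mapping[line] for line in coded_lines if line in mapping}


old = ["alpha", "beta", "gamma", "delta"]
new = ["intro", "alpha", "gamma", "delta!"]
result = remap_coded_lines(old, new, coded_lines=[0, 1, 2, 3])
```

In this example, a line is inserted at the top, `beta` is deleted, and `delta` is changed in place, so the code on old line 1 is removed while the others move to their new positions.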
The API for corpus update might look like: