CSAFE-ISU / handwriter

Forensic Handwriting Analysis Pipeline in R - Please download our stable version from CRAN using 'install.packages("handwriter")'
https://csafe-isu.github.io/handwriter/
GNU Affero General Public License v3.0

Error in `get_clusters_batch()` #186

Closed by stephaniereinders 1 day ago

stephaniereinders commented 1 week ago

I processed all 1,604 CVL documents with process_batch_dir() and then ran get_clusters_batch() on the processed documents.

library(handwriter)

input_dir <- "/Users/stephanie/Documents/non_version_control/CVL/docs"
graphs_dir <- "/Users/stephanie/Documents/non_version_control/CVL/graphs"
clusters_dir <- "/Users/stephanie/Documents/non_version_control/CVL/clusters"
template <- readRDS(file.path(dirname(input_dir), "template.rds"))

handwriter::process_batch_dir(input_dir, graphs_dir)

handwriter::get_clusters_batch(template = template,
                               input_dir = graphs_dir,
                               output_dir = clusters_dir,
                               writer_indices = c(1, 4),
                               doc_indices = c(6, 6),
                               num_cores = 4,
                               save_master_file = TRUE)

get_clusters_batch() resulted in an error:

Error in { : task 1022 failed - "subscript out of bounds"
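
The "task 1022 failed" wording looks like the message foreach produces when a parallel iteration errors. A minimal sketch for finding the file behind a failed task follows. It assumes that tasks are dispatched in the order returned by list.files(), which this thread does not confirm. The directory and file names are hypothetical placeholders so that the example runs on its own.

```r
# Build a throwaway directory of placeholder files so the sketch is self-contained.
demo_dir <- file.path(tempdir(), "graphs_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, sprintf("doc_%04d.rds", 1:5))))

# Assumption: task k corresponds to the k-th file in list.files() order.
failed_task <- 3
files <- list.files(demo_dir, pattern = "^doc_.*\\.rds$", full.names = TRUE)
suspect <- files[failed_task]

# Print the name of the file to inspect by hand.
basename(suspect)
```

For the run above, failed_task would be 1022 and demo_dir would be graphs_dir.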

stephaniereinders commented 1 day ago

It turns out that the error was caused by several blank documents in the data set. A blank document yields no graphs when processed, so get_clusters_batch() throws an error when it tries to access the non-existent graphs. I fixed processDocument() to throw an error if a document doesn't contain any graphs, and get_clusters_batch() now returns a warning message. These changes are in #190.
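
The pattern described here (fail early on a document with no graphs, then downgrade that failure to a warning at the batch level) can be sketched roughly as below. This is not the code from #190. The functions count_graphs(), check_doc(), and check_batch() and the list-based document structure are hypothetical stand-ins, so the example runs without handwriter installed.

```r
# Hypothetical stand-in for a processed document: a list whose
# `graphs` element holds the extracted graphs (empty for a blank page).
count_graphs <- function(doc) length(doc$graphs)

# Hypothetical per-document check: stop with a clear message instead of
# hitting "subscript out of bounds" later on.
check_doc <- function(doc, name) {
  if (count_graphs(doc) == 0) {
    stop("Document '", name, "' does not contain any graphs.", call. = FALSE)
  }
  doc
}

# Hypothetical batch wrapper: convert per-document errors into warnings
# and continue with the remaining documents.
check_batch <- function(docs) {
  kept <- list()
  for (name in names(docs)) {
    result <- tryCatch(
      check_doc(docs[[name]], name),
      error = function(e) {
        warning(conditionMessage(e), call. = FALSE)
        NULL
      }
    )
    if (!is.null(result)) kept[[name]] <- result
  }
  kept
}

docs <- list(
  "0001-1.rds" = list(graphs = list("g1", "g2")),
  "0002-1.rds" = list(graphs = list())  # blank document
)

# Warnings are suppressed here only to keep the demo output quiet.
kept <- suppressWarnings(check_batch(docs))
names(kept)
```

With this shape, a single blank document produces a warning that names the file, and the remaining documents are still processed.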