CSAFE-ISU / handwriter

Forensic Handwriting Analysis Pipeline in R - Please download our stable version from CRAN using 'install.packages("handwriter")'
https://csafe-isu.github.io/handwriter/
GNU Affero General Public License v3.0

Bugs in processHandwriting #93

Closed stephaniereinders closed 1 year ago

stephaniereinders commented 1 year ago

Bugs have been reported in processHandwriting(). The function apparently has trouble assigning nodes to some documents.

stephaniereinders commented 1 year ago

Example of error on a specific image...

image001

The following code produces an error on this image.

doc <- list()
doc$image <- readPNGBinary('image001.png')
doc$thin <- thinImage(doc$image)
doc$processed <- processHandwriting(img=doc$thin, dims=dim(doc$image))

Starting Processing...
Getting Nodes...and merging them...
Finding direct paths...and loops...
Looking for letter break points...and discarding bad ones...
Isolating letter paths...
Organizing letters...
Creating letter lists...
Adding character features...
Error in aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) :
  no rows to aggregate

What I have found so far...

traceback() produces:

7: stop("no rows to aggregate")
6: aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...)
5: aggregate.formula(. ~ prediction, wordPredictions, mean)
4: aggregate(. ~ prediction, wordPredictions, mean) at ExtractFeatures.R#435
3: add_word_info(letterList, dims) at JunctionDetection.R#833
2: add_character_features(img, letterList, letters, dims) at JunctionDetection.R#692
1: processHandwriting(img = doc$thin, dims = dim(doc$image))
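For reference, the same message can be reproduced with base R alone. This is only a minimal sketch under an assumption: the data frame `toy` and its `height` column are hypothetical and are not the package's `wordPredictions`, so it shows one way to reach the error, not necessarily the cause in this issue. The formula method of `aggregate()` drops rows containing `NA` by default and stops when no rows remain.

```r
# Hypothetical stand-in for wordPredictions; not taken from handwriter.
toy <- data.frame(
  height     = c(NA_real_, NA_real_),
  prediction = c(1L, 2L)
)

# The formula method uses na.action = na.omit by default, so every row with
# an NA is dropped before aggregating. Here that leaves zero rows.
msg <- tryCatch(
  {
    aggregate(. ~ prediction, toy, mean)
    NA_character_  # only reached if no error was raised
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

If `wordPredictions` ends up with no complete rows for a given image, the call at ExtractFeatures.R#435 would fail in the same way.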

I figured out that add_word_info() adds the field "wordIndex" to letterList$characterFeatures. The function doesn't make any other changes to letterList.

wordIndex is accessed in the following functions:

Moving towards a solution...

I'm fairly confident that we could delete wordIndex, create_words(), process_words(), add_word_info(), make_single_word(), and plotWord(). In fact, this might be the best solution because all word separation tasks should now be performed with the word separation code in inst/python. However, Allie uses plotColorNodes in her code, which currently lives outside handwriter, and I don't know whether her code uses wordIndex or any of the other functions that I just listed. My next step is to look at her code.
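Before deleting anything, one way to check which functions still mention wordIndex is to search the deparsed source of every function in an environment. The sketch below is an illustration only: `find_symbol_users()` and the toy environment are hypothetical helpers, not part of handwriter, and the plain text match can also hit comments-free substrings such as longer names that contain the search term.

```r
# Return the names of functions in `env` whose source text mentions `symbol`.
# Hypothetical helper for illustration; not part of handwriter.
find_symbol_users <- function(symbol, env) {
  all_names <- ls(env, all.names = TRUE)
  fn_names <- Filter(function(nm) is.function(get(nm, envir = env)), all_names)
  uses <- vapply(fn_names, function(nm) {
    src <- deparse(get(nm, envir = env))
    any(grepl(symbol, src, fixed = TRUE))
  }, logical(1))
  sort(unname(fn_names[uses]))
}

# Toy environment standing in for a package namespace.
toy <- new.env()
toy$adds_index  <- function(x) { x$wordIndex <- 1L; x }
toy$reads_index <- function(x) x[["wordIndex"]]
toy$unrelated   <- function(x) x + 1

users <- find_symbol_users("wordIndex", toy)
print(users)
```

With the package installed, the same helper could be pointed at `asNamespace("handwriter")` instead of the toy environment.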

stephaniereinders commented 1 year ago

Allie sent me her code. I think the best way forward is to

  1. Copy all code implementing the old word separation process from handwriter to Allie's project.
  2. Delete all old word separation code from handwriter.
  3. Test that processHandwriting works properly on handwriting examples, including the image of the word "The" that is currently failing because of the old word separation code.
  4. Change the code as necessary in Allie's project.
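The testing step could be sketched as a small batch check that runs a processing function over several inputs and records which ones fail. Everything named here is an assumption for illustration: `check_processing()` is a hypothetical helper, `fake_process()` stands in for the real pipeline, and the file names are made up.

```r
# Run `process_fn` on each input and record whether it raised an error.
# Hypothetical helper for illustration; not part of handwriter.
check_processing <- function(inputs, process_fn) {
  results <- lapply(inputs, function(input) {
    tryCatch(
      {
        process_fn(input)
        data.frame(input = input, ok = TRUE, message = NA_character_,
                   stringsAsFactors = FALSE)
      },
      error = function(e) {
        data.frame(input = input, ok = FALSE, message = conditionMessage(e),
                   stringsAsFactors = FALSE)
      }
    )
  })
  do.call(rbind, results)
}

# Stand-in for the real pipeline; fails on one made-up file name.
fake_process <- function(path) {
  if (path == "the.png") stop("no rows to aggregate")
  invisible(TRUE)
}

report <- check_processing(c("sample1.png", "the.png"), fake_process)
print(report)
```

For the real check, `process_fn` could wrap the calls from the failing example above: readPNGBinary() on the path, thinImage() on the result, then processHandwriting(img = ..., dims = ...).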

Current Function Map of handwriter

I found a package on GitHub that creates a map of function calls for a package. I ran the following commands:

source("https://raw.githubusercontent.com/MangoTheCat/remotes/master/install-github.R")$value("mangothecat/functionMap")
library(functionMap)
map_r_package('~/Documents/version_control/handwriter')

── Map of R package 'handwriter' ──
❯ utils::globalVariables
★ about_variable ❯ coda::as.mcmc, stringr::str_detect, stringr::str_replace
add_character_features ❯ add_word_info, extract_character_features
add_covariance_matrix ❯ i_to_rc, stats::cov, stats::var
add_line_info ❯ all_centroids, all_down_dists, line_number_extract
add_updown_neighboring_char_dist ❯ i_to_rci, rc_to_i
add_word_info ❯ stats::aggregate, stats::predict
★ AddLetterImages ❯
AddSamplingStrata ❯ loop_extract
addToFeatures ❯ .Call::_handwriter_addToFeatures
all_centroids ❯
all_down_dists ❯
AllUniquePaths ❯ ???::delete.edges, ???::E, ???::shortest_paths, ???::shortest.paths
★ analyze_questioned_documents ❯ ???::%>%, ???::%dopar%, calculate_wc_likelihood, doParallel::registerDoParallel, dplyr::filter, foreach::foreach, format_draws, format_questioned_data, mc2d::dmultinomial, parallel::makeCluster, ★ get_clusterassignment, ★ process_batch_dir
angle ❯
★ calculate_accuracy ❯
calculate_wc_likelihood ❯
centeredImage ❯
centeredImageOnCentroid ❯
char_to_feature ❯ get_aspect_info, get_centroid_info
character_features_by_line ❯
checkBreakPoints ❯ ???::delete.edges, ???::distances
checkSimplicityBreaks ❯ ???::V, pathLetterAssociate
checkStacking ❯ ???::V
chooseCenters ❯ ???::%>%, ???::map2, dplyr::arrange, dplyr::group_by, dplyr::mutate, dplyr::select, dplyr::ungroup, letterToPrototype, tidyr::nest, tidyr::unnest
chooseGraphs ❯ ???::%>%, dplyr::group_by, dplyr::slice_sample
cleanBinaryImage ❯ .Call::_handwriter_cleanBinaryImage
★ copy_csafe_docs ❯
★ count_csafe_correct_top_writer ❯ ???::%>%, dplyr::mutate, get_top_writer
countChanges ❯
countNodes ❯
create_letter_lists ❯ getNodeOrder, pathLetterAssociate
★ create_words ❯
★ crop ❯
davies_bouldin ❯ ★ getGraphDistance
delete_crazy_graphs ❯ ???::%>%, dplyr::filter
dist_loc ❯ distXY
dist_sh ❯ Rfast::rowsums
dist_sld ❯ distXY
distXY ❯
do_setup ❯ futile.logger::appender.file, futile.logger::flog.appender, make_dir
★ drop_burnin ❯ coda::as.mcmc, coda::niter
extract_character_features ❯ add_covariance_matrix, add_line_info, add_updown_neighboring_char_dist, char_to_feature, nov_neighboring_char_dist
★ extractGraphs ❯ ???::%dopar%, ???::foreach, doParallel::registerDoParallel, getGraphs
find_colorpoints ❯ i_to_rc, i_to_rci
findMergeNodes ❯ ???::E, ???::shortest_paths
★ fit_model ❯ format_model_data, rjags::coda.samples, rjags::jags.model, ★ get_clusterassignment, ★ process_batch_dir
format_draws ❯ coda::as.mcmc
format_model_data ❯ ???::%>%, dplyr::left_join, dplyr::rename, get_cluster_fill_counts
format_questioned_data ❯ ???::%>%, ???::across, ???::where, dplyr::distinct, dplyr::left_join, dplyr::mutate, dplyr::rename, dplyr::select, get_cluster_fill_counts, tidyr::replace_na
★ format_template_data ❯ ???::%>%, dplyr::group_by, dplyr::mutate, dplyr::n, dplyr::summarize, tidyr::pivot_wider
get_aspect_info ❯ i_to_rci
get_centroid_info ❯ i_to_rci, rc_to_i
get_cluster_fill_counts ❯ ???::%>%, dplyr::group_by, dplyr::mutate, dplyr::n, dplyr::summarise, tidyr::pivot_wider
★ get_clusterassignment ❯ ???::%>%, ???::%dopar%, angle, centeredImage, doParallel::registerDoParallel, dplyr::group_by, dplyr::slice_sample, foreach::foreach, makeassignment, MakeLetterListLetterSpecific, parallel::makeCluster, stringr::str_replace, ★ AddLetterImages
★ get_credible_intervals ❯ get_credible_intervals_for_writer, get_pi_dataframes
get_credible_intervals_for_writer ❯ stats::quantile
get_loop_info ❯
get_pi_dataframes ❯ format_draws
★ get_posterior_probabilities ❯
get_prompt_code ❯
get_strata ❯ ???::%>%, dplyr::group_by, dplyr::n, dplyr::summarize
get_top_writer ❯ ★ get_posterior_probabilities
getAllPairsDistances ❯ dist_loc, dist_sh, dist_sld
★ getGraphDistance ❯ getAllPairsDistances, getGraphInfo, solveLP
getGraphInfo ❯ pathToRC, Rfast::eachrow
getGraphs ❯ ★ processHandwriting, ★ readPNGBinary, ★ thinImage
GetImageMatrix ❯ ???::%>%, ???::image_quantize, ???::image_read, ???::image_resize, centeredImage, grDevices::as.raster, ★ thinImage
getLoops ❯ ???::degree, ???::delete.edges, ???::delete.vertices, ???::dfs, ???::distances, ???::E, ???::graph_from_adjacency_matrix, ???::induced_subgraph, ???::intersection, ???::neighborhood, ???::shortest_paths, ???::V, stats::na.omit
getNodeGraph ❯ ???::add_vertices, ???::add.edges, ???::make_empty_graph
getNodeOrder ❯
getNodes ❯ countChanges, whichNeighbors
i_hs_to_i_letter ❯ i_hs_to_rc_letter
i_hs_to_rc_hs ❯
i_hs_to_rc_letter ❯ i_hs_to_rc_hs
i_to_c ❯
i_to_r ❯
i_to_rc ❯
i_to_rci ❯
i_to_x ❯
i_to_y ❯
letterKmeansWithOutlier_parallel ❯ ???::%dopar%, davies_bouldin, doParallel::registerDoParallel, foreach::foreach, meanGraphSet_slowchange, root_mean_square_error, utils::tail, variance_ratio_criterion, within_cluster_sum_of_squares, ★ getGraphDistance
letterPaths ❯ ???::delete_vertices, ???::distances, ???::E, ???::V
letterToPrototype ❯ PathToRC
line_number_extract ❯ i_to_rci, stats::median, utils::head
loop_extract ❯
★ make_clustering_templates ❯ chooseCenters, chooseGraphs, delete_crazy_graphs, do_setup, get_strata, letterKmeansWithOutlier_parallel, make_images_list, make_proc_list, ★ process_batch_dir
make_dir ❯
make_images_list ❯ centeredImage
make_proc_list ❯ AddSamplingStrata, MakeLetterListLetterSpecific, ★ AddLetterImages
★ make_single_word ❯
makeassignment ❯ ★ getGraphDistance
MakeCenterStarts ❯ AddSamplingStrata, letterToPrototype, MakeSamplingDF
MakeLetterListLetterSpecific ❯ i_hs_to_i_letter, i_hs_to_rc_hs, i_hs_to_rc_letter, rc_hs_to_rc_letter
makeModel ❯ randomForest::randomForest, rjson::fromJSON, stats::aggregate, stats::predict, usethis::use_data
MakeSamplingDF ❯ ???::%>%, ???::arrange, ???::group_by, ???::mutate, ???::nest, ???::select, ???::ungroup, ???::unnest, purrr::map2
meanGraphSet_slowchange ❯ ???::%dopar%, doParallel::registerDoParallel, foreach::foreach, handwriter::getGraphDistance, letterToPrototype, stats::quantile, weightedMeanGraphs
nov_neighboring_char_dist ❯ character_features_by_line
organize_letters ❯ ???::V, stats::na.omit
otsuBinarization ❯ graphics::hist
overall_meanGraph ❯ stats::quantile, weightedMeanGraphs, ★ getGraphDistance
pathLetterAssociate ❯
pathToRC ❯
PathToRC ❯
★ plot_cluster_fill_counts ❯ ???::%>%, ???::aes, ???::facet_wrap, ???::geom_line, ???::geom_point, ???::scale_x_continuous, ???::theme_bw, dplyr::mutate, ggplot2::ggplot, tidyr::pivot_longer
★ plot_cluster_fill_rates ❯ ???::%>%, ???::aes, ???::facet_wrap, ???::geom_line, ???::geom_point, ???::scale_x_continuous, ???::theme_bw, dplyr::mutate, ggplot2::ggplot, tidyr::pivot_longer
★ plot_credible_intervals ❯ ???::%>%, ???::aes, ???::element_text, ???::facet_wrap, ???::geom_errorbar, ???::geom_line, ???::ggplot, ???::labs, ???::starts_with, ???::sym, ???::theme, ???::theme_bw, stringr::str_replace_all, tidyr::pivot_longer, tidyr::pivot_wider, ★ get_credible_intervals
★ plot_posterior_probabilities ❯ ???::%>%, ???::aes, ???::element_text, ???::geom_bar, ???::geom_tile, ???::ggplot, ???::labs, ???::scale_fill_gradient2, ???::theme, ???::theme_bw, ???::xlab, ???::ylab, ggplot2::ggplot, tidyr::pivot_longer
★ plot_trace ❯ ???::%>%, ???::aes, ???::geom_line, ???::labs, ???::theme_bw, format_draws, ggplot2::ggplot, ★ about_variable
★ plotColorNodes ❯ ???::aes, ???::geom_point, ???::str_replace, ★ plotImage
★ plotImage ❯ ???::aes, ???::coord_fixed, ???::geom_raster, ???::ggplot, ???::scale_fill_manual, ???::theme_void, reshape2::melt
★ plotImageThinned ❯ ???::aes, ???::coord_fixed, ???::geom_raster, ???::ggplot, ???::scale_alpha_manual, ???::scale_fill_manual, ???::theme_void, reshape2::melt
★ plotLetter ❯ ???::aes, ???::geom_line, ???::geom_point, ???::geom_text, ★ plotImageThinned, ★ plotNodes
★ plotLine ❯ ★ plotImage
★ plotNodes ❯ ???::aes, ???::geom_point, ★ plotImageThinned
plotNodesLine ❯ ???::aes, ???::geom_point, ???::geom_segment, ★ plotImageThinned
plotNodesLine1 ❯ ???::aes, ???::geom_curve, ???::geom_point, ★ plotImageThinned
★ plotWord ❯ ★ plotImage
pointLineProportionVect ❯
★ process_batch_dir ❯ tools::file_path_sans_ext, ★ read_and_process
★ process_batch_list ❯ tools::file_path_sans_ext, ★ read_and_process
★ process_words ❯ find_colorpoints
★ processHandwriting ❯ ???::as_data_frame, ???::delete.vertices, ???::distances, ???::E, ???::graph_from_data_frame, ???::V, add_character_features, AllUniquePaths, checkBreakPoints, checkSimplicityBreaks, checkStacking, create_letter_lists, findMergeNodes, getLoops, getNodeGraph, getNodes, igraph::simplify, letterPaths, organize_letters, reshape2::melt, stats::na.omit, whichNeighbors, whichNeighbors0
rc_hs_to_rc_letter ❯
rc_to_i ❯
★ read_and_process ❯ ★ processHandwriting, ★ readPNGBinary, ★ thinImage
★ readPNGBinary ❯ ???::image_read, ???::image_write, cleanBinaryImage, otsuBinarization, png::readPNG, rgba2rgb, ★ rgb2grayscale
★ rgb2grayscale ❯ .Call::_handwriter_rgb2grayscale
rgba2rgb ❯ .Call::_handwriter_rgba2rgb
root_mean_square_error ❯
★ runHandwritingViewer ❯ shiny::runApp
★ SaveAllLetterPlots ❯ grDevices::as.raster, magick::image_read, magick::image_transparent, magick::image_write, ★ AddLetterImages
★ select_csafe_docs ❯ ★ select_model_docs, ★ select_questioned_docs, ★ select_template_docs
★ select_model_docs ❯ ???::%>%, dplyr::filter, get_prompt_code
★ select_questioned_docs ❯ ???::%>%, dplyr::filter, get_prompt_code
★ select_template_docs ❯ ???::%>%, dplyr::filter, get_prompt_code
solveLP ❯ lpSolve::lp
★ thinImage ❯ .Call::_handwriter_thinImage
variance_ratio_criterion ❯ overall_meanGraph
weightedMeanGraphs ❯ getAllPairsDistances, getGraphInfo, pointLineProportionVect, solveLP
whichNeighbors ❯
whichNeighbors0 ❯
★ whichToFill ❯ .Call::_handwriter_whichToFill
within_cluster_sum_of_squares ❯

[!NOTE] Functions denoted by ★ are exported.
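A map like this can also be used to see which functions are affected by removing add_word_info. The sketch below is an illustration under stated assumptions: `calls` is a hand-copied subset of the map (package-internal callees only, not the full list), and `callers_of()` is a hypothetical helper, not something provided by functionMap or handwriter.

```r
# Hand-copied subset of the map above: caller -> functions it calls.
calls <- list(
  processHandwriting     = c("add_character_features", "create_letter_lists"),
  add_character_features = c("add_word_info", "extract_character_features"),
  read_and_process       = c("processHandwriting", "readPNGBinary", "thinImage"),
  getGraphs              = c("processHandwriting", "readPNGBinary", "thinImage"),
  process_batch_dir      = c("read_and_process"),
  process_words          = c("find_colorpoints")
)

# Collect every function that reaches `target` directly or indirectly.
# Hypothetical helper for illustration.
callers_of <- function(target, calls) {
  found <- character(0)
  frontier <- target
  while (length(frontier) > 0) {
    # Functions that call anything in the current frontier.
    hits <- vapply(calls, function(callees) any(frontier %in% callees), logical(1))
    direct <- names(calls)[hits]
    # Only keep callers not seen before, so cycles cannot loop forever.
    frontier <- setdiff(direct, found)
    found <- c(found, frontier)
  }
  found
}

affected <- callers_of("add_word_info", calls)
print(affected)
```

In this subset, process_words does not appear in the result because it never reaches add_word_info, which is consistent with treating the old word separation functions as a separate group.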