dell-research-harvard / AmericanStories

The official GitHub repository for the American Stories dataset as in {link}

Headline-article associations missing dependencies #2

Open mike-mcrae opened 1 year ago

mike-mcrae commented 1 year ago

Hello, thank you very much for open-sourcing your work. These tools are an invaluable contribution to this area of research.

While running your pipeline on your replication images and on other images, I found that the package does not appear to associate headline text with articles. The headline-article association scripts that were added to the repository depend on modules that are not included in the repository.

In ca_simple_rule_based.py, the following imports refer to modules in folders that are not included:

from data_fns import clusters_from_edges, edges_from_clusters, import_single_scan_labelled_data
from data.clean_labelled_sample import get_prop_non_words, load_lowercase_spell_dict, clean

In generate_fa_and_ro_ids_rulebased.py, the following import refers to a module in a folder that is not included:

from scripts.quality_checking_functions import *

These appear to be user-written functions specific to the methodology. It would be fantastic if these folders, files, and functions could be added so that the replication process can be completed.
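
For anyone who wants to check which imports in a script cannot be resolved locally, here is a minimal sketch using only the standard library. The helper names `top_level_imports` and `missing_modules` are hypothetical and not part of the repository. The check looks only at top-level module names, so it would not notice a package folder that exists but lacks a submodule such as `clean_labelled_sample`.

```python
import ast
import importlib.util


def top_level_imports(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) are skipped: they are resolved
            # against the enclosing package rather than sys.path.
            names.add(node.module.split(".")[0])
    return names


def missing_modules(source: str) -> list[str]:
    """Return imported top-level modules that cannot be found on sys.path."""
    return sorted(
        name
        for name in top_level_imports(source)
        if importlib.util.find_spec(name) is None
    )


if __name__ == "__main__":
    # Import lines quoted from the scripts named in this issue.
    example = (
        "from data_fns import clusters_from_edges\n"
        "from data.clean_labelled_sample import clean\n"
        "from scripts.quality_checking_functions import *\n"
    )
    print(missing_modules(example))
```

The result depends on the working directory and environment: a local folder named `data` or `scripts` counts as found even if it is unrelated to this project.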

Thank you.