chrismattmann / tika-similarity

Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
Apache License 2.0
107 stars 59 forks

Added support for JSON file input through inputFile argument. #97

Closed matthewdavislee closed 1 year ago

matthewdavislee commented 4 years ago

Took the computeScores2 method from cosine_similarity.py (which takes an input file of JSON objects) and extended it to jaccard_similarity.py and edit-value-similarity.py. The edit-value script has slightly different syntax than the others, so it was a bit trickier to implement.

For example, you can now run: py -2 jaccard_similarity.py --fileInput json_to_input.json --outCSV test_output.csv
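To illustrate the idea of feeding a JSON file into a pairwise similarity computation, here is a minimal sketch. It is not the code from this PR or from the repository: the function names (`load_records`, `jaccard`, `pairwise_scores`) are hypothetical, the input is assumed to be a single JSON array of metadata objects, the score is assumed to be a plain Jaccard index over metadata keys, and it is written for Python 3 rather than the Python 2 used in the command above.

```python
import itertools
import json


def load_records(path):
    """Load a JSON array of metadata objects from a file (assumed input layout)."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of objects")
    return records


def jaccard(a, b):
    """Jaccard index of the key sets of two metadata dicts."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 1.0  # two empty records are treated as identical
    return len(keys_a & keys_b) / len(union)


def pairwise_scores(records):
    """Yield (i, j, score) for every unordered pair of records."""
    for (i, a), (j, b) in itertools.combinations(enumerate(records), 2):
        yield i, j, jaccard(a, b)
```

Each yielded tuple could then be written as one row of an output CSV with the standard `csv` module; the column layout the actual scripts produce is not shown in this thread, so it is left out here.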

chrismattmann commented 1 year ago

code base has diverged too much from this at this point. Thanks @matthewdavislee all the same