archivesunleashed / aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
https://aut.docs.archivesunleashed.org/
Apache License 2.0

Tweak hasDate to handle Seq. #430

Closed: ruebot closed this 4 years ago

ruebot commented 4 years ago

GitHub issue(s):

What does this Pull Request do?

Follow-up to #429: tweak hasDate to handle a Seq.

How should this be tested?

...and if you want, something like:

import io.archivesunleashed._
import io.archivesunleashed.df._

val dates = Array("2008", "200908", "20070502")

RecordLoader.loadArchives("/home/nruest/Projects/aut/sample-data/geocities", sc)
  .all()
  .select($"url", $"crawl_date")
  .filter(!hasDate($"crawl_date", lit(dates)))
  .show(10, false)

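For readers without a Spark session handy, here is a minimal, Spark-free sketch of what filtering against a sequence of dates could look like. The helper name `matchesAnyDate`, the sample crawl dates, and the prefix-matching rule are all assumptions made for illustration; the source does not say how hasDate compares values, so the real implementation may behave differently.

```scala
// Hypothetical stand-in for illustration only; this is NOT the toolkit's hasDate.
// Assumption: a crawl date matches when it starts with any of the supplied
// values (a year, a year-month, or a full yyyyMMdd date).
object DateFilterSketch {
  def matchesAnyDate(crawlDate: String, dates: Seq[String]): Boolean =
    dates.exists(prefix => crawlDate.startsWith(prefix))

  def main(args: Array[String]): Unit = {
    val dates = Seq("2008", "200908", "20070502")
    // Made-up crawl dates, not taken from the sample data.
    val crawlDates = Seq("20080315", "20090812", "20070502", "20100101")

    // Mirror the negated filter in the example above: keep what does NOT match.
    val kept = crawlDates.filterNot(matchesAnyDate(_, dates))
    println(kept) // prints List(20100101)
  }
}
```

Under this assumed rule, the negated filter keeps only the rows whose crawl date falls outside every listed year, month, or day.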
codecov[bot] commented 4 years ago

Codecov Report

Merging #430 into master will not change coverage. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #430   +/-   ##
=======================================
  Coverage   77.70%   77.70%           
=======================================
  Files          41       41           
  Lines        1534     1534           
  Branches      283      282    -1     
=======================================
  Hits         1192     1192           
  Misses        217      217           
  Partials      125      125

ruebot commented 4 years ago

Documentation PR https://github.com/archivesunleashed/aut-docs/pull/48